
Neuroscience

Literature Thesis

12 EC

August 2017

Publishing pressure and false results in fMRI literature

Author: Antonia Kaiser (11118040)
Supervisor: Birte Forstmann
Co-assessor: Max Keuken
August 28, 2017


Introduction

Recent technical advances have furthered the field of neuroimaging enormously. Measuring techniques such as functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), electroencephalography (EEG) and many more have been developed and have introduced a new level of understanding of the human mind. In addition to philosophical thought experiments and behavioral observations of living beings, these techniques can be used to study the mind in more detail. Not only has hardware improved over the last years, but data processing and modeling have also advanced. With this progress, larger datasets are being collected that can be handled by many different methods, resulting in a plurality of analysis approaches in the field (Carp, 2012). As described by Ioannidis (2005), analytic flexibility is a key risk factor for increased rates of false-positive results when combined with selective reporting of favorable analysis methods.

The apprehension that much of the current research literature contains false results has been growing over the last years (Colquhoun, 2014). Whether results can be trusted depends on several conditions. Statistics must be calculated correctly, which includes taking power and bias into account. The study design has to be thought through thoroughly, interpretations have to be evaluated objectively, replications have to be considered, and publication and financial pressure need to be overcome (Ioannidis, 2005). Munafò et al. (2017) summarized the threats to reproducible science as: low sample size, small effect size, data dredging (also called "p-hacking"), conflicts of interest, and strong competition between researchers (Figure 1).

Figure 1. Threats to reproducible science (Munafò et al., 2017)

These problems exist in many research fields; in this review, the focus is on fMRI research. There are currently 459,354 papers to be found under the search term "fMRI" on PubMed, and around 30,000 papers are published on this topic every year (Figure 2). In recent years, fMRI as a measuring tool has become more and more popular. Statistical methods have only rarely been validated with real data; most validations have been performed on simulated data, and simulations that are supposed to capture the complexity of the noise arising from the human body in an MRI scanner are very hard to achieve (Welvaert and Rosseel, 2014). Eklund et al. (2016) evaluated statistical methods for fMRI with real data and found, instead of the nominal 5%, false-positive rates of up to 70% for the most commonly used software packages (SPM, FSL, AFNI). Bennett et al. (2009b) evaluated fMRI articles published in six major journals and found that 25% to 30% of them used an uncorrected threshold. They also speculated that the number for conference posters and presentations is even higher, around 80%.

Figure 2. Publications with the keyword "fMRI" on PubMed over the last years.

These numbers are disturbingly high and call for a radical change in the field of fMRI research. I would like to motivate such a change with a thorough summary of the problem and with suggestions for solutions, drawn from the literature and from my own perspective. To that end, I have separated the problem into five categories: the motivation of the researcher and sponsors, study design, statistics and analysis, interpretation, and replication.


The Problem

Motivation

The motivation to conduct a certain study can have a major influence on its outcome. If a researcher tries to answer a certain question, that usually means a personal interest in the topic is present. Hypotheses are formulated and tested. If these turn out to be wrong and the researcher cannot demonstrate what was expected, frustration can follow. This frustration can result in unintentional or even intentional manipulations that push the data toward the stated hypothesis. Many independent research groups work on the same questions and problems, leading to enormous time pressure to either be the first to finish a study or to have the best-looking results, in order to beat competing groups to publication and thereby receive more funding (Tijdink et al., 2014). Time constraints do not allow research groups to plan their research carefully and put enough time and thought into the correct analysis and interpretation. Additionally, researchers are more inclined to believe their significant p-values than to double-check their calculations. With most journals publishing only studies with spectacular, novel and good-sounding results, studies that are well planned and thought through cannot compete. Negative results are only interesting enough to get published if positive results on the same research question have been published before (Ioannidis, 2005). PhD students and researchers at an early stage of their career need to finish their projects in time (which in most countries means more than three publications in four years) or face working without payment. This puts them under enormous time pressure. These conditions, combined with little experience, can easily lead to unintentional false results. In fMRI research the problem is even bigger, considering the costs per participant: an MRI scan of one hour typically costs between 850 and 4,200 euros, depending on the complexity of the scan.

Funding in research is a big problem in itself. Governmental research funding has been cut more and more over the last years. Researchers need to find funding sources elsewhere, which results in long application processes for non-governmental funds. This raises a particular problem for early-career researchers, because they cannot yet show a large influence on the field. Starting grants are extremely competitive and impose tight time planning and funding for the research that follows (Sewitz, 2014). Grant proposals are complicated and need to be prepared thoroughly to be successful, which takes time away from the actual research process. Crowd-funding and private grants have become more and more popular, because researchers have simply given up on applying to the usual support sources. Countries that are not well known for producing cutting-edge research in particular are not being supported enough (Gua, 2014).

High costs are not the only factor increasing publishing pressure. Measures of scientific output determine the reach and importance of researchers and their groups, which leads to high pressure to produce as much scientific output as possible. Tijdink et al. (2013) showed a significant correlation between publishing pressure and burnout in medical science. Under this pressure, important aspects such as the quality, validity, accuracy and integrity of the research are (or tend to be) neglected (Adler and Harzing, 2009).

Additionally, personal commitment and scientific interest in a research topic can create a motivation bias. As described before, this can lead to distorted results and false reports.


Furthermore, it can influence the peer-review process of journals. Papers are reviewed by other researchers before publication, to evaluate the quality of the research and to decide whether a report is accepted. Reviewers mostly work on a topic similar to that of the research group being reviewed. If the paper under review overlaps with or even contradicts a reviewer's own results, it could be suppressed or ignored to undermine competitors (Antman et al., 1992).

Design

The choice of task and design has a big influence on the reliability of fMRI research. Bennett and Miller (2013) found different reliabilities for different combinations of cognitive tasks and designs. A reliability analysis could show the internal consistency of the data collected and used. Nevertheless, most result sections in the neuroimaging literature do not include a reliability analysis. This makes results hard to interpret and generalize. Additionally, it is not clear whether the analysis was never performed or was left out because its result was not appreciated.

Because of the high price of MRI scans and tight schedules, flexible study designs and small studies with low statistical power are common in neuroimaging. The financial and therefore also time pressure makes it impossible for most researchers to take these very important factors into account. A study in 2007 showed that a typical dataset generated at least one false positive 97% of the time (Sullivan, 2007). In fMRI research the functioning of the brain is probed with cognitive tests. Many groups have developed different tests for the same cognitive function; every test is slightly different and therefore gives different results. This makes it hard to compare studies that claim to investigate the same cognitive function without stating the test used. If the test is commonly used and validated, this should be clearly stated.

Effect size quantifies the size of a difference between two groups. It helps to interpret correlations between variables and enhances the understanding of the study sample. Effect-size measures are standardized using the standard deviation as denominator; this puts variables on a scale that is comparable across studies and determines the practical and theoretical importance of an effect (Fritz et al., 2012). Many studies in neuroimaging do not calculate or do not report effect sizes (Chen et al., 2017). In fMRI studies it is often close to impossible to calculate a meaningful effect size because of noisy measurements: the scale of the MRI signal is not commonly known and is essentially arbitrary. Percent signal change is one way of quantifying it, but this also differs from scanner to scanner, which makes comparing outcomes across studies extremely complicated.
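To make the idea of standardizing a difference by the standard deviation concrete, the following minimal Python sketch computes Cohen's d for two invented groups of percent-signal-change values; the numbers and group labels are purely illustrative and do not come from any study discussed here.

import numpy as np

def cohens_d(group_a, group_b):
    # Standardized mean difference using the pooled standard deviation as denominator
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical percent-signal-change values for a patient and a control group
patients = [0.42, 0.55, 0.31, 0.60, 0.48]
controls = [0.30, 0.25, 0.40, 0.22, 0.35]
print(f"Cohen's d = {cohens_d(patients, controls):.2f}")

Because the standard deviation sits in the denominator, the resulting number is unit-free and can be compared across studies even when the raw signal scale differs.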

Another problem in planning the design of a study is that researchers change their initial ideas and hypotheses after having seen the results. This would not be a problem if it were stated openly and called exploratory work. HARKing is defined as presenting a post-hoc hypothesis (i.e., one based on or informed by one's results) in one's research report as if it were, in fact, an a priori hypothesis. This is practiced quite regularly and causes wrong interpretations of results (Kerr, 1998). Not only internal, unintentional mistakes are made in planning a study. Certain funding sources put pressure on researchers to plan their studies in a certain way and only continue to support studies that provide evidence for certain products or findings. HARKing and p-hacking (discussed in the following section) are regularly found in these situations.


Statistics

Small studies with low power are prevalent in neuroscience because of time pressure and tight financial plans (see the section "Motivation"). Power analyses are mostly not clearly reported and are often interpreted wrongly. The reliability of such studies is very low (Button et al., 2013).

Inference is often based on uncorrected statistical results (Bennett et al., 2009b). P-thresholds need to be adapted to the data, because they are otherwise almost certain to be incorrect. Poldrack (2012) took fMRI data and created a random individual-difference variable. Because this variable was random and had nothing to do with the hypothesis, it should not have correlated with brain activity. Yet using an uncorrected threshold of p < 0.001 and a minimum cluster size of 25 voxels revealed a significant region close to the amygdala; with corrected statistics, this correlation disappeared. Even though this problem is commonly known, such mistakes are easily made and can still be found in the neuroimaging literature.
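The logic of this demonstration can be sketched with purely synthetic data (independent Gaussian noise rather than real fMRI time series, and invented dimensions): correlating a random score with tens of thousands of null "voxels" is expected to flag roughly 0.1% of them at an uncorrected threshold of p < 0.001.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 50000
voxels = rng.standard_normal((n_subjects, n_voxels))  # pure-noise "brain data"
score = rng.standard_normal(n_subjects)               # random individual-difference score

# Pearson correlation of the score with every voxel (mass-univariate testing)
score_c = score - score.mean()
vox_c = voxels - voxels.mean(axis=0)
r = (score_c @ vox_c) / (np.sqrt((score_c ** 2).sum()) * np.sqrt((vox_c ** 2).sum(axis=0)))

# Convert r to t-values and two-sided p-values
t = r * np.sqrt((n_subjects - 2) / (1 - r ** 2))
p = 2 * stats.t.sf(np.abs(t), df=n_subjects - 2)

# Around 50 of the 50,000 null voxels are expected to pass p < 0.001 by chance alone
print("voxels passing uncorrected p < 0.001:", int(np.sum(p < 0.001)))

In a real analysis such chance voxels can form spatially contiguous clusters once spatial smoothing is applied, which is why an uncorrected threshold combined with a small cluster-extent cutoff is not a safe correction.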

A common approach to analyzing neuroimaging data is mass-univariate testing: a separate hypothesis test is performed for each voxel, so the false-positive rate rises if no correction for multiple testing is applied. A famous example of this is the so-called "salmon experiment", in which a dead fish was scanned and apparent brain activation was found when no correction for multiple testing was performed (Bennett et al., 2009a). The problem of multiple testing is by now well known in the field, and many correction strategies have been developed. Some of these methods are very conservative and strict, so that interesting effects might be missed; others are too liberal, so that false positives still slip through. In general, there are many different ways to deal with this problem, implemented in different software packages. Well-established and controlled software such as FSL (Analysis Group, FMRIB, Oxford, UK) and SPM (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, London, UK) provide Gaussian and non-parametric analysis pipelines to correct for multiple comparisons. Nevertheless, it is still common practice to try different packages, going through the effort of downloading and installing them. Non-parametric methods are known to be good practice for controlling the family-wise error rate and are suited to nearly all available models (Nichols and Hayasaka, 2003). Trying different correction and analysis tools to see which results best support the initial hypothesis is so-called p-hacking.
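As a hedged illustration of how family-wise corrections behave, the sketch below applies a Bonferroni threshold and a non-parametric, permutation-based maximum-statistic threshold to the same kind of synthetic null data; it is a toy example with independent noise and sign-flipping permutations, not a validated fMRI pipeline like those in FSL or SPM.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_subjects, n_voxels, alpha = 20, 5000, 0.05
data = rng.standard_normal((n_subjects, n_voxels))  # null data: no true effect anywhere

# Voxelwise one-sample t-tests against zero
t_obs, p_unc = stats.ttest_1samp(data, popmean=0.0, axis=0)

# Bonferroni: control the family-wise error rate by dividing alpha by the number of tests
bonferroni_hits = int(np.sum(p_unc < alpha / n_voxels))

# Permutation (sign-flip) max-statistic threshold: non-parametric family-wise control
n_perm = 1000
max_t = np.empty(n_perm)
for i in range(n_perm):
    signs = rng.choice([-1.0, 1.0], size=(n_subjects, 1))
    t_perm, _ = stats.ttest_1samp(data * signs, popmean=0.0, axis=0)
    max_t[i] = np.max(np.abs(t_perm))
t_thresh = np.quantile(max_t, 1 - alpha)

print("uncorrected p < 0.05:", int(np.sum(p_unc < alpha)))                        # roughly 5% of all voxels
print("Bonferroni-significant voxels:", bonferroni_hits)                          # expected to be (near) zero
print("permutation-significant voxels:", int(np.sum(np.abs(t_obs) > t_thresh)))   # likewise near zero

On real, spatially smoothed data the permutation approach is usually less conservative than Bonferroni because it accounts for the dependence between voxels, which is one reason non-parametric methods are recommended (Nichols and Hayasaka, 2003).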

P-hacking is the use of data and certain patterns in it to uncover significant correlations without a specific hypothesis about the underlying causality. One example of p-hacking (deliberate or not) is using a small-volume correction with a mask chosen after seeing the initial results; any analysis performed this way is useless and should be avoided. Another error is ignoring outliers. Particularly with small sample sizes, outliers have a big influence on correlations between behavior and activation. Quality control of statistical results is not applied often enough, and generalizability is assumed too quickly.
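The influence of a single outlier on a small-sample brain-behavior correlation can be illustrated with invented numbers; the sketch below constructs two unrelated variables and then adds one extreme participant.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 15
behavior = rng.standard_normal(n)
activation = rng.standard_normal(n)  # unrelated to behavior by construction

r_clean, p_clean = stats.pearsonr(behavior, activation)

# Add one extreme participant who scores high on both measures
behavior_out = np.append(behavior, 4.0)
activation_out = np.append(activation, 4.0)
r_out, p_out = stats.pearsonr(behavior_out, activation_out)

print(f"without outlier: r = {r_clean:.2f} (p = {p_clean:.2f})")
print(f"with one outlier: r = {r_out:.2f} (p = {p_out:.3f})")

A single extreme data point is typically enough to turn an essentially null correlation into one that looks strong and significant, which is why outliers need to be inspected and reported rather than silently included or removed.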

The complexity and availability of analysis software has increased rapidly over the last years. Many researchers implement their own approach to statistical evaluation rather than using well-established software tools. Because the field of fMRI is very complex, many project-specific analyses have to be built in-house. This brings along the problem that not every piece of software and every pipeline is validated and tested in detail. Errors in software with a small user base are less likely to be discovered (Poldrack et al., 2017). This problem began to get attention when Eklund et al. (2016) published a paper about a 15-year-old bug in the cluster analysis of the AFNI package, which called into question the results of over 1,000 articles published by that time. Even though this software package is used by many researchers, it took a long time to discover the bug. Most research labs use pre-processing and analysis scripts only once, for a specific project; quality checks are rarely done and errors would not be noticed. An additional problem with the availability of many different software and hardware versions is that results of data analyzed with different methods differ. Carp (2012) analyzed one data set with 6,912 unique analysis pipelines and five different thresholding approaches to correct for multiple comparisons, resulting in 34,560 significance maps. Some of the results were relatively consistent; others showed considerable variability across methods. He suggests that false-positive results (not only caused by this issue) might be more common in the literature than expected.

The biggest problem with false results in the neuroimaging literature, and in research in general, is the loss of trust. Not only do researchers reading the results become wary of believing what others have shown, it also worsens the lack of money needed to make research more reliable. Sponsors watching the developments in a research field conclude that it is not worth investing in, because scandals prove years of research wrong. This produces a circle that is hard to escape unless researchers work together and start using reliable and stable methods to produce believable science.

Freedom of Interpretation

The interpretation of neuroimaging data is complex and difficult. Understanding what can be concluded from changes in the BOLD signal is not straightforward. Studies have shown, for instance, that an increase in neuronal activity evokes a BOLD response, but that an evoked BOLD response does not necessarily imply increased neuronal activity. In fact, the BOLD response measures hemodynamic changes rather than neuronal activity directly (Heeger and Ress, 2002).

Understanding fMRI studies is a challenge. Because it is a rapidly changing and rather new field, new methods, ideas and ways of pre-processing and analyzing data are developed frequently. Staying up to date is hard and time-consuming, which makes the literature on this topic complicated to follow. Not only researchers trying to stay informed have this problem; reviewers for journals also need to fully grasp what research groups intended with their projects. The fact that researchers within the field struggle suggests that it is even harder for people outside the field (e.g., sponsors, psychologists, physicians) to read the fMRI literature.

On top of this, many details of planning, data collection, pre-processing and analysis are left out of reports. It is not always clear whether these details are missing because the steps were not performed or because they were simply not reported. The two options make a big difference for the interpretation of a study's results and make it impossible to generalize or compare results (Simundić, 2013).

Furthermore, research conducted by humans is never entirely objective. Discussions and conclusions are always biased by the opinions of the specific researcher writing and deriving them. Keeping all these factors in mind when reading a report is a challenge for everyone.


Replication/Reproducibility

The first issue that needs to be discussed concerning replication and reproducibility is the definition of the words reproducible, replicable, robust and generalizable. If a research group tries to answer the same question using the same data and the same code, they are trying to reproduce a result. Answering the same question with different data but the same code replicates the result. If the same question is answered with the same data but different code, the result is robust, and if the question is answered with different data and different code, it is generalizable. All four of these ways of testing the outcomes of studies are very important and are not done often enough (Whitaker, 2017; Figure 3).

Figure 3. Definitions of reproducible, replicable, robust and generalizable research (Whitaker, 2017)

The replication problem has become a heavily discussed and concerning topic over the last few years, especially in psychology. In several fields this has resulted in big projects trying to reproduce published findings, in which many studies have been shown not to be reproducible (Poldrack and Poline, 2015).

Even though the issue has come to the attention of the field, replications of studies are still not common. The reward for publishing a replication study is not as high as for publishing new findings in high-impact journals. It is also hard to find sponsors to fund replications instead of fundamental or investigational work.

Additionally, it would help to conduct and publish more meta-analyses in neuroimaging (Müller et al., 2016). These could reveal false results and problems in analysis pipelines. Investigators for meta-analyses are also hard to find, even though no data acquisition is needed and therefore the costs for scanning and participant recruitment do not apply.

Publication Bias

Publication bias is a major problem for research groups. Journals and reviewers prefer to publish studies that are exciting and show significant results, and some sponsors only pay part of a grant if the data gets published. This bias distorts the interpretation of results and the progress of the field. The lack of published negative results leads to the repetition of mistakes and of research, which comes with high costs (Jansen of Lorkeers et al., 2014).

Solutions

Pre-registration

One solution developed over the last years to fight the aforementioned problems is pre-registration of research. Pre-registration is meant to help researchers distinguish between the stage of generating hypotheses and the stage of testing them. Researchers register an outline of the analysis they intend to use to answer a specific question. This should be done in such detail that the analysis could, in theory, be run on fake data before the real data are collected (Open Science Framework, 2017). Pre-registering a project does not take away the possibility of doing exploratory analyses after the data have been collected. Exploratory analyses need to be explicitly labeled as such and reported in a separate section, but they remain possible and useful.

The initial idea, hypothesis and analysis plan are reviewed by a committee and by peers. The feedback and critique need to be incorporated into the application until no further changes are requested (Figure 4). This step helps early-career researchers as well as more experienced scientists to perform well-planned and correct analyses and prevents the same mistakes from being made several times.

Most pre-registrations come with a guarantee to publish the results. That means the project is pre-registered with a specific journal, which guarantees publication if the procedure is carried out well and thoroughly, even if the results are not significant. This takes away the pressure to produce desired results and low p-values and motivates researchers to perform good and honest research.

Publishing negative results is beneficial for everyone. It prevents several groups from making the same mistakes in their research and gives a better picture of what is being done. Negative results are as informative as positive results, and publishing them avoids studies being performed several times (Colquhoun, 2014). This also saves money for sponsors and research groups. Once a pre-registration has been completed successfully, the write-up of the study is faster: most of the writing and preparation has already been done before the data were collected and has already been adapted to the rules and guidelines of a specific journal. This makes it easier to plan ahead and keep deadlines. Additionally, there is a second peer-review step after writing the report, which gives the researcher more guidance and guarantees that the analysis and data collection were done the way they were pre-registered (Figure 4).


The procedure of pre-registering studies is still time-consuming and confusing, and several teams are currently trying to make it easier and more accessible for more groups. Unfortunately, not every research question is suitable for pre-registration: hypothesis-free work (for example exploratory work) and methods development are not appropriate for it. More journals should offer this option to make it available for a broader spectrum of research. Chambers et al. (2017) extended the pre-registration format of the European Journal of Neuroscience into new domains (systems neuroscience, molecular neuroscience, clinical neuroscience) and allowed applications using secondary analyses of existing data sets. Authors need to be able to prove that they did not have a priori access to the data before registration. More than 40 journals now offer registered reports. For more information about pre-registration, the homepage of the Center for Open Science can be visited (COS, 2017).

Peer Review

If pre-registration is not an option, a similar approach is to upload study plans and unpublished results to peer-review websites and blogs, where anyone who has signed up can comment on what has been uploaded. This is an easier and less strict way of getting help and feedback on planned projects and analyses. Additionally, it is beneficial to upload reports to these websites to make them openly accessible for everyone, even if the journal they are published in is not. This makes research more visible, which benefits the publishing researcher (others will find the paper faster and may cite it) and other researchers (the study is openly accessible and easier to find). There are also websites that provide a medium to comment on any published report. These give both the publishing researcher and the readers an easily accessible way of communicating that is also visible to other users, leaving room for discussion, criticism and compliments, which enhances the understanding and quality of published data. Examples of such websites are www.blog.scienceopen.com, www.f1000research.com, www.blogs.biomedcentral.com, www.blogs.openaire.eu and many more.

Data Sharing

A large number of neuroimaging data sets have been acquired over the last years, and vast amounts of money and time have been invested in all kinds of task-measurement combinations. Given the limited funding and time pressure in the field, data sharing could be a way to solve these problems for researchers.

Even though data sharing has become more accepted over the last years, researchers are still reluctant to share data. Studies are still being conducted that were already carried out in the same or a similar way before and whose data could have been reused. Even though data sharing seems to be an obvious improvement for the field, there are still problems and obstacles to be solved (Nichols et al., 2017).

One problem is the researcher's motivation to share data. In addition to the acquisition costs (time, money and expertise), the process of sharing and maintaining data is costly and not planned into the schedule of normal research and funding. There is a fear that others will find the same results faster, or find even better results, with the hard-earned data. If data are shared along with a publication, the fear that others will find errors in the acquisition comes on top (Poline et al., 2012).


Another problem is posed by ethical and legal issues. Participants give written consent to a study in conjunction with a specific research question and context, and they trust the specific research group to protect their privacy. The specific properties of individual brain images make it possible to recognize individual brains across different studies and link them to each other. However, different techniques are being implemented and improved to avoid this problem; for example, some databases only show aggregated rather than individual data. Informed consent forms normally do not contain a paragraph about sharing the data. Many ethical research committees would not approve a written consent form containing such a paragraph, or do not allow sharing the data without a signed consent form containing it. Convincing the ethical committee of data sharing and collecting new consent forms after the data have already been collected involves effort and time that most research groups do not have. The solution would be for institutions to incorporate and support data sharing and involve it in the legal process from the beginning (Poldrack and Gorgolewski, 2014). The biggest problem of data sharing is still the technical implementation. A few websites are available to help researchers with data sharing, but there is not yet a common way of organizing the data and simply uploading it. fMRI data are large and need to be prepared so that they can be used by others. Gorgolewski et al. (2016) developed the Brain Imaging Data Structure (BIDS) to describe outputs in an organized and structured way. BIDS uses the NIfTI file format and includes JSON files for additional meta-information. It defines a fixed file structure for every participant that covers the scanning method and procedure for raw as well as preprocessed and analyzed data. Scripts for checking a data set against the structure and automated pipelines for certain processing steps are available (Figure 5).

Figure 5. The Brain Imaging Data Structure (Gorgolewski et al., 2016)
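To give a concrete impression of the layout BIDS prescribes, here is a minimal Python sketch that creates a single-subject skeleton and writes the accompanying JSON sidecars; the dataset name, subject label, task name and metadata values are invented for illustration and are not taken from a real study.

import json
from pathlib import Path

root = Path("my_study")                    # hypothetical dataset folder
func_dir = root / "sub-01" / "func"        # one subject, functional data
func_dir.mkdir(parents=True, exist_ok=True)

# Dataset-level description file required at the top of every BIDS data set
(root / "dataset_description.json").write_text(
    json.dumps({"Name": "My study", "BIDSVersion": "1.0.2"}, indent=2))

# JSON sidecar carrying the scan metadata for sub-01_task-rest_bold.nii.gz
sidecar = {"TaskName": "rest", "RepetitionTime": 2.0, "EchoTime": 0.03}
(func_dir / "sub-01_task-rest_bold.json").write_text(json.dumps(sidecar, indent=2))

The NIfTI image itself would sit next to the sidecar as sub-01_task-rest_bold.nii.gz; validator scripts can then check the whole tree for missing files or inconsistent metadata.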

With a common data structure, data sharing becomes much easier and more useful. A remaining problem is finding servers and websites with enough space to upload the huge amounts of data available. If the server is maintained by the research group itself, this raises problems of accessibility and anonymization, steps that require expertise that is not always at hand. In many research groups a standardized way of saving and naming data is still missing, and even with BIDS as a suggestion it takes time to rename and restructure old data to make it more accessible. Data should therefore be named according to the BIDS principles right away, to avoid wasting time on reorganization (Gorgolewski et al., 2016).

Some researchers avoid publishing their data before the corresponding paper appears in a journal, because other research groups could take competitive advantage of it (so-called "scooping"). To avoid this, an embargo period can be set on the shared data until the results are published. OSF (http://help.osf.io/m/faqs/), figshare (https://figshare.com) and Dryad (http://datadryad.org/pages/faq) offer this support.

If the data collected for a study are very complex or the dataset is very large, writing a data paper can also be considered. This is a new type of publication that is devoted purely to the description and explanation of a data set. It not only gives room to describe and elaborate on the collected data in greater detail than a normal publication, it also gives other groups the opportunity to cite the responsible researchers if the data are reused. Scientific Data, GigaScience, Data in Brief, F1000Research, Neuroinformatics and Frontiers in Neuroscience, among others, currently accept neuroimaging data papers (Gorgolewski and Poldrack, 2016).

Code Sharing

Not only neuroimaging data should be shared, but also the corresponding pipelines and code. Vandewalle (2012) showed that code sharing is correlated with the number of citations of a paper and is therefore directly beneficial for the researcher. Currently, many different versions of software and hardware are being used to analyze different data sets, and these have an influence on the outcome of studies. Sharing the analysis methods used would inform researchers how to interpret published data, but would also help them analyze their own data in an appropriate way. Many preprocessing and analysis steps are implemented in-house even though they are already available elsewhere. Using the same software and implementations would make results more generalizable and comparable, and it would also save a lot of time and money. Suggestions for easy ways to share code are being developed and improved. GitHub (https://github.com/), for example, is a platform to host, review and manage code and projects. It has built-in version control and makes it easy for other users to comment on projects and help develop and improve them. One disadvantage is the price: the research group has to have an account available that everyone can use, and not every group has the funding to do so. Additionally, it takes time to maintain the uploaded content; comments from other users have to be reviewed and incorporated, and questions have to be answered. By defining how much support other researchers can expect from the provider when working with the available code, being asked for support too often can be avoided. Kubilius (2014) reviewed GitHub and the Open Science Framework as tools for code sharing and gave suggestions for lesser-known platforms.

On top of providing code and data, it is also useful to share the study procedure. The more information is available, the fewer questions other users have to ask. Sharing details about the planning of the project makes it possible to replicate findings and to use the same procedure for different questions. Jupyter (http://jupyter.org/) is an open-source platform for writing notebooks, providing data and code, and showing analyses and visualizations (Figure 6). It provides a web application to create and share documents that contain live code, equations and explanatory text. Around 40 programming languages are supported, as well as sharing via web applications (e.g., Dropbox, Google Drive) and interactive widgets (e.g., LaTeX, JavaScript).


Figure 6. Example of a Jupyter notebook (jupyter.org)

Even if research groups do not want to share their study protocols, code and data with other researchers, these tools are also useful for internal data handling. They make it easier for new colleagues to understand procedures, use code and analyze existing data. Students who work in a laboratory temporarily, as well as early-career researchers, would benefit from a well-documented study protocol. Organized, well-documented and easy-to-understand study procedures and data also make errors less likely: step-by-step instructions leave little room for mistakes and human error.

Statistics

Certain statistical tests and procedures can help make research more comparable and reliable. Effect size is a calculation that makes it easier to understand the size of a difference between two groups. It also puts measures whose scale is not commonly known onto one that is easier to understand. The latter is not always possible in fMRI research, as explained above. Nevertheless, it is important to calculate and report the effect size whenever possible: it makes studies easier to compare with each other and is necessary for meta-analyses, which give a good overview of the generalizability of results in the field (Chen et al., 2017).


Secondly, a power analysis should be performed before planning a study, to get an idea of how many participants should be included to make it possible to generalize the results. If it is not possible to include enough participants, it is still important to report the power analysis so that the results are presented in the proper light. New methods and toolboxes to help researchers with this are currently being developed; one great example is http://neuropowertools.org, which provides help for different data sets and a guide to finding the correct analysis.
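As a rough illustration of what such a calculation looks like, the sketch below uses the normal approximation for a two-sided one-sample test to solve for the sample size that reaches 80% power at an assumed effect size; it is a back-of-the-envelope formula, not the procedure implemented by neuropowertools.org.

import math
from scipy import stats

def required_n(effect_size, alpha=0.05, power=0.80):
    # Normal-approximation sample size for a two-sided one-sample test
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return math.ceil(((z_alpha + z_beta) / effect_size) ** 2)

# With an assumed medium effect (Cohen's d = 0.5) about 32 participants are needed,
# while an assumed small effect (d = 0.2) already calls for roughly 200.
print(required_n(0.5), required_n(0.2))

The steep increase for smaller effects is exactly why underpowered studies with only a handful of participants can reliably detect only implausibly large effects.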

Thirdly, a reliability analysis should be included. Reliability analyses check the internal consistency of the produced data. This makes the results more interpretable and gives the reader an estimate of how to relate the research to other studies. Most commonly used statistics software and toolboxes offer good pre-built reliability analyses.
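One widely used internal-consistency measure that such toolboxes report is Cronbach's alpha; a minimal sketch of the textbook formula, applied to an invented subjects-by-items score matrix, could look as follows.

import numpy as np

def cronbach_alpha(scores):
    # Cronbach's alpha for a (subjects x items) score matrix
    scores = np.asarray(scores, dtype=float)
    n_items = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1) * (1 - item_vars / total_var)

# Invented example: six subjects rated on four items of the same scale
scores = [[4, 5, 4, 5],
          [2, 3, 2, 3],
          [5, 5, 4, 4],
          [1, 2, 2, 1],
          [3, 3, 4, 3],
          [4, 4, 5, 5]]
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")

For test-retest reliability of fMRI activation maps, intraclass correlation coefficients are more common, but the reporting principle is the same: state which reliability measure was computed and on which data.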

In general, it is very important to double-check statistics and analyses at least once. Especially with complicated data such as fMRI data, it is easy to make small mistakes that could alter the results of a study dramatically. Research groups should check each other's work after every processing step and make sure that the quality remains high. It is also important to realize that needing help is normal and acceptable. If asking colleagues is not an option, there are many blogs, mailing lists and websites available for asking questions or searching for answers to questions that have already been asked.

Writing Style

Writing up a study needs to be planned well. The report has to meet the criteria of a journal, stay within a certain word count, include all details and be "attractive" to read. In addition, it is important to mention all details essential for understanding the procedure and statistics well enough to replicate them. Researchers should use a writing style that is comprehensible not only for people working in the same field, but also for people who do not work with the same data or in the same field.

Poldrack et al. (2017) suggest a structure for future neuroimaging papers. The sample size should be determined by a power analysis and explained in the methods section. Exclusion and inclusion criteria, as well as software workflows and definitions, should be mentioned and, if possible, pre-registered. Data collection, analysis and the corresponding code should be stored openly accessibly and version controlled. A good description of these methods should be included in the paper itself, or it should be stated where to find them. Both successful and failed analyses should be automated wherever possible to guarantee reproducibility, and automated quality control of all steps should be performed. Exploratory results should be included but clearly labeled. Analyses and statistics that turned out not to be useful or successful should also be mentioned. All data, analysis steps, code and visualizations should be made openly accessible to enable reproduction and automated meta-analysis.

Journals

Not only researchers have to change their way of producing science, but also journals need to change their way of thinking and working.

Most journals currently only accept reports about exciting and new research. Significant results and positive findings are key properties that all but guarantee publication. Additionally, some journals still do not support open access. To make the aforementioned steps possible, journals should require different qualities in research than only exciting new findings. First, all journals need to be open access, to make education and knowledge of the field available for everyone independent of wealth or status. Less institute money would need to go into reading rights and could be spent on high-quality research. P-values should not be an indication of the quality of results; positive as well as negative results should be considered and published. All journals should support pre-registration and guarantee research projects that use this option publication of their results. Data and code sharing should be rewarded or even required for publication. If journals made it easier to publish important, high-quality research, a big part of the pressure on research groups would vanish.

There are already a few examples of how to reward those who perform transparent and reproducible science. Some journals are introducing badges that acknowledge open science, pre-registered reports and blinded analysis procedures. Furthermore, journals need to make it easier and more rewarding to admit to errors in published work; publications should be easy to correct, and errors should not be seen as failure (Poldrack et al., 2017). A writing guideline should be required to encourage certain details to be included in every paper (see the section about writing style). It might be argued that steps like these take away the creativity of writing, but a certain level of normalization should be practiced to be able to compare research. It also ensures that errors are avoided and that researchers do not forget (on purpose or unintentionally) to mention certain details of their research. To circumvent the problem of journals that are not openly accessible, and the publication bias, researchers should always upload preprints of their submitted manuscripts. These preprints are not yet peer-reviewed papers in their final version. The benefit of submitting a preprint is the immediate presentation of the research that has been done, without having to wait for the long process of review, evaluation and editing at journals. It gives other researchers the chance to read the information, but also to comment on it. Several servers are available for uploading preprints, such as Crossref, the Center for Open Science and ASAPbio. The Medical Research Council actively supports preprints, and the Wellcome Trust has accepted preprints in grant applications since the beginning of this year. Some journals still reject papers that have been published as preprints (Gorgolewski and Poldrack, 2016). These journals should not be considered for publication, also to convince them to become part of open science.

Meta-Analysis/Replication

Meta-analyses and replication studies are not rewarded and funded enough. Recent scandals in different fields of science have shown that it is very important to make sure that results are reproducible, replicable, robust or generalizable. This work should be considered (by researchers as well as journals and sponsors) as important as any new finding. Furthermore, meta-analyses are very important for getting a better overview of the current status of the work being done in the field. When working on one specific question, it is easy to lose the feeling for the status of other research. Moreover, so many studies are being published in neuroimaging that it is faster and more efficient to read meta-analyses to get an update. Therefore, publishing meta-analyses should be supported and considered valuable. Especially with a change towards more data sharing, this practice can further the field immensely. Additionally, it is important to evaluate the current practice of certain neuroimaging methods. Different studies apply the same new methods to different data sets; meta-analyses can be used to test whether methods work and how reliable different combinations of cognitive tests and neuroimaging techniques are.

The Researchers Themselves

The previously described solutions are all based on tools research groups can use or on changes that have to be made by sponsors and journals. In addition, a change in the behavior of researchers themselves can help. Currently, it is common practice to judge research projects by their significant outcomes. If every researcher changes that thinking towards judging research by its procedure and quality, and appreciates negative outcomes in everyday research life, the first step is accomplished.

Research should be developed and conducted deliberately, and the focus should be on quality instead of quantity. A focus on open science and data sharing should be incorporated into the usual procedures, and funding for data sustainability and comprehensibility should be included from the beginning.

Additionally, researchers should be open about admitting errors and mistakes. This is important to teach to the next generation of researchers and to practice in current research. A mistake should not be a reason to blame someone; it should be admitted and corrected, to avoid the same mistake in the future and to guarantee high quality in published results (Poldrack, 2017).


Critical Opinion

In this report, I have pointed out some of the problems of the current state of neuroimaging research and summarized available solutions. In the following, I would like to express my personal, critical opinion. In my opinion, the attitude of research in general, not only in neuroimaging, has to change. I decided to work in science and become a researcher because I thought it was important to find out more about how things in the world work. Later on, I focused on the brain, because it fascinated me that such a small organ can control our body and mind. Since then I have been eager to find out how the brain functions and what we can do to help individuals with dysfunctions. Many other researchers share this passion with me and are trying to do exactly the same thing every day.

What I have never understood is that everyone (or rather, every research group) works by themselves, trying to be better and faster than everyone else. Why do we not tackle the problems in collaboration? Don't we all want the same thing, and would it not be easier and far more efficient to work together?

The open-science approach of making papers openly accessible, publishing data and corresponding code, and receiving and giving feedback is developing in that direction. Additionally, I would like to suggest building research associations. Research groups would be part of a category based on what they are working on. This would make it easier to send around drafts, give and receive feedback from peers working on the same topic, and communicate and ask questions. Ideas and data could be registered in a common database to prevent "data theft". Servers and notebooks could be shared to prevent collecting data that already exist. If a group wanted to investigate a new question with existing data, they would have to cite the group that collected them, automatically giving them credit. Funding for new measurements could even be shared on a bigger scale, to collect a data set for several research groups around the world. A registration database would be obligatory: researchers would have to register their research question in a world-wide database to check whether this question (or a similar one) had been investigated before. The analysis and hypothesis would then be included and compared before publication of papers, to prevent HARKing and p-hacking.

Similar approaches have already been tried. Some of these have proven that working together on a bigger scale produces good results, for example the American BRAIN Initiative (www.braininitiative.org/). These projects are helpful because they form large-scale collaborations to answer the big questions of the field. In other fields, such as physics or biology, this approach has long been established; examples are CERN in physics and the Human Genome Project in biology, projects that produced great outcomes.

But not only positive examples have been initiated. The Human Brain Project was founded by Henry Markram in 2013 and was funded by the European Union to build an artificial human brain. Unfortunately, the organization and governance of the project were not well set up and controlled, and a lot of the provided funding was therefore misspent. The difference between the Human Brain Project and, for example, the BRAIN Initiative is how they are organized internally. The BRAIN Initiative is managed by several researchers together and has a very transparent, interdisciplinary nature, whereas the Human Brain Project was mainly controlled by one person. It is now being rebuilt and restructured: the project will focus more on collaboration, more scientists will be in executive positions to share decisions, and the goal will be scaled down to developing computational tools and ways of integrating data.

This supports the idea that working together in an open and honest way brings us further, faster, than working against each other and thinking only about our own visions.

Additionally, I would suggest making more help available for statistics and coding. Early-career researchers in particular do not have much experience with finding and running the right statistics for the questions being investigated and the data available. With automated software and pipelines available, it is very easy to use statistics without fully understanding the procedure. If more help were available and admitting that help is needed were more accepted, many unnecessary mistakes could be avoided. The same goes for pre- and post-processing code. It takes a long time to gather experience in certain programming languages, let alone to learn them from scratch. Help with writing and evaluating scripts and the availability of ready-made pipelines would make this process easier, more efficient and faster.

Furthermore, I would like to point out that researchers, no matter how experienced, are human. Humans are not perfect and make mistakes; everyone does. In my opinion, admitting mistakes and correcting them is one of the most important steps. Trying to hide mistakes to keep a publication or an impressive finding helps no one: it distorts the findings of the field and misleads other researchers. If these mistakes are found by other groups later, it additionally undermines trust in research and in the responsible group and individual.

To avoid details being left out of papers and reports, I agree that a certain obligatory structure should be introduced for publishing. I also agree that this takes away the creativity of writing to a certain extent, but it would increase the quality of papers, which should be our first concern.

To be able to put pressure on high-impact outlets such as journals, we have to work together. Researchers should stop publishing in journals that refuse to make papers openly accessible and should take the time to tell them why. If nobody publishes with them anymore, and explains why, they will change their policies eventually.

The measure of what makes a "good" scientist should be changed. Currently, what matters most is publishing exciting new findings in high-impact journals and being cited by as many other researchers as possible. If we could change this into a system in which researchers gather credit for different achievements, the problem would solve itself. Properties that should be included in such a system are: data sharing, code sharing, sharing procedures and protocols, providing help and guidance, data sustainability, good teaching skills, pre-registration, clean and rigorous statistics, writing understandable reports, and conducting replication studies and meta-analyses.

Hiring practices should be changed as well. Instead of only hiring researchers with many citations and publications in high-impact journals, attention should shift to scientists who produce and support good science. Data sharing and open-science practices should be valued more highly.

In general, I think it is important for researchers to take small steps in the right direction: try to change one practice at a time and start talking about the issues with your peers.


Conclusion

The scientific world is slowly moving towards a more transparent and collaborative way of working. Even though many problems remain, solutions are being developed and improved. Now that digital communication is easy, more can be done to make the scientific world more collaborative and open. There are many improvements that can easily be implemented by every scientist; with relatively little effort, research can be made more transparent, replicable, reproducible, robust and generalizable.

There are still challenges that cannot easily be overcome; the available funding for the field is far too low. But instead of focusing on problems that are hard to solve, I suggest we take small steps in the right direction. All the steps described in this summary are effective and suited to every scientist. If funders, publishers, societies, institutions, editors, reviewers and authors all work together and slowly change the norms, we can make a difference.

It is essential to work together to make the field of fMRI research credible and worth investing in. In the end we all want the same thing: to advance our understanding of the human mind and brain.


References

(2014). European research funding: it’s like Robin Hood in reverse.

(2017). Center for Open Science.

(2017). Open Science Framework.

Adler, N. and Harzing, A.-W. (2009). When Knowledge Wins: Transcending the Sense and Nonsense of Academic Rankings. Academy of Management Learning and Education.

Antman, E. M., Lau, J., Kupelnick, B., Mosteller, F., and Chalmers, T. C. (1992). A Comparison of Results of Meta-analyses of Randomized Control Trials and Recommendations of Clinical Experts. JAMA, 268(2):240–248.

Bennett, C. M., Baird, A., Miller, M. B., and Wolford, G. L. (2009a). Neural correlates of interspecies perspective taking in the post-mortem Atlantic Salmon: An argument for multiple comparisons correction. NeuroImage, 47(Suppl 1):S125.

Bennett, C. M. and Miller, M. B. (2013). fMRI reliability: Influences of task and experimental design. Cognitive, Affective, & Behavioral Neuroscience, 13(4):690–702.

Bennett, C. M., Wolford, G. L., and Miller, M. B. (2009b). The principled control of false positives in neuroimaging. Social Cognitive and Affective Neuroscience, 4:417–422.

Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., and Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5):365–376.

Carp, J. (2012). On the plurality of (methodological) worlds: estimating the analytic flexibility of FMRI experiments. Frontiers in neuroscience, 6(149):1–13.

Chambers, C. D., Forstmann, B., and Pruszynski, J. A. (2017). Registered reports at the European Journal of Neuroscience: consolidating and extending peer-reviewed study pre-registration. European Journal of Neuroscience, 45(5):627–628.

Chen, G., Taylor, P. A., and Cox, R. W. (2017). Is the statistic value all we should care about in neuroimaging? NeuroImage, 147:952–959.

Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of P values. Royal Society Open Science, 1:140216.

Eklund, A., Nichols, T. E., and Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 113(28):7900–7905.

Fritz, C. O., Morris, P. E., and Richler, J. J. (2012). Effect size estimates: Current use, calculations, and interpretation. Journal of Experimental Psychology: General, 141(1):2–18.


Gorgolewski, K. J., Auer, T., Calhoun, V. D., Craddock, R. C., Das, S., Duff, E. P., Flandin, G., Ghosh, S. S., Glatard, T., Halchenko, Y. O., Handwerker, D. A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B. N., Nichols, T. E., Pellman, J., Poline, J.-B., Rokem, A., Schaefer, G., Sochat, V., Triplett, W., Turner, J. A., Varoquaux, G., and Poldrack, R. A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3:160044.

Gorgolewski, K. J. and Poldrack, R. A. (2016). A Practical Guide for Improving Transparency and Reproducibility in Neuroimaging Research. PLoS Biology, 14(7):1–13.

Heeger, D. J. and Ress, D. (2002). What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience, 3(2):142–151.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8):0696–0701.

Jansen of Lorkeers, S. J., Doevendans, P. A., and Chamuleau, S. A. J. (2014). All preclinical trials should be registered in advance in an online registry. European journal of clinical investigation, 44(9):891–892.

Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3):196–217.

Müller, V. I., Cieslik, E. C., Serbanescu, I., Laird, A. R., Fox, P. T., and Eickhoff, S. B. (2016). Altered Brain Activity in Unipolar Depression Revisited. JAMA Psychiatry, 74(1):47–55.

Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert, N., Simonsohn, U., Wagenmakers, E.-j., Ware, J. J., and Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(January):1–9.

Nichols, T. and Hayasaka, S. (2003). Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical Methods in Medical Research, 12(5):419–446.

Nichols, T. E., Das, S., Eickhoff, S. B., Evans, A. C., Glatard, T., Hanke, M., Kriegeskorte, N., Milham, M. P., Poldrack, R. A., Poline, J.-B., Proal, E., Thirion, B., Van Essen, D. C., White, T., and Yeo, B. T. T. (2017). Best practices in data analysis and sharing in neuroimaging using MRI. Nature Neuroscience, 20(3):299–303.

Poldrack, R. A. (2012). The future of fMRI in cognitive neuroscience. NeuroImage, 62(2):1216–1220.

Poldrack, R. A. (2017). How to organize open and reproducible science. In ICON 2017.

Poldrack, R. A., Baker, C. I., Durnez, J., Gorgolewski, K. J., Matthews, P. M., Munafò, M. R., Nichols, T. E., Poline, J.-B., Vul, E., and Yarkoni, T. (2017). Scanning the horizon: towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2):115–126.


Poldrack, R. A. and Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nat Neurosci, 17(11):1510–1517.

Poldrack, R. A. and Poline, J. B. (2015). The publication and reproducibility challenges of shared data. Trends in Cognitive Sciences, 19(2):59–61.

Poline, J.-B., Breeze, J. L., Ghosh, S., Gorgolewski, K., Halchenko, Y. O., Hanke, M., Haselgrove, C., Helmer, K. G., Keator, D. B., Marcus, D. S., Poldrack, R. A., Schwartz, Y., Ashburner, J., and Kennedy, D. N. (2012). Data sharing in neuroimaging research. Frontiers in Neuroinformatics, 6(April):1–13.

Sewitz, S. (2014). The excellence agenda is a Trojan Horse for austerity.

Simundić, A.-M. (2013). Bias in research. Biochemia Medica, 23(1):12–15.

Sullivan, P. F. (2007). Spurious Genetic Associations. Biological Psychiatry, 61(10):1121–1126.

Tijdink, J. K., Verbeke, R., and Smulders, Y. M. (2014). Publication Pressure and Scientific Misconduct in Medical Scientists. Journal of Empirical Research on Human Research Ethics, 9(5):64–71.

Tijdink, J. K., Vergouwen, A. C. M., and Smulders, Y. M. (2013). Publication Pressure and Burn Out among Dutch Medical Professors: A Nationwide Survey. PLoS ONE, 8(9):e73381.

Vandewalle, P. (2012). Code sharing is associated with research impact in image processing. Computing in Science and Engineering, 14(4):42–47.

Welvaert, M. and Rosseel, Y. (2014). A Review of fMRI Simulation Studies. PLoS ONE, 9(7):e101953.
