Large-scale automated synthesis of human functional neuroimaging data - PubMed

Large-scale automated synthesis of human functional neuroimaging data

Tal Yarkoni et al. Nat Methods. 2011.

Abstract

The rapid growth of the literature on neuroimaging in humans has led to major advances in our understanding of human brain function but has also made it increasingly difficult to aggregate and synthesize neuroimaging findings. Here we describe and validate an automated brain-mapping framework that uses text-mining, meta-analysis and machine-learning techniques to generate a large database of mappings between neural and cognitive states. We show that our approach can be used to automatically conduct large-scale, high-quality neuroimaging meta-analyses, address long-standing inferential problems in the neuroimaging literature and support accurate 'decoding' of broad cognitive states from brain activity in both entire studies and individual human subjects. Collectively, our results have validated a powerful and generative framework for synthesizing human neuroimaging data on an unprecedented scale.


Figures

Figure 1

Schematic overview of the NeuroSynth framework and its applications. (a) Schematic of the NeuroSynth approach. The full text of a large corpus of articles is retrieved, and terms of scientific interest are stored in a database. Articles are retrieved from the database based on a user-entered search string (e.g., the word ‘pain’), and peak coordinates are extracted from the tables of the associated articles. A meta-analysis of the peak coordinates is then performed automatically, producing a whole-brain map of the posterior probability of the term given activation at each voxel (i.e., P(Pain|Activation)). (b) Two types of inference in brain imaging. Given a known psychological manipulation, one can quantify the corresponding changes in brain activity and generate a forward inference; however, given an observed pattern of activity, drawing a reverse inference about the associated cognitive states is more difficult, because multiple cognitive states could have similar neural signatures. (c) Given meta-analytic posterior probability maps for multiple terms (e.g., working memory, emotion, pain), one can classify a new activation map by identifying the term with the highest posterior probability given the new data (in this example, pain).
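The posterior map described in panel (a) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the NeuroSynth implementation: it assumes each study contributes a binary map in which a voxel counts as active if any reported peak lies within a fixed radius of it, and it then applies Bayes' rule voxel by voxel. The 10 mm radius and the helper names `peaks_to_activation` and `posterior_map` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def peaks_to_activation(peaks: Sequence[Coord], voxels: Sequence[Coord],
                        radius: float = 10.0) -> List[int]:
    """Binary map for one study: a voxel is active (1) if any reported peak
    lies within `radius` mm of it. The 10 mm default is a hypothetical choice."""
    return [int(any(math.dist(p, v) <= radius for p in peaks)) for v in voxels]


def posterior_map(activation: Sequence[Sequence[int]], has_term: Sequence[bool],
                  prior: float = 0.5) -> List[float]:
    """Per-voxel P(Term | Activation) via Bayes' rule.

    activation[s][v] is 1 if study s activates voxel v; has_term[s] is True if
    study s is tagged with the term. Assumes both groups of studies are non-empty.
    """
    term_rows = [row for row, t in zip(activation, has_term) if t]
    other_rows = [row for row, t in zip(activation, has_term) if not t]
    posteriors = []
    for v in range(len(activation[0])):
        p_a_given_t = sum(r[v] for r in term_rows) / len(term_rows)        # P(A | T)
        p_a_given_not = sum(r[v] for r in other_rows) / len(other_rows)    # P(A | not T)
        numerator = p_a_given_t * prior
        evidence = numerator + p_a_given_not * (1 - prior)                 # P(A)
        posteriors.append(numerator / evidence if evidence > 0 else float("nan"))
    return posteriors
```

Passing a `prior` equal to the observed proportion of tagged studies, instead of 0.5, would reintroduce the term base rates that the reverse inference maps in Figure 2 deliberately factor out.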

Figure 2

Comparison of previous meta-analysis results with forward and reverse inference maps produced automatically using the NeuroSynth framework. Meta-analyses were carried out for working memory (top row), emotion (middle row), and physical pain (bottom row), and mapped to the PALS-B12 atlas. (a) Meta-analytic maps produced manually in previous studies. (b) Automatically generated forward inference maps displaying the probability of observing activation given the presence of the term (i.e., P(Activation|Term)). (c) Automatically generated reverse inference maps displaying the probability of the term given observed activation (i.e., P(Term|Activation)). Thus, regions in (b) are consistently associated with the term, and regions in (c) are selectively associated with the term. To account for differences in the base rates of terms, reverse inference maps assume uniform priors (i.e., equal 50% probabilities of Term and No Term). Activation in orange/red regions implies a high probability that the term is present, and activation in blue regions implies a high probability that the term is not present. Values for all images are displayed only for regions that are significant in a test of association between Term and Activation, with whole-brain correction for multiple comparisons (FDR, q = .05). DLPFC = dorsolateral prefrontal cortex; dACC = dorsal anterior cingulate cortex; aI = anterior insula.
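Figure 2 thresholds each map with a voxelwise test of association between Term and Activation, corrected at FDR q = .05. The caption does not name the test, so the sketch below assumes a Pearson chi-square test on the 2 × 2 table of study counts at each voxel, followed by the Benjamini–Hochberg procedure. Both function names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import List, Sequence


def chi2_2x2_pvalue(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction) for the
    2 x 2 table of study counts at one voxel:
        [[term & active,    term & inactive],
         [no term & active, no term & inactive]] = [[a, b], [c, d]]
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a zero margin leaves no testable association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg(pvals: Sequence[float], q: float = 0.05) -> List[bool]:
    """Mask of p-values declared significant at false discovery rate q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose k-th smallest p-value is <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * q:
            k_max = rank
    # Every p-value at or below that rank is declared significant.
    keep = [False] * m
    for i in order[:k_max]:
        keep[i] = True
    return keep
```

In use, `chi2_2x2_pvalue` would be called once per voxel and the resulting list passed to `benjamini_hochberg`; only voxels in the returned mask would be displayed.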

Figure 3

Comparison of forward and reverse inference in selected regions of interest. (a) Labeled regions of interest displayed on lateral and medial brain surfaces. (b) Comparison of forward inference (i.e., probability of activation given term, P(A|T)) and reverse inference (probability of term given activation, P(T|A)) for the domains of working memory (top), emotion (middle), and pain (bottom). Bars with asterisks denote statistically significant effects (whole-brain FDR, q = .05). dACC = dorsal anterior cingulate cortex (coordinates: +2, +8, +50); aIns = anterior insula (+36, +16, +2); IFJ = inferior frontal junction (−50, +8, +36); pIns = posterior insula (+42, −24, +24); aPFC = anterior prefrontal cortex (−28, +56, +8); vmPFC = ventromedial prefrontal cortex (0, +32, −4). L and R refer to the left and right hemispheres, respectively.
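A short worked example shows how a region can rank high on forward inference yet low on reverse inference. The probabilities below are hypothetical, chosen only for illustration, and the uniform prior P(T) = P(¬T) = 0.5 follows the convention stated for Figure 2.

```latex
P(T \mid A) = \frac{P(A \mid T)\,P(T)}{P(A \mid T)\,P(T) + P(A \mid \neg T)\,P(\neg T)}
            = \frac{P(A \mid T)}{P(A \mid T) + P(A \mid \neg T)}
            \quad \text{when } P(T) = P(\neg T) = 0.5

\text{Region X: } P(A \mid T) = 0.60,\; P(A \mid \neg T) = 0.50
  \;\Rightarrow\; P(T \mid A) = \frac{0.60}{0.60 + 0.50} \approx 0.55

\text{Region Y: } P(A \mid T) = 0.30,\; P(A \mid \neg T) = 0.05
  \;\Rightarrow\; P(T \mid A) = \frac{0.30}{0.30 + 0.05} \approx 0.86
```

Region X is activated more consistently by the term, but because it is also active in half of the studies without the term, observing activation there says little about whether the term is present. Region Y is the more selective marker.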

Figure 4

Three-way classification of working memory (WM), emotion, and pain. (a) Naive Bayes classifier performance when cross-validated on studies in the database (left) or applied to entirely new subjects (right). Sens. = sensitivity; Spec. = specificity. (b) Whole-brain maximum posterior probability map; each voxel is colored by the term with the highest associated probability. (c) Whole-brain maps displaying the proportion of individual subjects in the three pain studies (total n = 79) who showed activation at each voxel (P < .05, uncorrected), averaged separately for subjects who were correctly (n = 51; top row) or incorrectly (n = 28; bottom row) classified. Regions are color-coded according to the proportion of subjects in the sample who showed activation at each voxel.
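The naive Bayes classification in panel (a) and the maximum posterior map in panel (b) can be sketched as follows. This is an illustration under assumptions, not the paper's classifier: it assumes binary activation images, per-term maps of P(voxel active | term) such as those produced by a forward inference meta-analysis, and uniform term priors; any feature selection the authors used is not described here. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def naive_bayes_posteriors(image: Sequence[int],
                           p_active: Dict[str, Sequence[float]],
                           eps: float = 1e-6) -> Dict[str, float]:
    """Posterior P(term | image) for a binary activation image.

    p_active[term][v] is P(voxel v active | term). Voxels are treated as
    conditionally independent given the term (the naive Bayes assumption),
    and all terms share the same prior, so the prior cancels out.
    """
    log_lik: Dict[str, float] = {}
    for term, probs in p_active.items():
        total = 0.0
        for x, p in zip(image, probs):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total += math.log(p) if x else math.log(1 - p)
        log_lik[term] = total
    # Normalize with the log-sum-exp trick to avoid underflow over many voxels.
    top = max(log_lik.values())
    z = sum(math.exp(ll - top) for ll in log_lik.values())
    return {t: math.exp(ll - top) / z for t, ll in log_lik.items()}


def classify(image: Sequence[int], p_active: Dict[str, Sequence[float]]) -> str:
    """Assign the image to the term with the highest posterior."""
    post = naive_bayes_posteriors(image, p_active)
    return max(post, key=post.get)


def sensitivity_specificity(true: Sequence[str], pred: Sequence[str],
                            cls: str) -> Tuple[float, float]:
    """One-vs-rest sensitivity and specificity for class `cls`.
    Assumes `true` contains at least one item in and one item outside `cls`."""
    pairs = list(zip(true, pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    tn = sum(t != cls and p != cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    return tp / (tp + fn), tn / (tn + fp)


def max_posterior_map(maps: Dict[str, Sequence[float]]) -> List[str]:
    """For each voxel, the term whose P(term | activation) is highest."""
    terms = list(maps)
    n_vox = len(maps[terms[0]])
    return [max(terms, key=lambda t: maps[t][v]) for v in range(n_vox)]
```

Working in log space matters here: a product of thousands of per-voxel probabilities would underflow to zero in ordinary floating point.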

Figure 5

Accuracy of the naive Bayes classifier when discriminating between all possible pairwise combinations of 25 key terms. Each cell represents a cross-validated binary classification between the intersecting row and column terms. Off-diagonal values reflect accuracy (in %) averaged across the two terms. Diagonal values reflect the mean classification accuracy for each term. Terms were ordered using the first two factors of a principal components analysis (PCA). All accuracy rates above 58% and 64% are statistically significant at P < .05 and P < .001, respectively.
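Figure 5 reports fixed accuracy cutoffs for significance. How such a cutoff follows from the number of test cases can be sketched with a one-sided binomial test against 50% chance. This is an assumption: the caption states neither the test used nor the number of classified studies per pair, and treating every test case as an independent trial is a simplification, so the sketch does not reproduce the 58% and 64% values. The function names are hypothetical.

```python
import math
from typing import Optional


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed in log space so that
    large n does not overflow. Requires 0 < p < 1."""
    log_n_fact = math.lgamma(n + 1)
    total = 0.0
    for i in range(max(k, 0), n + 1):
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total


def min_significant_accuracy(n: int, alpha: float = 0.05,
                             chance: float = 0.5) -> Optional[float]:
    """Smallest accuracy k / n whose one-sided binomial p-value against
    `chance` is below `alpha`, or None if no accuracy reaches significance."""
    for k in range(n + 1):
        if binomial_upper_tail(k, n, chance) < alpha:
            return k / n
    return None
```

For example, with a hypothetical 100 test cases the cutoff at α = .05 is 59%; larger samples push the cutoff closer to 50%.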

Comment in

From journal articles to computational models: a new automated tool.
Mitchell TM. Nat Methods. 2011 Jul 28;8(8):627-8. doi: 10.1038/nmeth.1661. PMID: 21799495.

