GeneNetwork: A Toolbox for Systems Genetics

Megan K Mulligan et al. Methods Mol Biol. 2017.

Abstract

The goal of systems genetics is to understand the impact of genetic variation across all levels of biological organization, from mRNAs, proteins, and metabolites, to higher-order physiological and behavioral traits. This approach requires the accumulation and integration of many types of data, and also requires the use of many types of statistical tools to extract relevant patterns of covariation and causal relations as a function of genetics, environment, stage, and treatment. In this protocol we explain how to use the GeneNetwork web service, a powerful and free online resource for systems genetics. We provide workflows and methods to navigate massive multiscalar data sets and we explain how to use an extensive systems genetics toolkit for analysis and synthesis. Finally, we provide two detailed case studies that take advantage of human and mouse cohorts to evaluate linkage between gene variants, addiction, and aging.

Keywords: Allen Brain Atlas; BioGPS; GEO; GTEx; GWAS Catalog; Gemma; GeneRIF; GeneWeaver; Interval mapping; Manhattan plot; Metabolomics; Metagenomics; NCBI; PLINK; Pair scan; Principal component analysis; Proteomics; R/qtl; Recombinant inbred strain; Reverse genetics; Test cross; UCSC Genome Browser; WGCNA; WebGestalt; WebQTL; dbSNP; eQTL analysis.

Figures

Fig. 1

Organization of data sets in GeneNetwork

Fig. 2

GeneNetwork main search page and organization. Most analyses in GeneNetwork will follow the steps shown in panels A through D. In this workflow, a data set is selected (A) and mined for traits of interest based on user search queries (B). Traits are then selected from the search (C) and placed in a collection for further inspection and quantitative analysis (D). The banner menu contains additional search options and helpful resources under the Search and Help tabs, respectively (E)

Fig. 3

Local or distant modulation of gene expression in the hippocampus of BXD strains. QTL maps are shown for Alad and Atf4 in the top and bottom panels, respectively, with the association score (LOD) plotted on the Y axis across the genome (X axis). Chromosomes and megabase position are shown at the top and bottom of the graph, respectively. Expression of Alad is modulated by a local cis-eQTL whereas expression of Atf4 is modulated by a distant trans-eQTL. The sequence variant underlying expression of Alad is actually a copy number variant such that the parental DBA/2J strain and BXD strains that have inherited the D allele at this locus have additional copies of the gene and higher expression (indicated by the green line associated with the QTL peak in blue). The expression of Atf4 is modulated from a distal region on Chr 1. BXD strains that have inherited the B allele from the C57BL/6J parent at the Chr 1 locus have higher expression of Atf4. This distal region on Chr 1 (often referred to as QTL rich region 1 or QRR1) is a major regulatory locus of many expression and behavioral traits. The additive effect is shown in green to the right. The expression data can be accessed by selecting Species: Mouse, Group: BXD, Type: Hippocampus mRNA, Data Set: Hippocampus Consortium M430v2 (Jun06) RMA and entering the probe set IDs using the Get Any search option

Fig. 4

Overview of the Search Results page. Panel A indicates actions and panel B shows indexed search results. The number of records that match the search term is shown in the Details and Links section at the top of the page. Note that this page was generated by selecting Mouse (Species), BXD (Group), Phenotypes (Type), and BXD Published Phenotypes (Data Set) and entering the wild card character (asterisk) using the Get Any option. Summarized information for each trait varies based on data set type but, in general, Record ID gives a unique identifier for each record (e.g., a number for phenotype data sets and a probe set identifier for expression data sets), and Max LRS and Max LRS Location (Chr and Mb) give the maximum association score for each trait and the associated peak chromosome and megabase position, respectively. Add gives the additive allele effect, which is the estimated effect on trait expression associated with inheritance of the maternal or paternal allele. Positive or negative values indicate higher or lower expression associated with inheritance of the paternal or maternal allele, respectively. From the Search Results page, additional information about individual traits can be accessed by clicking the Record ID. Multiple traits can be selected (or deselected) using the action options Select, Deselect, and Invert. Selected traits can be added to a Trait Collection for further analysis using the Add option. The red question marks are links to additional information about column headings
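As an illustration of the quantities reported in the Search Results table, the sketch below converts an LRS value to the LOD scale used in Fig. 3 and computes an additive effect. It assumes the standard relation LRS = 2 ln(10) × LOD and the common convention that the additive effect is half the difference between the mean trait values of the two homozygous genotype classes; GeneNetwork's own computation may differ in detail. The function names, genotype codes, and data are hypothetical.

```python
import math
from statistics import mean


def lrs_to_lod(lrs: float) -> float:
    """Convert a likelihood ratio statistic (LRS) to a LOD score.

    Uses LRS = 2 * ln(10) * LOD, so one LOD unit is about 4.61 LRS units.
    """
    return lrs / (2.0 * math.log(10.0))


def additive_effect(values, genotypes, paternal="D", maternal="B"):
    """Half the difference between the paternal and maternal class means.

    Assumed convention: a positive value means higher trait values in
    strains carrying the paternal allele at the marker.
    """
    pat = [v for v, g in zip(values, genotypes) if g == paternal]
    mat = [v for v, g in zip(values, genotypes) if g == maternal]
    return (mean(pat) - mean(mat)) / 2.0


# Hypothetical trait values for four strains and their genotypes at one marker.
trait_values = [10.0, 12.0, 11.0, 13.0]
genotypes = ["B", "D", "B", "D"]

effect = additive_effect(trait_values, genotypes)
lod = lrs_to_lod(13.8)
print(f"additive effect = {effect:.2f}, LOD = {lod:.2f}")
```

In this toy example the D-allele strains average two units higher than the B-allele strains, so the additive effect is positive.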

Fig. 5

Overview of the Trait Collection page. Panel A shows the actions and tools menu with each action or tool represented by a clickable icon. Panel B shows the indexed search results. Note that additional columns of data are shown for traits in a collection compared to traits in the Search Results page, including Dataset, Symbol, Description, Location, Mean, and N Cases. The Dataset and Description columns provide information about which data set the trait originated from and details about the trait itself. As multiple different types of data can be added to the same Group collection, it is useful to keep track of which data set the trait originated from, especially if exploring the expression of the same gene across tissue types. For phenotype data sets, detailed descriptions are provided about trait measurement; for gene expression data sets, the full gene name is given along with information about the probe set used to measure the expression of that gene. The Symbol column gives the gene symbol for expression data sets and an abbreviated name for phenotypes. Location and Mean give the location of the gene for expression data sets and average trait expression, respectively. N Cases shows the number of individuals that were included in the trait measurement. The red question marks are links to additional information about column headings

Fig. 6

Layout of the Trait Data and Analysis page. Users can explore individual traits in detail in the Trait Data and Analysis page. In the Details and Links track, a full description of the trait and associated actions and tools are shown. Actions and tools vary slightly depending on whether the trait is from a phenotype (A) or gene expression (B) Data Set. The results in B can be generated by selecting Mouse (Species), BXD (Group), Hippocampus mRNA (Type), Hippocampus Consortium M430v2 (Jun06) RMA (Data Set) and entering the gene symbol “Bdnf” using the Get Any option. Multiple links to outside resources (shown as Resource Links) are provided for gene expression data in addition to the GeneNetwork actions and tools Add, Find, Verify, GeneWiki, SNPs, RNA-seq, and Probes. Both traits have a common set of tools shown in Panel C as the Basic Statistics, Calculate Correlations, and Mapping Tools tracks. These tracks give the user options to graph the trait distribution, correlate expression of the trait with all other traits in a Data Set from the same Group, or perform QTL mapping for the trait, respectively. Actual trait values are shown in the Review and Edit Data track

Fig. 7

Exploring the function of Rb1. An unusual use of the term addiction in NCBI GeneRIF led to the inclusion of Rb1 in our search for addiction-related genes whose expression is modulated by a strong cis-eQTL

Fig. 8

Probe set quality control. The RNA-seq button performs alignment of a probe set sequence against the appropriate reference genome using the UCSC Genome Browser's BLAST-like alignment tool (BLAT). The results are shown for probe set 1450486_a_at in the top panel. The SCORE is a function of the size and match. For large sequences a perfect score is 255. START, END, and QSIZE provide information about the size in base pairs of the query sequence. IDENTITY provides information about the match, with 100% indicating a perfect match to the reference C57BL/6J genome. The location and span of the match are given by CHRO (chromosome), STRAND, START, END, and SPAN. Note that both the probe set and the 11 perfect match probes that comprise the probe set are shown and that the best match for the individual probes and entire probe set is on the positive strand on Chr 2 around 181.45 Mb. Clicking the browser link for the best match directs to a graphical display of the probe set alignment, shown in the bottom panel. The genome browser display can be cluttered for the uninitiated. The basic layout is a display of several different Tracks of information. These tracks can be modified by scrolling down to the track tables at the bottom of the page. The display shown was generated by selecting the hide option for all tracks EXCEPT the Mapping and Sequencing, Genes and Gene Prediction, and the DBA/2J Sequence and Structural Variation tracks. The position of all 11 probes and the composite probe set are shown in the bottom panel in black with the corresponding IDs shown to the left. The arrowheads designate the alignment of the probe set on the positive (or sense) strand. The targeted gene (Oprl1) is shown below and indicates that the probe set is designed to target the 3′ UTR according to the UCSC gene model. The locations of sequence variants in the DBA/2J strain relative to the C57BL/6J reference genome are shown in the last two tracks (D2 InDels and D2 SNPs). Note that probes 299709 and 452573 overlap a DBA/2J SNP

Fig. 9

Impact of variants overlapping probe sets in microarray data sets. SNPs overlapping Oprl1 probe set 1450486_a_at (perfect match or PM probes 299709 and 452573) lead to expression measurements that are higher in BXD strains that have inherited the B allele and lower in strains that have inherited the D allele. The QTL Heatmap reveals a strong eQTL with higher expression associated with inheritance of the B allele at the Oprl1 locus (blue) only for the probes that overlap SNPs. The arrowhead indicates the genomic position of the probes. No other probes demonstrate a strong association between inheritance of alleles at this locus and gene expression. This analysis reveals that the strong cis-eQTL detected for Oprl1 is actually the result of a technical artifact resulting from sequence variants that disrupt the hybridization of probes to their target RNA sequence in strains other than the reference B6 strain (in this case the D2 strain)

Fig. 10

Top cis-modulated genes associated with addiction

Fig. 11

Exploring covariation. The matrix function allows users to investigate covariation between genes (or probe sets) in the Trait Collection. To display the gene symbols along with the probe set IDs, use the Short Labels button to redraw the correlation matrix. The matrix displays the correlation for each pair of genes (or probe sets) with the Spearman correlation coefficient shown to the right of the diagonal and the Pearson correlation coefficient shown to the left (the diagonal is indicated by grey shading and would normally be represented as a 1, or the correlation of each probe set with itself). Scatterplots can be generated by clicking the correlation in the matrix. The scatterplot can be customized by selecting the Show Options icon, adjusting the settings, and replotting
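The layout of such a matrix can be sketched in a few lines of Python. This is a minimal illustration, not GeneNetwork's implementation: it computes the Pearson coefficient directly, obtains the Spearman coefficient as the Pearson coefficient of average ranks, and stores Spearman values for cells to the right of the diagonal and Pearson values for cells to the left. The trait names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mixed_matrix(traits):
    """Map (row, column) to Spearman right of the diagonal, Pearson left of it."""
    names = list(traits)
    cells = {}
    for i, row in enumerate(names):
        for j, col in enumerate(names):
            if i == j:
                continue  # diagonal: correlation of a trait with itself
            method = spearman if j > i else pearson
            cells[(row, col)] = method(traits[row], traits[col])
    return cells


# Hypothetical expression values for two probe sets across five strains.
traits = {
    "probe_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "probe_b": [1.0, 4.0, 9.0, 16.0, 25.0],
}
matrix = mixed_matrix(traits)
for cell, r in matrix.items():
    print(cell, round(r, 3))
```

Because the second trait is a monotonic but nonlinear function of the first, the Spearman cell equals 1 while the Pearson cell is slightly lower, which shows why the two halves of the matrix can disagree.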

Fig. 12

Principal component analysis (PCA). As part of the matrix tool, a PCA is performed on the selected traits. The Scree Plot (left panel) plots each principal component (PC) based on the amount of variance each PC or factor explains. The Factor Loadings Plot displays the loading (the correlation) between each trait (the measured variable) and the factor or PC (latent variable). Each PC can be treated as a trait. If selected, the same basic functions and tools for individual trait analysis can be used for the PC. QTL mapping is shown for PC1 in the top right panel. Interval mapping does not suggest strong genetic control originating from a single locus for PC1
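The two quantities in this caption, variance explained and factor loadings, can be illustrated for the first PC with a small sketch. It assumes PCA on the trait correlation matrix and uses plain power iteration to find the leading eigenvector; this is an illustrative choice, not a description of how GeneNetwork computes its PCA. The proportion of variance explained is the eigenvalue divided by the number of traits, and each loading is the eigenvector entry scaled by the square root of the eigenvalue. The data are hypothetical.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def correlation_matrix(traits):
    """Pearson correlation matrix for a list of equal-length trait vectors."""
    z = [standardize(t) for t in traits]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_pc(corr, iterations=200):
    """Leading eigenvalue and eigenvector by power iteration.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    p = len(corr)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    return eigenvalue, v


def pc1_summary(traits):
    """Proportion of variance explained by PC1 and the trait loadings on it."""
    corr = correlation_matrix(traits)
    eigenvalue, vector = first_pc(corr)
    explained = eigenvalue / len(traits)
    loadings = [x * math.sqrt(eigenvalue) for x in vector]
    return explained, loadings


# Hypothetical traits: the second tracks the first, the third runs opposite.
example_traits = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
]
explained, loadings = pc1_summary(example_traits)
print(f"PC1 explains {explained:.0%} of the variance")
print("loadings:", [round(x, 2) for x in loadings])
```

The sign of a principal component is arbitrary, so only the relative signs of the loadings are meaningful: here the third trait loads opposite to the first two.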

Fig. 13

Creating networks and analysis of biological enrichment. From the Trait Collection a network graph depicting relations between gene set members can be constructed using the Graph tool. Display and correlation threshold can be adjusted using the Network Graph interface. Each node represents a gene (probe set) and each edge indicates a correlation (green for negative correlations and red for positive correlations). In this case the network shown in A was given a threshold of |r| = 0.3, as this represents a significant correlation (p < 0.01) in this data set. Based on the network, a subset of genes (shown in the yellow panel in B) can be selected for enrichment analysis. Select the subset in the Trait Collection and select the Gene Set tool. Enrichment analysis is shown in the background (C), with significant (adjusted p-value or AdjP < 0.05) enrichment of biological function (based on GO annotations) shown in red
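The thresholding step behind such a network graph can be sketched as follows. This is a minimal illustration under stated assumptions: pairwise correlations are already available, an edge is kept when the absolute correlation is at least the threshold, and edges are colored as in the caption (red for positive, green for negative). The gene names and correlation values are hypothetical.

```python
def build_network(correlations, threshold=0.3):
    """Return edges (gene_a, gene_b, r, color) with |r| at or above threshold."""
    edges = []
    for (gene_a, gene_b), r in correlations.items():
        if abs(r) >= threshold:
            color = "red" if r > 0 else "green"
            edges.append((gene_a, gene_b, r, color))
    return edges


# Hypothetical pairwise correlations between three probe sets.
pairwise = {
    ("GeneA", "GeneB"): 0.45,
    ("GeneA", "GeneC"): -0.35,
    ("GeneB", "GeneC"): 0.10,
}

edges = build_network(pairwise, threshold=0.3)
for gene_a, gene_b, r, color in edges:
    print(f"{gene_a} -- {gene_b}: r = {r:+.2f} ({color})")
```

Raising the threshold removes weaker edges and yields a sparser graph, which is the effect of adjusting the correlation threshold in the Network Graph interface.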

Fig. 14

Correlation table and correlation scatter plot. (A) The Correlation Table displays the results of a correlation analysis between a trait or data of interest and other traits collected from the same cohort. In this case, the correlation analysis is between the demographic age data and gene expression in the liver. (B) Individual scatter plots can be displayed by clicking on correlation values found in the Sample r column. This example shows a significant negative correlation between the expression of a mitochondrial ribosomal protein gene, MRPL9, and age. (C) Users can select transcripts in the table by setting the correlation criteria using AND/OR operators

Fig. 15

Biological enrichment and network analysis. Gene lists can be sent directly from GeneNetwork to external websites for (A) Gene Ontology and (B) functional network analysis

Fig. 16

Manhattan plots. A basic genetic association test is performed within GeneNetwork using PLINK and the result is displayed as a standard Manhattan plot. Comparing the GWAS results for (top) CYP2C8 enzyme activity (Record ID 10015) and (bottom) expression of the CYP2C8 gene in liver (GSE9588 Human Liver Normal (Mar11) Both Sexes : 10033668843), we find no common genetic modulator of the two related traits
