Systematic bioinformatic analysis of expression levels of 17,330 human genes across 9,783 samples from 175 types of healthy and pathological tissues

doi: 10.1186/gb-2008-9-9-r139. Epub 2008 Sep 19.

Reija Autio, Kalle Ojala, Kristiina Iljin, Elmar Bucher, Henri Sara, Tommi Pisto, Matti Saarela, Rolf I Skotheim, Mari Björkman, John-Patrick Mpindi, Saija Haapa-Paananen, Paula Vainio, Henrik Edgren, Maija Wolf, Jaakko Astola, Matthias Nees, Sampsa Hautaniemi, Olli Kallioniemi

Sami Kilpinen et al. Genome Biol. 2008.

Abstract

Our knowledge of tissue- and disease-specific functions of human genes is rather limited and highly context-specific. Here, we have developed a method for the comparison of mRNA expression levels of most human genes across 9,783 Affymetrix gene expression array experiments representing 43 normal human tissue types, 68 cancer types, and 64 other diseases. This database of gene expression patterns in normal human tissues and pathological conditions covers 113 million datapoints and is available from the GeneSapiens website.
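
As a quick consistency check on the numbers quoted above, the three category counts can be summed and the reported datapoint total compared with a complete genes-by-samples grid. This is only arithmetic on the figures in the title and abstract. Reading the shortfall as missing values (genes not measured in every sample) is an assumption, although the Figure 1 and Figure 5 captions do mention missing values.

```python
# Category counts quoted in the abstract
normal_types, cancer_types, other_disease_types = 43, 68, 64
tissue_types = normal_types + cancer_types + other_disease_types

# Size of a complete genes x samples grid, using the counts in the title
genes, samples = 17_330, 9_783
full_grid = genes * samples

# "113 million" is the rounded total given in the abstract
reported_datapoints = 113_000_000
fill_fraction = reported_datapoints / full_grid

print(tissue_types, full_grid, round(fill_fraction, 3))
```

Under these numbers the database holds roughly two thirds of the values a fully populated grid would contain.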

Figures

Figure 1

Multidimensional scaling (MDS) of Q-normalized data before and after AGC correction. MDS was performed using 1,137 healthy in vivo samples representing 15 tissue categories with 7,390 genes in common without missing values. Color codes show the array generation of each sample for panels on the left-hand side and the high-level anatomical system from which samples originate for panels on the right-hand side. (a, b) Clustering of samples in Q-normalized data without AGC correction. (a) Clustering is driven predominantly by array generation, although some biological separation is visible as subdivisions within the large clusters. (b) Several tissue classes are separated into two or more clusters because their samples come from different array generations. (c, d) After QAGC, array generations no longer define clusters (c); instead, tissue types form distinct clusters (d).
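
To illustrate the kind of projection shown in these panels, the following is a minimal sketch of classical (Torgerson) MDS in plain Python. It is not the authors' implementation: the caption does not say which MDS variant or distance measure was used, so Euclidean distance, the function name `classical_mds`, and the toy samples are all assumptions made for illustration.

```python
import math

def classical_mds(dist, dims=2, iters=200):
    """Classical (Torgerson) MDS from a symmetric distance matrix.

    Assumption: this variant is chosen for illustration only; the
    caption does not state which MDS algorithm was used.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the leading eigenvector of B
        v = [float(i + 1) for i in range(n)]  # fixed start, deterministic
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # no positive variance left to embed
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate so the next pass finds the next eigenvector
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

# Hypothetical expression vectors for four samples (invented values)
samples = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 1.0, 0.0), (4.0, 1.0, 0.0)]
dist = [[math.dist(p, q) for q in samples] for p in samples]
embedding = classical_mds(dist)
for point in embedding:
    print([round(c, 3) for c in point])
```

Because the toy samples lie in a plane, the two-dimensional embedding reproduces their pairwise distances; with real expression data the projection is only an approximation.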

Figure 2

Boxplots of correlations between the replicated samples after each step of the data normalization process. All boxes for which notches do not overlap vertically have significantly (α = 0.05) different median values. On the left is a sample set from 14 human muscle biopsy samples measured with array generations U95Av2 and U133A. The correlations computed from the QAGC-normalized data are significantly higher than those from the MAS5 and Q methods. On the right, all correlations between 123 leukemia samples are plotted. The samples are from three different array generations: U95Av2, U133A, and U133B. The first column illustrates correlations between all replicates together (369 correlation values), and in the other columns the correlations are grouped by array generation pair. When the mean values of the correlations computed with each method were compared, the values in the QAGC data were significantly higher.
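
The notch comparison described in this caption can be sketched as follows. A common convention, assumed here because the caption does not give a formula, places the notch at median ± 1.57 × IQR / √n. The correlation values are invented for illustration. The sketch also checks that 369 is what results if each of the 123 samples contributes one correlation per pair of array generations, which is an interpretation rather than something the caption states.

```python
import math
import statistics
from itertools import combinations

def notch_interval(values):
    """Approximate 95% interval for the median.

    Assumption: the usual boxplot convention median +/- 1.57 * IQR / sqrt(n).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True if the notch intervals of two samples overlap."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

# Hypothetical replicate correlations for two normalization methods
corr_method_a = [0.80, 0.82, 0.81, 0.79, 0.83, 0.80, 0.82, 0.81]
corr_method_b = [0.90, 0.92, 0.91, 0.89, 0.93, 0.90, 0.92, 0.91]
print(notches_overlap(corr_method_a, corr_method_b))

# One correlation per sample for every pair of array generations
generation_pairs = list(combinations(["U95Av2", "U133A", "U133B"], 2))
total_correlations = 123 * len(generation_pairs)
print(total_correlations)
```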

Figure 3

Detailed expression profiles of TNNT2, ALPP and MAG. (a) TNNT2 is a clinically used cardiac biomarker and, as expected, it shows heart-specific expression. In addition, it has been shown that TNNT2 has elevated expression in some cases of rhabdomyosarcoma, which is also visible in the profile. (b) ALPP shows high expression in placenta and somewhat elevated expression in uterine tumors. Additionally, serous ovarian tumors show elevated expression compared to mucinous ones. (c) The known neuronal marker gene MAG similarly shows a highly central nervous system-specific expression profile.

Figure 4

Detailed gene expression profile of PRAME. (a) Body-wide expression profile of the PRAME gene across the database. Each dot represents the expression of PRAME in one sample. Anatomical origins of each sample are marked with colored bars below the gene plot. Sample types having higher than average expression or an outlier expression profile are additionally colored in the figure (legend at the top left corner). The PRAME gene is a highly testis-specific gene in normal samples, but is ectopically expressed across the majority of human cancers. Gene plots like these can easily be used to identify outlier expression profiles, as can be seen for kidney cancer in this case, where only a small fraction of the tumors are PRAME positive. (b) Box plot analysis of the PRAME expression levels across a variety of normal and cancer tissues. The number of samples in each category is shown in parentheses. Normal tissues are shown with green boxes and cancerous ones with red boxes. The box shows the interquartile (25-75%) range, with the median shown as a black horizontal line. In addition, the 95% range and individual outlier samples are shown.
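
The box plot elements listed in panel (b) can be computed with a short sketch such as the one below. It assumes that the "95% range" means the 2.5th to 97.5th percentiles and that outliers are the samples outside that range; the caption states neither, and the function name `box_summary` and the input values are hypothetical.

```python
import statistics

def box_summary(values):
    """Quartile box, median, central 95% range and outliers for one tissue.

    Assumption: '95% range' is taken as the 2.5th-97.5th percentiles,
    and samples outside it are reported as outliers.
    """
    # 39 cut points at 2.5% steps: index 0 = 2.5%, 9 = 25%, 19 = 50%,
    # 29 = 75%, 38 = 97.5%
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    low, q1, median, q3, high = cuts[0], cuts[9], cuts[19], cuts[29], cuts[38]
    outliers = sorted(v for v in values if v < low or v > high)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "range95": (low, high),
        "outliers": outliers,
    }

# Hypothetical expression values for one tissue category (invented)
expression = list(range(41))
summary = box_summary(expression)
print(summary)
```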

Figure 5

Body-wide expression map of known cancer genes. On the x-axis are 342 genes and on the y-axis are 110 human in vivo tissues (both healthy and malignant). The color indicates the mean expression value of each gene in each tissue. Grey signifies missing values. Values have been scaled gene-wise (mean 0 and standard deviation 1). Both axes have been clustered using Euclidean distance and the complete linkage method. Below the expression map are gene-wise Pearson correlation coefficients with four known cellular process/tissue-specific marker genes (Ki-67, PCNA, KRT19 and PTPRC). Correlations have been calculated over 8,409 healthy and malignant samples using pairwise complete observations. Comparison of the highest correlation values with the clusters of genes on the expression map confirms that, through the analysis of in silico transcriptomics data, it is possible to find both tissue specificity and functional associations with processes such as the cell cycle. For example, the orange branch contains genes having the highest correlation with the epithelial marker KRT19, while the blue branches contain genes mostly expressed in the hematological system that also correlate with PTPRC, a marker for hematological tissues. Additionally, genes related to mitosis cluster together (purple branch), having the highest correlations with Ki-67 and PCNA. The rectangles (A, B, C) highlight three genes as examples of extreme expression in some cancers (see Figure 6 and Additional data files 7 and 8 for enlargements of these areas).
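
The three operations named in this caption (gene-wise scaling, Pearson correlation over pairwise complete observations, and complete-linkage clustering on Euclidean distance) can be sketched in plain Python as below. This is an illustrative sketch, not the authors' code: the function names, the use of the sample standard deviation (n - 1), the representation of missing values as `None`, and all data values are assumptions.

```python
import math

def zscore(values):
    """Scale one gene to mean 0 and SD 1, keeping missing values (None).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    sd = math.sqrt(sum((v - mean) ** 2 for v in present) / (len(present) - 1))
    return [None if v is None else (v - mean) / sd for v in values]

def pearson_pairwise(x, y):
    """Pearson r using only positions observed in both vectors."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

def complete_linkage(rows, n_clusters):
    """Agglomerative clustering: Euclidean distance, complete linkage.

    Rows must not contain missing values in this simplified version.
    """
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two farthest members
        return max(math.dist(rows[i], rows[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        candidates = ((p, q) for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        a, b = min(candidates,
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical expression vectors with missing values (invented)
gene_a = [1.0, 2.0, None, 4.0, 5.0]
gene_b = [2.0, 4.0, 6.0, None, 10.0]
scaled_a = zscore(gene_a)
r_ab = pearson_pairwise(gene_a, gene_b)

# Hypothetical tissue profiles forming two obvious groups (invented)
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
groups = complete_linkage(profiles, n_clusters=2)
print(scaled_a, r_ab, groups)
```

Stopping the merge loop at a fixed number of clusters is a simplification; a dendrogram such as the one in the figure records every merge rather than a single cut.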

Figure 6

Expression profile for the KIT gene shows interesting patterns in the bodymap in Figure 5. KIT exhibits extremely high expression in gastrointestinal stromal tumors. KIT is known to be inhibited by Gleevec®, demonstrating that findings like these pinpoint immediate possibilities for drug repositioning.
