Mixture modeling of transcript abundance classes in natural populations

Wen-Ping Hsieh et al. Genome Biol. 2007.

Abstract

Background: Populations diverge in genotype and phenotype under the influence of such evolutionary processes as genetic drift, mutation accumulation, and natural selection. Because genotype maps onto phenotype by way of transcription, it is of interest to evaluate how these evolutionary factors influence the structure of variation at the level of transcription. Here, we explore the distributions of cis-acting and trans-acting factors and their relative contributions to expression of transcripts that exhibit two or more classes of abundance among individuals within populations.

Results: Expression profiling using cDNA microarrays was conducted in Drosophila melanogaster adult female heads for 58 nearly isogenic lines from a North Carolina population and 50 from a California population. Using a mixture modeling approach, transcripts were identified that exhibit more than one mode of transcript abundance across the samples. Power studies indicate that sample sizes of 50 individuals will generally be sufficient to detect divergent transcript abundance classes. The distribution of transcript abundance classes is skewed toward low frequency minor classes, which is reminiscent of the typical skew in genotype frequencies. Similar results are observed in reported data on gene expression in human lymphoblast cell lines, in which analysis of association with linked polymorphisms implies that cis-acting single nucleotide polymorphisms make only a modest contribution to bimodal distributions of transcript abundance.

Conclusion: Population surveys of gene expression may complement genetical genomics as a general approach to quantifying sources of transcriptional variation. Differential expression of transcripts among individuals is due to a complex interplay of cis-acting and trans-acting factors.
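The mixture modeling step described in the Results can be illustrated with a small sketch. The abstract does not give the authors' model or selection criterion, so everything here is an assumption: two Gaussian classes with a shared variance, fitted by expectation-maximization, and compared with a single Gaussian by BIC. The function names and the example line means are hypothetical.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bic(log_likelihood, n_params, n_obs):
    # Lower BIC indicates the preferred model.
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def fit_one_class(values):
    """BIC of a single Gaussian (2 parameters: mean and sd)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    loglik = sum(math.log(max(normal_pdf(v, mean, sd), 1e-300)) for v in values)
    return bic(loglik, 2, n)


def fit_two_classes(values, iterations=200, min_sd=0.05):
    """BIC and parameters of a two-class Gaussian mixture with shared sd (assumed model)."""
    n = len(values)
    mean = sum(values) / n
    sd = max(math.sqrt(sum((v - mean) ** 2 for v in values) / n), min_sd)
    # Start the two class means at the extremes of the data.
    low, high, weight = min(values), max(values), 0.5
    for _ in range(iterations):
        # E-step: probability that each line belongs to the low class.
        resp = []
        for v in values:
            p_low = weight * normal_pdf(v, low, sd)
            p_high = (1.0 - weight) * normal_pdf(v, high, sd)
            total = p_low + p_high
            resp.append(p_low / total if total > 0.0 else 0.5)
        # M-step: update class means, shared sd, and mixing weight.
        n_low = min(max(sum(resp), 1e-9), n - 1e-9)
        n_high = n - n_low
        low = sum(r * v for r, v in zip(resp, values)) / n_low
        high = sum((1.0 - r) * v for r, v in zip(resp, values)) / n_high
        var = sum(r * (v - low) ** 2 + (1.0 - r) * (v - high) ** 2
                  for r, v in zip(resp, values)) / n
        sd = max(math.sqrt(var), min_sd)
        weight = n_low / n
    loglik = sum(
        math.log(max(weight * normal_pdf(v, low, sd)
                     + (1.0 - weight) * normal_pdf(v, high, sd), 1e-300))
        for v in values
    )
    params = {"low_mean": low, "high_mean": high, "sd": sd, "low_fraction": weight}
    # 4 parameters: two means, shared sd, mixing weight.
    return bic(loglik, 4, n), params


def is_bimodal(values):
    """True when the two-class model has the lower BIC."""
    two_class_bic, params = fit_two_classes(values)
    return two_class_bic < fit_one_class(values), params


# Hypothetical line means on a log base-2 scale: 40 lines near -1, 10 lines near +2.
bimodal = ([-1.0 + 0.02 * (i - 20) for i in range(40)]
           + [2.0 + 0.04 * (i - 5) for i in range(10)])
# Evenly spaced normal quantiles standing in for a single abundance class.
unimodal = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]

print(is_bimodal(bimodal))
print(is_bimodal(unimodal)[0])
```

The shared variance keeps the likelihood bounded, which avoids one class collapsing onto a single line. A published analysis would also need to account for within-line replicate variance, which this sketch ignores.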

Figures

Figure 1

Two-way hierarchical clustering of abundance of all transcripts in NC and CA samples. The heat map indicates relatively high abundance in magenta and low abundance in blue, with each row corresponding to one gene and each column one line of flies. Thick bars to the right indicate genes that appear to differentiate the NC and CA samples, whereas the thin bars highlight genes that have polymorphic expression in both samples. CA, California; NC, North Carolina.

Figure 2

Line means for two typical transcripts across the NC sample. Each plot shows the mean relative fluorescence intensity on a log base-2 scale for the four samples (two control and two nicotine-treated) of each line in random order (± 1 standard deviation unit). (a) CG7843 (an unknown gene predicted to be involved in defense/toxin response) is an example of a gene with bimodal abundance, with the minor transcript abundance class (TAC) centered approximately fourfold more abundant than the average transcript on the array (relative fluorescence intensity = +2), and the major TAC twofold less abundant than the average (relative fluorescence intensity = -1). (b) CG12141 (encoding Lysyl tRNA synthetase) is a gene with a single mode of transcript abundance, given the variance among and within lines.
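The relative fluorescence values quoted for Figure 2 are on a log base-2 scale, so converting them to fold changes is a matter of exponentiation. A minimal sketch follows, with hypothetical helper names:

```python
import math


def log2_to_fold(log2_value):
    """Fold change relative to the array average for a log base-2 intensity."""
    return 2.0 ** log2_value


def fold_to_log2(fold):
    """Log base-2 units corresponding to a fold change."""
    return math.log2(fold)


minor_class = log2_to_fold(2)     # fourfold above the array average
major_class = log2_to_fold(-1)    # half the array average, i.e. twofold lower
between_modes = log2_to_fold(2 - (-1))  # fold difference between the two modes

print(minor_class, major_class, between_modes)
```

On this scale the two CG7843 modes sit 3 log base-2 units apart, which corresponds to an eightfold difference between the classes.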

Figure 3

Six examples of bimodal TACs in both populations. Each plot shows the frequency distribution in the North Carolina (NC) sample (solid curve) and California (CA) sample (dashed curve). Units along the x-axis are log base-2 relative fluorescence intensity after mixed model normalization. The top two rows show transcripts with similar distributions in both populations. The bottom row shows two transcripts with apparently different distributions in NC and CA, both encoding larval serum proteins. TAC, transcript abundance class.

Figure 4

Parameters of bimodal transcript abundance classes in Drosophila by population. (a, b) Histograms of magnitude of differences between modes of the two transcript abundance classes (TACs), on a log base-2 scale, in North Carolina (NC) and California (CA), respectively. In both populations the median difference is between 1.5-fold and 2-fold, but a few transcripts exhibit differences as great as 16-fold. (c) Histograms of observed (solid bars) and inferred (open bars) minor TAC frequencies in the NC sample. (d) Histogram of observed distribution of minor TAC frequencies in the CA sample, relative to expected minor single nucleotide polymorphism frequencies under the Ewens sampling distribution, with the population parameter θ (that is, 4Nμ) equalling 0.05 (red line), 0.10 (blue line), or 0.20. The two curves for the most part lie within the range of expected values for D. melanogaster defined by the red and blue curves, although there is a slight excess of minor transcript frequencies between 5% and 10%.

Figure 5

Power studies. (a) Percentage detection rate as a function of the difference between the modes of the two transcript abundance classes, for minor transcript abundance class (TAC) frequencies of 0.05 (left) and 0.5 (right). Colors represent increasing sample size, from 30 lines (red) to 40 (blue), 50 (green), 70 (blue-green), 90 (orange), or 100 (light blue) lines. Power of 80% is obtained for 100 lines if the modes differ by more than 1.7-fold (approximately 0.75 log base-2 units), and for 40 lines if they differ by more than 2-fold. Thirty lines are too few to perform this type of analysis. (b) Percentage detection rate as a function of minor TAC proportion, for four different values of the difference between the median expression values of the two classes. Power drops quickly for minor TACs less than 10% of the sample, but it is fairly constant for all other relative abundances of the two classes.

Figure 6

Transcript abundance classes in human cell lines. (a) The frequency distribution of transcript abundance classes (TACs) in the Centre d'Etude du Polymorphisme Humain data for 831 bimodally expressed genes. Open bars show the detected frequency of transcripts in each bin, and solid bars the reconstituted distribution adjusted for the false-negative detection rate for each bin. (b) The distribution of genotype frequencies for the single nucleotide polymorphism (SNP) within 100 kilobases of each of the 831 transcripts that shows the strongest association with transcript abundance. Genotype is represented as the lesser of the common homozygote class or the sum of the heterozygote and less common homozygote classes. This distribution is therefore right-shifted relative to the minor allele frequency distribution (and selection of SNPs with strong association statistics also biases the analysis toward common SNPs).
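The genotype representation used in panel (b) can be made concrete with a short sketch. The function names and the genotype counts are hypothetical and chosen only for illustration. The code takes the lesser of the common homozygote class and the pooled heterozygote plus less common homozygote classes, and compares that frequency with the minor allele frequency for the same counts.

```python
def minor_genotype_class_frequency(n_hom_a, n_het, n_hom_b):
    """Lesser of (common homozygotes) and (heterozygotes + less common homozygotes)."""
    total = n_hom_a + n_het + n_hom_b
    common_homozygotes = max(n_hom_a, n_hom_b)
    pooled_rest = total - common_homozygotes
    return min(common_homozygotes, pooled_rest) / total


def minor_allele_frequency(n_hom_a, n_het, n_hom_b):
    """Frequency of the rarer allele, counting two alleles per individual."""
    total_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    freq_a = (2 * n_hom_a + n_het) / total_alleles
    return min(freq_a, 1.0 - freq_a)


# Hypothetical counts for 50 individuals: 36 AA, 12 Aa, 2 aa.
counts = (36, 12, 2)
class_freq = minor_genotype_class_frequency(*counts)
allele_freq = minor_allele_frequency(*counts)
print(round(class_freq, 2), round(allele_freq, 2))
```

For these counts the genotype class frequency (14 of 50 individuals) exceeds the minor allele frequency (16 of 100 alleles), which illustrates the right shift mentioned in the caption.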

Figure 7

Strength of association between cis-SNPs and transcript abundance. Frequency histograms in bins of increasing order of magnitude of significance, with number of genes indicated on the y-axis. (a) The distribution of significance measures (negative logarithm of the P value) for the most strongly associated single nucleotide polymorphism (SNP) within 100 kilobases of each of the 831 bimodally expressed transcripts. (b) The same distribution for SNPs linked to a set of 835 randomly selected transcripts. Note the excess of outliers in the bimodal sample. (c) The distribution of strongest associations for a typical permutation of SNPs against unlinked transcripts, clearly showing much reduced significance relative to those observed for linked SNPs. (d) The 'best possible' distribution of associations, assuming that a single SNP explains all of the observed bimodality of each transcript. Single dots in panels (a) and (d) represent outlier significance values.
