Variation and genetic control of protein abundance in humans

Linfeng Wu et al. Nature. 2013.

Abstract

Gene expression differs among individuals and populations and is thought to be a major determinant of phenotypic variation. Although variation and genetic loci responsible for RNA expression levels have been analysed extensively in human populations, our knowledge is limited regarding the differences in human protein abundance and the genetic basis for this difference. Variation in messenger RNA expression is not a perfect surrogate for protein expression because the latter is influenced by an array of post-transcriptional regulatory mechanisms, and, empirically, the correlation between protein and mRNA levels is generally modest. Here we used isobaric tag-based quantitative mass spectrometry to determine relative protein levels of 5,953 genes in lymphoblastoid cell lines from 95 diverse individuals genotyped in the HapMap Project. We found that protein levels are heritable molecular phenotypes that exhibit considerable variation between individuals, populations and sexes. Levels of specific sets of proteins involved in the same biological process covary among individuals, indicating that these processes are tightly regulated at the protein level. We identified cis-pQTLs (protein quantitative trait loci), including variants not detected by previous transcriptome studies. This study demonstrates the feasibility of high-throughput human proteome quantification that, when integrated with DNA variation and transcriptome information, adds a new dimension to the characterization of gene expression regulation.

Figures

Fig. 1. Overview of workflow and protein association with ethnicity

a) Flow chart of the experimental scheme. In each experiment, peptide digests from a reference cell line (GM12878) and five other cell lines were each labeled with one of the TMT-sixplex tags. Labeled peptides were mixed in equal amounts, identified and quantified by mass spectrometry, and then used for protein quantification. A total of 51 experiments were performed. b) The P value distribution for the difference in protein levels between CEU and YRI shows enrichment at small P values. c) P value of protein level differences between CEU and YRI plotted as a function of the genomic coordinate for each protein. The dashed line marks the Bonferroni-corrected significance threshold (P = 0.05). All proteins that passed the threshold are highlighted with larger dots and labeled with gene names. Proteins that differed between CEU and YRI are distributed throughout the genome.
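
The Bonferroni threshold in panel c can be illustrated with a minimal sketch. It assumes one test per protein and a family-wise alpha of 0.05; the function names and the P values in the usage note are hypothetical and are not taken from the study.

```python
def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def proteins_passing(pvalues: dict, alpha: float = 0.05) -> list:
    """Return the names whose P value falls below the Bonferroni cutoff.

    `pvalues` maps a protein name to its (uncorrected) P value.
    """
    cutoff = bonferroni_cutoff(alpha, len(pvalues))
    return sorted(name for name, p in pvalues.items() if p < cutoff)
```

With four hypothetical tests, the per-test cutoff is 0.05 / 4 = 0.0125, so `proteins_passing({"A": 1e-6, "B": 0.02, "C": 0.5, "D": 0.004})` keeps only `A` and `D`.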

Fig. 2. Protein covariation network generated by sparse partial correlation estimation

Nodes represent proteins. Edges represent connection by covariation. This sparse network displays the 223 strongest connections among 278 proteins. Node color indicates protein function, and edge color indicates the correlation value category. Known protein-protein interacting pairs are highlighted with larger nodes and labeled with gene names.
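
To give a sense of why partial correlation yields a sparser network than plain correlation, here is a minimal sketch of a first-order partial correlation between two proteins given a third. It is not the sparse estimator used for this figure; the function names and the toy abundance values are assumptions made for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical abundances: x and y both track z plus independent noise.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [zi + e for zi, e in zip(z, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
y = [zi + e for zi, e in zip(z, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5])]

raw = pearson(x, y)                 # high, because both follow z
conditioned = partial_corr(x, y, z)  # close to zero once z is accounted for
```

In this toy case the raw correlation between `x` and `y` is high, yet the partial correlation is essentially zero, so a network built on partial correlations would draw edges from `z` to each protein but none directly between `x` and `y`.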

Fig. 3. Loci associated with protein expression levels

a) Identification of cis-pQTLs in all three populations combined (n=72). The P value and genomic coordinates for each protein/cis-SNP association test are plotted in the Manhattan plot. pQTLs with max(T) corrected P value < 0.001 are highlighted with a larger dot size and a black outline. Multiple loci throughout the genome displayed an excess of small P values. The arrow indicates the location of the IMPA1 gene, which contains a significant cis-pQTL. b) Overview of IMPA1 protein level and SNP genotype association in CEU, YRI, and all populations combined. The bottom plot is the fine mapping of the cis-pQTL for IMPA1 based on HapMap I, II and III genotypes release 28. Each dot represents a tested SNP. Dot colors represent testing groups. The arrow indicates the chromosome location and transcription direction of the IMPA1 gene. There are several highly significant associations near the IMPA1 region in CEU and all populations combined. The exact locations of these associations in the IMPA1 gene region are illustrated in the top plot. The most significant SNP is rs1058401, located in the IMPA1 3′UTR. c) Validation of IMPA1 protein expression level. IMPA1 protein expression level was validated by immunoblotting in 11 CEU individuals, with their genotype at rs1058401 labeled at the bottom. d) The bar plots show the mean IMPA1 protein level of these 11 individuals for each rs1058401 genotype, based on data measured by quantitative mass spectrometry and by densitometry of immunoblots. Error bars, standard error of the mean. M.S., mass spectrometry. Im., immunoblotting.
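
A protein/SNP association of the kind plotted in panel a is commonly modeled as a regression of protein level on allele count. The sketch below assumes an additive genotype coding (0, 1 or 2 copies of one allele) and ordinary least squares; it does not reproduce the max(T) correction named in the legend, and the function name and data values are hypothetical.

```python
def additive_fit(dosages, levels):
    """Least-squares fit of protein level on allele dosage (0, 1 or 2).

    Returns (slope, intercept); the slope is the estimated change in
    protein level per additional copy of the counted allele.
    """
    if len(dosages) != len(levels) or len(dosages) < 2:
        raise ValueError("need two or more paired observations")
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_p = sum(levels) / n
    sgg = sum((g - mean_g) ** 2 for g in dosages)
    if sgg == 0:
        raise ValueError("all individuals share one genotype")
    sgp = sum((g - mean_g) * (p - mean_p) for g, p in zip(dosages, levels))
    slope = sgp / sgg
    return slope, mean_p - slope * mean_g


# Hypothetical data: six individuals, two per genotype class.
slope, intercept = additive_fit([0, 0, 1, 1, 2, 2],
                                [1.0, 1.2, 1.5, 1.7, 2.0, 2.2])
```

For these made-up values the fitted slope is 0.5 per allele copy with an intercept of 1.1. A real analysis would add a significance test and a multiple-testing correction across all protein/SNP pairs.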
