Systems Analysis of Seed Filling in Arabidopsis: Using General Linear Modeling to Assess Concordance of Transcript and Protein Expression
Journal Article
Department of Biochemistry and Interdisciplinary Plant Group (M.H., J.A.M., J.E.C., G.K.A., J.J.T.), Department of Statistics (L.B.H.), Plant Genetics Research Unit, United States Department of Agriculture Agricultural Research Service (J.A.M.), Computer Science Department (T.J., Z.S., D.X.), and DNA Core Facility (M.Z.), Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, Missouri 65211
This work was supported by the National Science Foundation-Plant Genome Research Program Young Investigator Award (grant no. DBI–0332418).
Present address: Institute of Plant Genetics and Biotechnology, Slovak Academy of Sciences, 950 07 Nitra, Slovak Republic.
Present address: Research Laboratory for Biotechnology and Biochemistry, GPO Box 8207, Kathmandu, Nepal.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Jay J. Thelen (thelenj@missouri.edu).
Some figures in this article are displayed in color online but in black and white in the print edition.
The online version of this article contains Web-only data.
Open Access articles can be viewed online without a subscription.
Received: 21 December 2009
Accepted: 26 January 2010
Published: 29 January 2010
Martin Hajduch, Leonard B. Hearne, Jan A. Miernyk, Jill E. Casteel, Trupti Joshi, Ganesh K. Agrawal, Zhao Song, Mingyi Zhou, Dong Xu, Jay J. Thelen, Systems Analysis of Seed Filling in Arabidopsis: Using General Linear Modeling to Assess Concordance of Transcript and Protein Expression , Plant Physiology, Volume 152, Issue 4, April 2010, Pages 2078–2087, https://doi.org/10.1104/pp.109.152413
Abstract
Previous systems analyses in plants have focused on a single developmental stage or time point, although it is often important to additionally consider time-index changes. During seed development a cascade of events occurs within a relatively brief time scale. We have collected protein and transcript expression data from five sequential stages of Arabidopsis (Arabidopsis thaliana) seed development encompassing the period of reserve polymer accumulation. Protein expression profiling employed two-dimensional gel electrophoresis coupled with tandem mass spectrometry, while transcript profiling used oligonucleotide microarrays. Analyses in biological triplicate yielded robust expression information for 523 proteins and 22,746 genes across the five developmental stages, and established 319 protein/transcript pairs for subsequent pattern analysis. General linear modeling was used to evaluate the protein/transcript expression patterns. Overall, application of this statistical assessment technique showed concurrence for a slight majority (56%) of expression pairs. Many specific examples of discordant protein/transcript expression patterns were detected, suggesting that this approach will be useful in revealing examples of posttranscriptional regulation.
One aim of systems biology is to develop an understanding of the complexity of living organisms through acquisition, integration, and interpretation of the information present in large omics datasets (Ilsley et al., 2009). In this regard, global methods for comparative transcript and protein profiling can be performed to discover posttranscriptionally regulated genes. One of the earliest global comparisons of transcript and protein abundance in eukaryotes revealed a weak statistical correlation (Gygi et al., 1999), suggesting that protein expression deviates from its cognate transcript more often than generally assumed. Subsequent studies have shown that the correlation between protein and transcript expression levels can vary between 20% and 70%, depending upon the profiling approach used and the system being analyzed (Chen et al., 2002; Griffin et al., 2002; Ørntoft et al., 2002; Le Roch et al., 2004; Shankavaram et al., 2007; Jayapal et al., 2008; Pascal et al., 2008; Hornshøj et al., 2009). It has become increasingly clear that the various posttranscriptional mechanisms operating within a cell can substantially change, and thus regulate, steady-state protein levels. In some cases the regulation can result in patterns much different from those predicted from transcript profiling alone (Shang and Lehrman, 2004; Shendure, 2008; Hendrickson et al., 2009). Discordance between transcript and protein levels can make it difficult to answer important biological questions based upon measurement of transcript levels alone (Piques et al., 2009). Thus, an improved strategy for assessing correlation between transcript and protein levels should be broadly informative.
Nonparametric statistical tests have been previously used for pairwise comparisons of transcript/protein abundances. The Pearson product moment correlation (PPMC) was applied for analysis of yeast (Saccharomyces cerevisiae; Gygi et al., 1999) and prostate cells (Pascal et al., 2008), and the Spearman rank order correlation (SROC) has been applied to analysis of yeast (Griffin et al., 2002) and Plasmodium falciparum (Le Roch et al., 2004). There have, however, been fewer parallel time-index studies of any biological process or response that have included quantifying proteome and transcriptome coordination (Prioul et al., 2008; Tian et al., 2009). As a result the methods for statistical description of multidatapoint trends and the degree of agreement between such datasets have not been well explored. In order for profiling studies to address the kinetic aspects of biological responses, improved statistical applications will be necessary. Herein we present general linear modeling (GLM) as an approach useful for detecting concordance/discordance in the patterns of transcript and protein expression during Arabidopsis (Arabidopsis thaliana) seed development. A comparison of the results from application of GLM versus simple correlation coefficient analysis of the transcript and protein expression datasets reveals the latter to be inadequate for assessing complex biological trends.
Seeds undergo a rapid, lineal transformation from fertilized embryos to mature propagules. This developmental sequence can be separated into three distinct phases: embryogenesis, seed filling, and maturation (Goldberg et al., 1994). Seed filling is particularly interesting because it is the period of massive storage reserve (oil, protein, and starch) synthesis and deposition (Baud et al., 2009; Andriotis et al., 2010). It is well known that both protein (Hajduch et al., 2005, 2006; Agrawal et al., 2008) and transcript levels (Ruuska et al., 2002; Le et al., 2007) change dramatically during seed filling, although in no case has there been parallel comparative global profiling of both. While it is clearly important to perform parallel coincidental global profiling of transcript and protein expression, it is also important that data analysis incorporate a robust statistical approach capable of providing confidence assessments for the entire dataset. Ideally, the strategy for statistical analysis would simultaneously provide insight into the mechanisms of posttranscriptional regulation. The use of GLM in our analyses allows us to assign confidence values to our conclusions, and at the same time to identify outliers that might provide insight into the underlying mechanisms.
RESULTS
Using Fatty Acid Analysis as a Marker for the Stages of Seed Filling
Developing Arabidopsis seeds were harvested at 5, 7, 9, 11, or 13 d after flowering (DAF). Ten different fatty acids (FAs) were detectable by gas chromatography (GC), and their distribution during seed filling was quantified (Supplemental Table S1). Total FA levels increased linearly from 5 through 11 DAF, with a 2-fold increase between 11 and 13 DAF, at which point FAs comprised 20% of the seed dry mass (Fig. 1). Linoleic acid (18:2) levels steadily increased during seed development, and this was the most prominent FA at all developmental stages. Linolenic acid levels also increased steadily throughout seed filling showing a 3-fold increase between 11 and 13 DAF. Eicosanoic (20:0), 11-eicosenoic (20:1Δ11), 13-eicosenoic (20:1Δ13), and erucic (22:1Δ13) acid levels increased approximately 3.6-, 5.0-, 2.3-, and 4.9-fold between 11 and 13 DAF, suggesting that the activity of cytoplasmic fatty-acyl-CoA elongase might be temporally regulated. Total protein levels also increased steadily between 5 and 13 DAF (Fig. 1).
Figure 1.
Characterization of developing Arabidopsis seeds. A, Seeds staged at 5, 7, 9, 11, and 13 DAF. B, FA content of developing seeds as determined by GC-MS analysis of methyl ester-derivatized FAs using heptadecanoic acid as the internal standard. FAs are expressed on a seed fresh weight basis. C, Protein content of developing seed as quantified by the Coomassie dye binding assay. [See online article for color version of this figure.]
Global Proteomics and Transcriptomics Quantified 1,025 Two-Dimensional Gel Spots and 22,746 Probes, Respectively, during Seed Filling
To generate global protein expression data, proteins prelabeled with Cy5 were used in combination with high-resolution two-dimensional gel electrophoresis (2-DE; Supplemental Fig. S1). To eliminate dye-effect biases, all analytical 2-DE was carried out exclusively with Cy5 from the same production lot. Use of the single CyDye yielded 10- to 20-fold increase in sensitivity versus Sypro Ruby or Coomassie Brilliant Blue while avoiding the problems associated with differences in labeling efficiency, molar absorptivity, and lot-to-lot variations. For profiling experiments involving multiple time points we have found that the single dye and lot approach is superior to sample multiplexing.
Isolated proteins from whole seeds were separated using broad (pH 3–10) and medium (pH 4–7) range immobilized pH gradient (IPG) strips in biological triplicate (Supplemental Fig. S1). Since the majority of Arabidopsis seed proteins have acidic pI values, the pH 4 to 7 range was used in the principal analytical gel, while pH 3 to 10 gels were analyzed only within the 3 to 4 and 7 to 10 ranges. Gels were imaged using the ImageMaster Platinum software to create protein expression profiles. Only those spots detected in biological triplicate and at least two developmental stages were further analyzed. A total of 1,025 spot groups satisfied these two criteria (Supplemental Table S2).
Proteins labeled with a fluorescent dye such as Cy5 present challenges for gel excision and subsequent protein identification, so preparative colloidal Coomassie Brilliant Blue-stained gels were produced for protein identification. Due to the differences in protein detection methods, only 696 spots were unequivocally matched to the 1,025 spot groups from the Cy5-labeled analytical gels. All 696 protein spots excised from gels were subjected to trypsin digestion and tandem mass spectrometry (MS/MS) for protein identification. A total of 523 protein spots were confidently identified by assigning a minimum of two unique peptides. These spots correspond to 346 nonredundant proteins (Supplemental Table S3). In some instances proteins were present as multiple spots, presumably the products of multigene families or posttranslational modifications. Seed storage proteins had the highest frequency of multiple spots. Proteins involved in primary metabolism and energy production comprise the largest groups of developing seed proteins; approximately 21% and 18%, respectively, of the total nonredundant proteins.
Global transcript profiling was performed in biological triplicate for each developmental stage using the Affymetrix ATH1 Genome Array (Fig. 2), and analyzed using GeneSpring software (version 7.3). Supplemental Figure S2 summarizes the results of gene expression trends plotted as normalized intensities (on a log scale) and the distribution of probe intensities across all developmental stages and biological replicates. Using this approach, expression patterns for 22,746 genes were obtained for the five sequential stages of seed filling.
Figure 2.
Experimental design for large-scale comparison of transcript and protein expression during Arabidopsis seed filling. Seeds were harvested at 5, 7, 9, 11, or 13 DAF. Total protein fractions were isolated and labeled with NHS-Cy5, then resolved by high-resolution 2-DE (employing both wide and medium range pH gradients), and analyzed to acquire protein expression profiles. Analyses were conducted in biological triplicate. Protein spots for which expression profile data were acquired were excised from the gel, trypsin digested, and analyzed by LC-MS/MS for identification. A total of 523 nonredundant proteins were conclusively identified based upon the minimum criterion of two unique, nonoverlapping peptides. For transcriptome analyses, mRNA was isolated, labeled, and hybridized to the Affymetrix ATH1 Genome Array (22,746 genes) in biological triplicate. Microarray slides were scanned and computationally analyzed to acquire mRNA expression profiles. The profile trends for each protein/transcript pair were compared using both correlation coefficient analysis and GLM.
Use of the PPMC r or the Kendall Rank Order Correlation τ for Pairwise Analysis Indicates a Significant Increase in Protein/Transcript Correlation across Time
The results of pairwise protein/transcript correlations are summarized in Table I. In total, 319 pairs were established, and expression was compared in at least one developmental stage. However, the total number of protein/transcript pairs at each developmental stage differed depending upon expression: 280 pairs were correlated at 5 DAF, 299 at 7 DAF, 305 at 9 DAF, 301 at 11 DAF, and 247 at 13 DAF. Employing correlation coefficient statistics at individual stages of seed filling, 10% and 8.6% of protein/transcript pairs correlated based on Pearson's r and the Kendall rank order correlation (KROC) coefficient τ at 5 DAF, respectively. At 13 DAF, as much as 19% and 18% of the pairs were positively correlated (P < 0.05) based on Pearson's r and Kendall's τ, respectively. These time-index changes indicate a significant increase in correlation across the developmental sequence (Table I). Contrary to these low correlation coefficients, when the calculations were performed for all 319 pairs over all developmental stages, a 44% correlation was observed (Table I). This inconsistency points out the need for a more robust assessment of protein/transcript relationships for the time-index experiment.
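To make the two pairwise statistics concrete, the following is a minimal sketch of Pearson's r and Kendall's τ for a single protein/transcript pair. The expression values are hypothetical, the function names are illustrative, and the τ shown is the simple tau-a form without a tie correction; the arrangement of replicates and the exact variant used in the study are not specified in this section.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau_a(x, y):
    """Kendall rank order correlation (tau-a: no correction for ties)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical relative expression values at 5, 7, 9, 11, and 13 DAF.
protein = [1.0, 1.4, 2.1, 2.0, 3.5]
transcript = [2.0, 2.2, 2.9, 3.4, 3.1]

r = pearson_r(protein, transcript)
tau = kendall_tau_a(protein, transcript)
print(f"Pearson r = {r:.3f}, Kendall tau = {tau:.3f}")
```

Both statistics summarize only the degree of monotonic or linear association; neither reports where a profile starts or how it bends over time.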
Table I.
Correlation analysis of transcript-protein pairs from developing Arabidopsis seeds
A total of 319 protein/transcript pairs were correlated using Kendall's τ (K's T) and Pearson's correlation coefficients (P's) at least in one developmental stage. The table shows number of positively (Pos) and negatively (Neg) correlated pairs for all stages investigated (all days) and for each developmental stage individually. The table also shows percentage of significantly correlated (P < 0.05) pairs in relation to the total number of correlated pairs for each developmental stage.
Sign | P Value | All Days (K's T) | All Days (P's) | 5 DAF (K's T) | 5 DAF (P's) | 7 DAF (K's T) | 7 DAF (P's) | 9 DAF (K's T) | 9 DAF (P's) | 11 DAF (K's T) | 11 DAF (P's) | 13 DAF (K's T) | 13 DAF (P's) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Neg | <0.00016 | 32 | 27 | 24 | 24 | 24 | 24 | 25 | 25 | 26 | 26 | 49 | 49 |
Neg | <0.05 | 60 | 61 | 24 | 29 | 24 | 28 | 25 | 33 | 26 | 36 | 49 | 54 |
Neg | all | 111 | 121 | 131 | 132 | 150 | 161 | 150 | 158 | 172 | 167 | 118 | 120 |
Pos | all | 208 | 198 | 149 | 148 | 149 | 138 | 155 | 147 | 129 | 134 | 129 | 127 |
Pos | <0.05 | 135 | 139 | 24 | 29 | 13 | 15 | 22 | 34 | 17 | 26 | 45 | 47 |
Pos | <0.00016 | 77 | 84 | 24 | 24 | 13 | 13 | 22 | 22 | 17 | 17 | 46 | 46 |
Total correlated | | 319 | 319 | 280 | 280 | 299 | 299 | 305 | 305 | 301 | 301 | 247 | 247 |
Significantly correlated % | | 42 | 44 | 8.6 | 10 | 4.4 | 5.0 | 7.2 | 11 | 5.6 | 8.6 | 18 | 19 |
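The percentages in the last row of Table I can be approximately reproduced from the counts in the table. The sketch below assumes that each percentage is the number of positively correlated pairs at P < 0.05 divided by the total number of pairs at that stage; this reading is inferred from the numbers and is not stated explicitly in the caption. The comparison of the stricter threshold with 0.05/319 is likewise only an observation, not a documented method.

```python
# Counts copied from Table I, as (Kendall, Pearson) for positive pairs at P < 0.05.
pos_significant = {
    "All days": (135, 139),
    "5 DAF": (24, 29),
    "7 DAF": (13, 15),
    "9 DAF": (22, 34),
    "11 DAF": (17, 26),
    "13 DAF": (45, 47),
}

# "Total correlated" row of Table I.
total_pairs = {
    "All days": 319,
    "5 DAF": 280,
    "7 DAF": 299,
    "9 DAF": 305,
    "11 DAF": 301,
    "13 DAF": 247,
}


def percent_significant(stage):
    """Return (Kendall %, Pearson %) of positive, significant pairs for one stage."""
    kendall, pearson = pos_significant[stage]
    n = total_pairs[stage]
    return 100.0 * kendall / n, 100.0 * pearson / n


for stage in pos_significant:
    k_pct, p_pct = percent_significant(stage)
    print(f"{stage:>8}: Kendall {k_pct:4.1f}%  Pearson {p_pct:4.1f}%")

# Observation only (assumption): 0.05 divided by the 319 pairs is close to the
# stricter 0.00016 threshold listed in the table.
bonferroni_like = 0.05 / 319
print(f"0.05 / 319 = {bonferroni_like:.6f}")
```

The computed values are close to the tabulated ones, which appear to be reported to two significant figures.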
Incorporation of Time as a Variable in the Regression Analysis Provides a More Robust Assessment of Protein/Transcript Correlations
Analysis of time-index data is difficult when only correlation coefficients are used, because this statistical approach evaluates the slope of the line and not the _y_-intercept or degree of line curvature. We therefore applied GLM to evaluate our datasets.
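The limitation of correlation coefficients can be illustrated with a small sketch. Two hypothetical profiles that differ only by a constant offset are perfectly correlated, yet separate quadratic fits show that their intercepts differ. The data, the time axis (days elapsed since 5 DAF), and the function names are assumptions made for illustration; this is ordinary least squares on each curve and not the full GLM with significance testing used in the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_quadratic(t, y):
    """Least-squares fit of y = b0 + b1*t + b2*t**2 via the normal equations."""
    rows = [[1.0, ti, ti * ti] for ti in t]
    # Build X^T X (3x3) and X^T y (length 3).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for c in range(col, 3):
                a[row][c] -= factor * a[col][c]
            b[row] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef  # [intercept, slope, curvature]


# Hypothetical profiles at 5, 7, 9, 11, and 13 DAF; t is days since 5 DAF.
t = [0.0, 2.0, 4.0, 6.0, 8.0]
protein = [1.0 + 0.5 * ti + 0.05 * ti * ti for ti in t]
transcript = [p + 3.0 for p in protein]  # same shape, shifted upward

r = pearson_r(protein, transcript)
protein_fit = fit_quadratic(t, protein)
transcript_fit = fit_quadratic(t, transcript)
print("r =", round(r, 6))
print("protein    [b0, b1, b2] =", [round(c, 4) for c in protein_fit])
print("transcript [b0, b1, b2] =", [round(c, 4) for c in transcript_fit])
```

In this constructed case the correlation coefficient reports full agreement, while the fitted intercepts differ by the offset and the slope and curvature terms agree.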
Overall, the concordance of expression profile regression parameters indicates that there is considerable similarity of response for protein/transcript pairs, and even with statistically small sample sizes some of the similarities are very strong (Table II). The distribution of concordance and discordance among the 319 protein/transcript pairs varied with the quadratic line properties including _y_-intercept, slope, and curvature (Fig. 3; Supplemental Table S4). Concordance with _y_-intercept, for example, indicates similar expression at the initial stage of seed filling, while discordance for slope or curvature suggests disparate time-index expression. The distribution of concordance for these three parameters does not appear random. Overall, 56% of the 319 protein/transcript pairs had concordant expression patterns.
Table II.
Regression analysis of transcript-protein pairs from developing Arabidopsis seeds
In total 319 transcript-protein pairs were subjected to regression analysis to evaluate their relationship during seed filling. The regression model has the following annotations: β₀ is the intercept for the protein curve, β₀₁ = β₀ + β₁ is the intercept for the microarray curve, β₂ is the slope for the protein curve, β₂₃ = β₂ + β₃ is the slope for the microarray curve, β₄ is the quadratic term for the protein curve, and β₄₅ = β₄ + β₅ is the quadratic term for the microarray curve.
Regression | β₀ and β₀₁ | β₂ and β₂₃ | β₄ and β₄₅ |
---|---|---|---|
Strong concordance | 30 | 22 | 18 |
Concordance | 164 | 169 | 160 |
Discordance | 155 | 150 | 159 |
Strong discordance | 47 | 5 | 3 |
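The annotations in the Table II caption are consistent with a single quadratic regression containing an indicator variable. The form below is a reconstruction under that assumption: the symbols y (expression), t (time), D (0 for the protein curve, 1 for the microarray curve), and ε (error) are not named in the source and are introduced here only for illustration.

```latex
y = \beta_0 + \beta_1 D + \beta_2 t + \beta_3 D t + \beta_4 t^2 + \beta_5 D t^2 + \varepsilon

% Protein curve (D = 0):
y = \beta_0 + \beta_2 t + \beta_4 t^2 + \varepsilon

% Microarray curve (D = 1):
y = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) t + (\beta_4 + \beta_5) t^2 + \varepsilon
```

Under this reading, the microarray intercept, slope, and quadratic terms are β₀ + β₁, β₂ + β₃, and β₄ + β₅, matching the caption, and concordance for a given parameter would correspond to the difference term (β₁, β₃, or β₅) not differing detectably from zero.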
Figure 3.
The GLM analysis of expression profiles for 319 transcript/protein pairs analyzed during seed filling in Arabidopsis. A, Three line parameters were evaluated by GLM including _y_-intercept, slope, and curvature to statistically compare transcript and protein expression. Temporal data for each transcript and protein pair were statistically evaluated for each of these parameters and determined to be either in concordance or discordance as denoted in the simplified graphical models. B, Distribution of concordant and discordant transcript/protein pairs based upon _y_-intercept parameter and distributed across protein functional classes. C, Functional distribution of concordant and discordant transcript/protein pairs based on slope parameter. D, Functional distribution of concordant and discordant transcript/protein pairs based on curvature parameter.
Mining the Concordance/Discordance Data
Recent progress has led to the development of efficient methods for database mining, including clustering, outlier analysis, frequent, sequential, and structured pattern analysis, and visualization of spatial and time-index datasets (Van den Bulcke et al., 2006; Antoine and Miernyk, 2007; Nicolas, 2009). The results from our GLM analysis of the concordance between protein and transcript expression profiles during Arabidopsis seed development suggest a similar utility (Fig. 3; Supplemental Table S4). Discordant protein/transcript pairs can be easily identified and targeted for further study, without any prior need to directly address the nature of this regulation.
DISCUSSION
An increasing body of literature addressing comparative analysis of global transcript and protein expression in eukaryotes has converged upon a general consensus that correlation between the two is poor (Gygi et al., 1999; Chen et al., 2002; Cox et al., 2007; Baerenfaller et al., 2008; Jayapal et al., 2008; Wu et al., 2008; Hornshøj et al., 2009; Tian et al., 2009). The underlying bases for the discordance in protein and mRNA abundance are manifold (Wu et al., 2008; Hendrickson et al., 2009; Piques et al., 2009), and difficulties in interpretation are exacerbated by the lack of adequate statistical tools to compensate for the inherent biases in data collection (Nie et al., 2007). The major aim of this study was to define the concordance of time-index patterns of protein/transcript expression during the early maturation stages of Arabidopsis seed development. We have employed GLM to evaluate the time variable so that it could be incorporated into the overall assessment of protein and transcript expression.
Selection of the appropriate statistical tools can have a crucial impact on data interpretation (Nie et al., 2007). In the case of comparative protein/transcript expression studies, the most commonly used nonparametric correlation analyses, the PPMC coefficient r (Rodgers and Nicewander, 1988), the SROC coefficient s r (Corder and Foreman, 2009), and the KROC coefficient τ (Degerman, 1982) yielded varying results. For instance, in yeast, the correlation analysis between protein and mRNA abundances gave an r value that is inadequate for prediction of protein expression levels from quantitative mRNA data (Gygi et al., 1999). The PPMC was also used in analysis of mRNA and protein levels in human prostate cells, with r values that varied from 0 to 0.63 (Pascal et al., 2008). In contrast to these two instances, expression of as many as 65% of the genes was judged to be significantly correlated with corresponding proteins in NCI-60 cancer cells using the PPMC (Shankavaram et al., 2007). Furthermore it was recently reported that calculation of the PPMC r indicated a positive correlation in a comparison of two porcine tissues analyzed using iTRAQ for protein and cDNA microarray/454-sequencing for transcript profiling (Hornshøj et al., 2009). Using the SROC, a significant number of genes with large discrepancies between protein and corresponding transcript abundances was determined in yeast (Griffin et al., 2002). The SROC has also been used to compare protein with corresponding transcript levels during the P. falciparum life cycle (Le Roch et al., 2004), but the calculated s r value supported concordance in only three out of seven instances. Our results suggest positive correlations of 42% and 44% through all stages of seed filling using the KROC and PPMC correlation analyses, respectively (Table I). However, dependence of pairwise correlation on the stage of seed development was also observed, ranging from 9% to 19% (Table I).
The use of GLM extends the multivariate regression model by allowing linear transformations of multiple dependent variables. This gives the GLM the important advantage that multivariate tests of significance can be employed when responses on multiple dependent variables are correlated (i.e. transcript, protein, developmental stage). This can also provide insight into which dimensions of the response variables are related to the predictor variables (Waldorp, 2009). A second advantage is the ability to analyze effects of repeated-measurement factors, which have traditionally been analyzed using ANOVA. Linear combinations of responses reflecting a repeated measure effect such as the difference of responses on a measure under differing conditions, such as time, can be constructed and tested for significance (Friston, 2008).
An important result to come from our GLM analyses addresses metabolic specialization. One aspect of Arabidopsis seed filling is the flow of carbon from Suc into FAs (Fig. 4; Hills, 2004; Baud et al., 2009; Andriotis et al., 2010). The protein/transcript pairs for pyrophosphate:Fru-6-P 1-phosphotransferase (At1g76550), cytosolic (At2g36460) and plastidial (At2g21330) Fru-bisP aldolase, cytosolic triose-P isomerase (At3g55440), cytosolic (At1g13440) and plastidial (At3g26650) glyceraldehyde-3-P dehydrogenase, plastidial phosphoglycerate kinase (At1g79550), cytosolic enolase (At2g36530), and plastidial pyruvate kinase (At5g52920) were all concordant during seed filling for at least two of three GLM parameters. At the same time, the majority of the 28 reactions of intermediary metabolism shown in Figure 4 were discordant. This reveals that there must be posttranscriptional regulation of core metabolism during seed development. A similar conclusion has been reached for bacteria and yeast (Griffin et al., 2002; Jayapal et al., 2008).
Figure 4.
Schematic view of carbohydrate metabolism during seed filling of Arabidopsis. Expression (heat) maps of individual protein (P) and transcript (T) expression based on proteomics and microarray experiments as relative value to 5 DAF are shown. Protein/transcript pairs are under one ATG number. Intermediates: UDP-G, UDP-Glc; G-1-P, Glc-1-P; G-6-P, Glc-6-P; F-6-P, Fru-6-P; 6PGLone, 6-phosphoglucono-δ-lactone; 6PGLate, 6-phosphogluconate; Ru-5-P, ribulose-5-P; GAP, glyceraldehyde-3-P; F-1,6-bp, Fru-1,6-bisP; DHAP, dihydroxyacetone phosphate. Enzymes: 1, Suc synthase; 2, UDP-Glc pyrophosphorylase; 3, phosphoglucomutase; 4, Glc-6-P isomerase; 5, fructokinase; 6, phosphoglucomutase + Glc-6-P dehydrogenase + 6-phosphogluconate dehydrogenase + phosphoribulokinase; 7, phosphofructokinase; 8, Fru-1,6-bisP aldolase; 9, triose-P isomerase; 10, glyceraldehyde-3-P dehydrogenase; 11, phosphoglycerate kinase; 12, 2,3-bisphosphoglycerate-independent phosphoglycerate mutase; 13, enolase; 14, pyruvate kinase; 15, Glc-6-P isomerase + Glc-6-P dehydrogenase + 6-phosphogluconate dehydrogenase; 16, phosphoribulokinase; 17, Rubisco; 18, pyruvate dehydrogenase; 19, phosphoenolpyruvate carboxylase; 20, malate dehydrogenase.
A small majority (179 of the 319) of protein/transcript pairs were concordant (Table II; Supplemental Table S3), and are thus unlikely to be candidates for posttranscriptional regulation of expression. These results are based upon steady-state analysis and might not detect all types of posttranslational regulation. From our survey, this leaves 140 protein/transcript pairs with discordant expression patterns suggesting posttranscriptional regulation. Included among these are genes/proteins involved in cellular structure (actin 8, At1g49240), signaling (ADP-ribosylation factor ATARF1, At1g23490), and RNA metabolism (Gly-rich RNA-binding proteins, At4g39260 and At2g21660; RNA-binding proteins, At4g17520 and At5g47210). One example of how our experimental strategy can be used for identifying targets for additional research is the intriguing case of plastidial pyruvate kinase. The expression trends of the two plastidial pyruvate kinase proteins (At3g22960 and At5g52920) were very similar, while transcript levels were discordant (At3g22960) and concordant (At5g52920) with protein expression for all three quadratic-line variables (Supplemental Table S3). It was previously reported that these genes encode an _α_-subunit (At3g22960) and a _β_-subunit (At5g52920) that stoichiometrically assemble into an α4β4 heterooctamer (Andre et al., 2007). Apparently, holomer assembly in some manner controls steady-state levels of the subunits. It will be interesting to similarly target other multisubunit complexes for comparative analysis.
In summary, we have employed GLM as an approach to determine patterns of protein/transcript concordance for a series of analyses where time was an integral component of experimental design. This approach proved to be more robust than methods used to study protein/transcript concordance based on pairwise correlations. The results of our analyses over five stages of Arabidopsis seed filling are consistent with an overall concordance of 56%. This value is substantially higher than those predicted using three different correlation coefficients, but is still too low to justify generalizations and/or assumptions regarding protein levels based solely on transcript profiling. The results indicate that GLM will be useful in data-mining applications aimed at identifying candidates suitable for studying posttranscriptional regulation of gene expression.
MATERIALS AND METHODS
Plant Material and Growth Conditions
Arabidopsis (Arabidopsis thaliana; Columbia ecotype 0) plants were grown in a controlled environment chamber (16-h-light/8-h-dark cycle, 23°C day/20°C night, 50% humidity, and light intensity of 8,000 LUX). Flowers were tagged upon opening and the developing seeds were collected at 5, 7, 9, 11, or 13 DAF, in the middle of a light cycle (between 11 am and 2 pm central U.S. time).
Seed Oil Content
FA content of developing Arabidopsis seeds at 5, 7, 9, 11, and 13 DAF was determined as described earlier (Hajduch et al., 2006) with minor modifications. Seeds were divided into three Teflon-lined glass screw cap vials per developmental stage (approximately 50 mg of seeds per tube) and dried at 80°C overnight. After dry weight determination, 1 mL of 14% BF3 in methanol was added to each tube along with 17:0 internal standard dissolved in toluene (0.5% of dry mass exactly). Total volume of toluene was brought to 150 _μ_L and samples were incubated at 95°C for 90 min, with mixing every 10 min. After incubation, samples were cooled to 25°C. To each tube, 1 mL of water and 3 mL of hexane were added. Tubes were vortex mixed and centrifuged at 3,000_g_ for 5 min. The upper phase was removed and transferred to a conical glass tube. Samples were back extracted with an additional 3 mL of hexane, dried under N2, and resuspended in 400 _μ_L of hexane before GC analysis. The GC analyses and quantitation were performed as described previously (Hajduch et al., 2006).
Protein Isolation and Cy5 Labeling
A total protein fraction was isolated from developing seeds and quantified using the Coomassie dye binding assay (Bio-Rad) with _γ_-globulin as the standard. For Cy5 labeling, protein pellets were reconstituted in 30 mM Tris-HCl, pH 8.5, containing 7 M urea, 2 M thiourea, and 4% (w/v) CHAPS with vortex mixing for 30 min at 25°C followed by centrifugation for 15 min at 14,000_g_ to remove insoluble material. Then, 50 _μ_g of protein were adjusted to a final volume of 10 _μ_L. One microliter of Cy5 (100 pmol) was added and the mixture was incubated on ice for 30 min in the dark. The labeling reaction was terminated by adding 1 _μ_L of 10 mM Lys followed by incubation on ice for an additional 10 min in the dark. For isoelectric focusing (IEF), 50 _μ_g of protein were mixed with an equal volume of 2× sample buffer (8 M urea, 130 mM dithiothreitol, and 4% [w/v] CHAPS), incubated 10 min on ice, mixed with 2.25 _μ_L of IPG buffer (Amersham Biosciences), and adjusted to a total volume of 450 _μ_L with 1× sample buffer.
For preparative colloidal Coomassie Brilliant Blue G-250-stained gels, protein pellets were resuspended in IEF resuspension medium (8 M urea, 2 M thiourea, 2% [w/v] CHAPS, 2% [v/v] Triton X-100, 50 mM dithiothreitol) with vortex mixing as described above. For IEF, 1 mg of total protein was mixed with 2.25 _μ_L of appropriate IPG buffer in a total volume of 450 _μ_L of preparative IEF resuspension medium.
Image Acquisition and Analysis
Fluorescent gels were scanned using a FLA-5000 laser scanner (FUJI Medical). The Coomassie Brilliant Blue-stained gels were imaged by scanning densitometry (300 dpi, 16-bit grayscale). Digitized images were analyzed with ImageMaster 2-D platinum software (version 5.0, GE Healthcare). Protein abundance was expressed as a relative volume according to the normalization method provided by the software.
Protein Identification by MS
Protein spots were excised from colloidal Coomassie Brilliant Blue-stained 2D gels and trypsin digested as described previously (Hajduch et al., 2005). The MS analyses were carried out with a linear ion trap tandem mass spectrometer (ProteomeX LTQ, Thermo-Fisher) using liquid chromatography and nanospray ionization exactly as described previously (Hajduch et al., 2006).
Database Searching with Spectral Data and Deposition in the Oilseed Proteome Database
Analysis of LC-MS/MS data was performed on a locally licensed copy of SEQUEST software (Eng et al., 1994). Searches were performed against the National Center for Biotechnology Information nonredundant database, Arabidopsis entries only (as of November 2005), and annotations for all protein matches were manually updated to the current The Arabidopsis Information Resource annotation (as of December 11, 2009). Search parameters were set as follows: enzyme, trypsin; number of internal cleavage sites, 2; threshold, 500; minimum ion count, 35; peptide mass tolerance, 1.50; variable modifications, oxidation (M); static modification, carboxyamidomethylation (C). Matching peptides were filtered according to correlation scores (XCorr at least 1.5, 2.0, and 2.5 for +1, +2, and +3 charged peptides, respectively) and peptide probability (maximum 0.05). For all protein assignments, a minimum of two unique, nonoverlapping peptides was required. Protein expression and summarized mass spectral assignment data from this investigation have been uploaded onto the Oilseed Proteomics server (http://oilseedproteomics.missouri.edu). Programming for the web database was performed as described previously (Hajduch et al., 2005). Data are viewable through 2-DE gels and a protein identification table. The spots on the 2-DE gels and the protein numbers in the protein table are hyperlinked to display expression profile and protein identification data.
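The peptide and protein acceptance rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SEQUEST workflow itself: the record layout (`protein`, `sequence`, `charge`, `xcorr`, `probability`), the function names, and the example peptide sequences are all hypothetical, and the check that accepted peptides do not overlap is not implemented.

```python
from collections import defaultdict

# Minimum XCorr per peptide charge state and the probability ceiling,
# taken from the search criteria stated in the text.
XCORR_MIN = {1: 1.5, 2: 2.0, 3: 2.5}
MAX_PROBABILITY = 0.05
MIN_UNIQUE_PEPTIDES = 2


def passes_filter(peptide):
    """Return True if one peptide match meets the XCorr and probability cutoffs."""
    threshold = XCORR_MIN.get(peptide["charge"])
    if threshold is None:
        # Assumption: charge states outside +1..+3 are rejected in this sketch.
        return False
    return (peptide["xcorr"] >= threshold
            and peptide["probability"] <= MAX_PROBABILITY)


def accepted_proteins(peptides):
    """Keep proteins supported by at least two distinct passing sequences."""
    by_protein = defaultdict(set)
    for pep in peptides:
        if passes_filter(pep):
            by_protein[pep["protein"]].add(pep["sequence"])
    return {prot: seqs for prot, seqs in by_protein.items()
            if len(seqs) >= MIN_UNIQUE_PEPTIDES}


# Hypothetical peptide matches (sequences and scores are made up).
matches = [
    {"protein": "At5g52920", "sequence": "PEPTIDEK", "charge": 2, "xcorr": 2.4, "probability": 0.01},
    {"protein": "At5g52920", "sequence": "SAMPLER", "charge": 3, "xcorr": 2.9, "probability": 0.02},
    {"protein": "At3g22960", "sequence": "ANOTHERK", "charge": 2, "xcorr": 1.8, "probability": 0.01},
    {"protein": "At3g22960", "sequence": "THIRDR", "charge": 1, "xcorr": 1.6, "probability": 0.04},
]
result = accepted_proteins(matches)
```

In this toy input, the second protein is rejected because one of its two peptides falls below the +2 XCorr cutoff, leaving a single supporting sequence.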
Isolation of Total RNA
For total RNA isolation, an RNeasy plant mini kit (Qiagen) was used with minor modifications. In total, 20 to 50 mg of harvested Arabidopsis seeds were homogenized with liquid N2 in 1.5 mL sterile polypropylene tubes using plastic pestles. Samples were resuspended in the kit-provided resuspension buffer (for 25 mg of seeds, 450 _μ_L of resuspension buffer), incubated 5 min at 57°C, cooled on ice, and transferred to the provided lilac QIAshredder spin column (450 _μ_L of homogenate per column). The remaining procedure was performed according to the manual, with optional centrifugation after last wash with elution buffer. The concentration of total RNA was determined using a NanoDrop ND-1000 spectrophotometer (NanoDrop).
RNA Amplification, Target Biotin Labeling, and Hybridization to the Arabidopsis ATH1 Genechips
One microgram of seed total RNA was used to make biotin-labeled antisense RNA (aRNA) using the MessageAmp II-Biotin enhanced single round aRNA amplification kit (Ambion) according to the manufacturer's procedures. Briefly, total RNA was reverse transcribed to first-strand cDNA with oligo(dT) primer bearing a 5′-T7 promoter using ArrayScript reverse transcriptase. The first-strand cDNA then underwent second-strand synthesis and clean up to become the template for in vitro transcription. Biotin-labeled aRNA was synthesized using T7 RNA transcriptase with biotin-NTP mix. After purification, aRNA was fragmented in 1× fragmentation buffer at 94°C for 35 min. Ten micrograms of fragmented aRNA in 200 _μ_L of hybridization solution was hybridized to the Arabidopsis ATH1 genechip (Affymetrix) at 45°C for 20 h. After hybridization, chips were washed and stained with R-phycoerythrin-streptavidin on Affymetrix fluidics station 450 using fluidics protocol EukGE-WS2v4. The image data were acquired using an Affymetrix Genechip scanner 3000.
Microarray Data Analysis
Microarray data analysis for the three replicates for each developmental stage was performed using GeneSpring GX 7.3 software (Silicon Genetics). The array intensities were normalized using data transformation to set measurements less than 0.01 to 0.01, per chip normalization to the 50th percentile, and per gene normalization to the median (Supplemental Fig. S1). The normalized data were transformed to natural log values to calculate the expression value. The scatter plots of replicate arrays performed after normalization indicated the data were highly reproducible. After normalization, a Student's t test with a P value cutoff of 0.05 and the Benjamini and Hochberg false discovery rate were applied to filter out genes having significantly differentiated expression patterns.
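The three normalization steps named above (flooring at 0.01, per chip scaling to the 50th percentile, per gene scaling to the median) followed by the natural log can be sketched as below. This is a minimal sketch under stated assumptions, not the GeneSpring implementation: the function name, the nested-dictionary layout, and the choice to scale by division are assumptions made for illustration.

```python
import math
from statistics import median

FLOOR = 0.01  # measurements below this value are set to it, per the text


def normalize(chips):
    """chips maps chip name -> {gene: raw intensity}.

    Returns {gene: {chip: natural-log normalized value}}.
    Assumes every chip reports the same set of genes.
    """
    # Step 1: raise very small measurements to the floor value.
    floored = {chip: {g: max(v, FLOOR) for g, v in genes.items()}
               for chip, genes in chips.items()}

    # Step 2: per chip normalization, dividing by the chip's 50th percentile.
    per_chip = {}
    for chip, genes in floored.items():
        chip_median = median(genes.values())
        per_chip[chip] = {g: v / chip_median for g, v in genes.items()}

    # Step 3: per gene normalization, dividing by the gene's median across
    # chips, then taking the natural log.
    gene_names = list(next(iter(per_chip.values())).keys())
    normalized = {}
    for g in gene_names:
        gene_median = median(per_chip[chip][g] for chip in per_chip)
        normalized[g] = {chip: math.log(per_chip[chip][g] / gene_median)
                         for chip in per_chip}
    return normalized


# Hypothetical intensities: chip B is chip A at half the overall brightness,
# so both chips should agree after per chip scaling.
raw = {
    "chip_A": {"g1": 100.0, "g2": 200.0, "g3": 400.0},
    "chip_B": {"g1": 50.0, "g2": 100.0, "g3": 200.0},
}
values = normalize(raw)
```

Because the second chip differs from the first only by a global factor, every normalized log value in this toy example is zero, which is the behavior per chip scaling is meant to produce.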
Development of Cognate Gene and Protein Models for Statistical Analysis
Initially, cognate transcript and protein pairs were determined by verifying at least one protein was detected for each 2-DE spot group. Then expression data for 2-DE spot groups that were assigned to the same gene were summed for comparison to transcript expression. To correlate proteomic and transcriptomic datasets, both the protein and transcript expression values were tested to find a minimum variance transform with the Box-Cox procedure under linear modeling assumptions (Box and Cox, 1964). The protein and microarray data were transformed as y = log2 (x), where x is the observed volume or optical intensity, and the transformed values were used for the rest of the analysis. Each source of data was then statistically modeled to account for known but experimentally irrelevant factors, or sources of variation, leaving the experimentally relevant factor day within spot or probe and experimental error in the residuals.
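For reference, the Box-Cox family of power transformations (Box and Cox, 1964) is commonly written as shown below, where λ is the transformation parameter. The relation to the log2 transform is an interpretive note rather than a statement from the analysis itself: the λ = 0 member of the family is the natural log, which differs from log2 only by a constant factor.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln x, & \lambda = 0
\end{cases}
\qquad
\log_2 x = \frac{\ln x}{\ln 2}
```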
To put the data into the same relative numeric scale, known sources of variation in the data collection process were statistically modeled and, if the sources of variation were not of experimental interest, their contributions to experimental variation were removed. In the case of protein data, the factors of experimental interest were spot volumes sampled at each developmental stage. These factors, together with a temporal term, constitute the factor level variability. A mixed linear statistical model with the intercept held as a random effect was fit to the data without the temporal factor. The observed values minus the predicted values were the residuals and were centered on a mean value of 0. These residual values were divided by the sd of the residuals to get a normalized and standardized variable with an expected variance of 1 to estimate the spot volume for the ATG Probe for this time. Across all ATG numbers the transformed and scaled values were used to model the spot measurement values.
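The final standardization step (residuals centered on 0 and divided by their sd) can be illustrated with a short sketch. The mixed model fit itself is not reproduced here; `predicted` simply stands in for whatever the fitted model returns, and the use of the sample sd (n − 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def standardize_residuals(observed, predicted):
    """Return residuals scaled to mean 0 and sample sd 1."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    center = mean(residuals)   # approximately 0 when the model has an intercept
    spread = stdev(residuals)  # sample sd; assumed here, not stated in the text
    return [(r - center) / spread for r in residuals]


# Hypothetical log2 spot volumes and model predictions.
observed = [10.2, 11.1, 9.7, 10.9, 10.4]
predicted = [10.0, 10.8, 10.1, 10.6, 10.5]
z = standardize_residuals(observed, predicted)
```

After this step, protein and transcript values sit on the same unitless scale, which is what allows them to be modeled together.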
Because of the nature of the 2-DE analyses, there are occasionally missing values in the proteomics data. However, there were sufficient biological repetitions and temporal samplings to allow use of the expectation-maximization algorithm to estimate the distribution of the missing values and produce five values for each time point (Dempster et al., 1977). This increased the dataset size by a factor of five. The augmented dataset was then used throughout the rest of the analyses. For microarrays it was expected that probe intensity across seed development was of experimental interest, and the microarray data were normalized and standardized in the same way as the proteomics data.
The normalized proteomic and microarray datasets were then merged on the field Probe_ID-ATG ID. There are 319 protein/transcript pairs through five developmental time points, and three biological replicates. The normalized, standardized, and merged analytic datasets contain 23,025 data records and comprise the dataset used in all subsequent analyses.
Pairwise Nonparametric Analyses
The PPMC and KROC make different assumptions about the underlying distribution of data. The Pearson r measures the strength of linear association between the random variables x and y. It is scale independent and assumes the random variables have a normal distribution. The Kendall τ is a measure of the concordance for all pairs of observed values $(x_j, y_j)$ and $(x_i, y_i)$, where a pair is concordant if $x_i > x_j$ and $y_i > y_j$ or $x_i < x_j$ and $y_i < y_j$, and discordant otherwise. Associated with each correlation coefficient is a measure of the probability of making a type I error; that is, the probability of being in error if you reject the null hypothesis that the correlation is zero. We can count the number of correlations that are positive or negative either across days or within days. We can also restrict these counts to those correlations with significant P values < 0.05. However, this ignores the multiple hypotheses testing condition, which says that if we want to have an overall error rate of $\alpha_F$, we have to apply a more stringent selection criterion, $\alpha_0$, for the test. Two possible methods for finding this cutoff value are Sidak's method, where $\alpha_0 = 1 - (1 - \alpha_F)^{1/G}$ and G is the number of tests, 319 in this case, and Bonferroni's method, where $\alpha_0 = \alpha_F / G$. Applying Sidak's method we get $\alpha_0 = 0.0001601$ and from Bonferroni we get $\alpha_0 = 0.0001567$. Thus, there will be an approximate family-wise error rate of $\alpha_F = 0.05$ if we set the cutoff value at $\alpha_0 = 0.00016$.
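The two per-test cutoffs and the pair counting behind Kendall's τ can be checked with a few lines of Python. This is a sketch for illustration: the function names are hypothetical, and the τ shown is the simple tau-a form in which tied pairs contribute to neither count, which is a slightly different convention from treating every non-concordant pair as discordant.

```python
def sidak_cutoff(alpha_family, n_tests):
    """Per-test cutoff from Sidak's method: 1 - (1 - alpha_F)^(1/G)."""
    return 1.0 - (1.0 - alpha_family) ** (1.0 / n_tests)


def bonferroni_cutoff(alpha_family, n_tests):
    """Per-test cutoff from Bonferroni's method: alpha_F / G."""
    return alpha_family / n_tests


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                concordant += 1   # both variables move in the same direction
            elif product < 0:
                discordant += 1   # the variables move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


G = 319          # number of protein/transcript pairs tested
ALPHA_F = 0.05   # desired family-wise error rate
sidak = sidak_cutoff(ALPHA_F, G)
bonferroni = bonferroni_cutoff(ALPHA_F, G)
print(f"Sidak cutoff:      {sidak:.7f}")
print(f"Bonferroni cutoff: {bonferroni:.7f}")
```

The Sidak cutoff is always slightly less stringent than the Bonferroni cutoff; with 319 tests both round to about 0.00016.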
GLM
Regression analysis with time as an integral part of the model was used to determine the importance of the time factor in determining protein/transcript correlations. The regression model that was fit to the protein and microarray data is a quadratic model with log spot or log probe intensity as the dependent variable and time and time squared as the independent variables. Both dependent variables were modeled with the same quadratic regression model, $y = \beta_0 + \beta_1 I + \beta_2 D + \beta_3 DI + \beta_4 D^2 + \beta_5 D^2 I + \varepsilon$, where y is the dependent variable, D is the independent variable day, I is an indicator variable (I = 0 if DIGE, otherwise I = 1), the $\beta$ terms are the regression parameters, and $\varepsilon$ is the error term. The intercept parameter for protein only is $\beta_0$, the intercept parameter for microarray is $(\beta_0 + \beta_1)$, and if the parameter $\beta_1$ is not statistically significantly different from 0 then there is no statistical difference between the intercepts for DIGE or microarray. Similar interpretations can be given for the linear and quadratic terms in the regression model. We will use the following notation: $\beta_0$, $\beta_2$, and $\beta_4$ are the intercept, linear, and quadratic terms for the protein regression model. The microarray regression model has parameters $\beta_{01} = \beta_0 + \beta_1$, $\beta_{23} = \beta_2 + \beta_3$, and $\beta_{45} = \beta_4 + \beta_5$. The difference terms are $\beta_1$, $\beta_3$, and $\beta_5$. If a difference term is not statistically different from 0, then that parameter in the protein and microarray models is statistically equivalent. To assess how well the model fits the data we can use the coefficient of multiple determination $R^2$. Using standard linear modeling notation we can define $R^2 = 1 - SSE/SSTO$, where SSE is the error sum of squares and SSTO is the total sum of squares. For each spot probe pair we fit a model across both types of data, all days, and all three replicate observations. This assumes that the residuals from each model are normally distributed, $\varepsilon \sim N(0, \sigma^2)$. An examination of the residual plots shows that the model fits the data well in most instances.
Since protein intensity measurements are scaled differently than microarray intensity measurements, it would not be expected that the regression equations would be the same. However, it would be expected that the linear parameter slope, $\beta_2$ and $\beta_{23}$, and the quadratic parameter direction of change over time, $\beta_4$ and $\beta_{45}$, would be good metrics for similarity or dissimilarity of biological activity. It is possible to define pairs of corresponding parameter values, $\beta_0$ and $\beta_{01}$ or $\beta_2$ and $\beta_{23}$ or $\beta_4$ and $\beta_{45}$, as having concordance if the two parameters are either significantly positive or negative for the same spot probe. Similarly, discordance would be if one parameter is significantly positive and the other is significantly negative. For concordance and discordance, there is no requirement for the parameters to be significantly different from 0. The frequency and degree of concordance/discordance measurements are presented in Table II, indicating the similarity of response for protein/transcript pairs.
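A minimal sketch of fitting the six-parameter quadratic model to one protein/transcript pair by ordinary least squares is given below. It is an illustration under stated assumptions, not the software used for the analysis: the helper names are hypothetical, the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free, significance testing of the parameters is omitted, and $R^2$ is computed with the usual definition 1 − SSE/SSTO.

```python
def design_row(day, is_microarray):
    """One row of the design matrix: [1, I, D, D*I, D^2, D^2*I]."""
    i = 1.0 if is_microarray else 0.0  # I = 0 for DIGE (protein), 1 for microarray
    d = float(day)
    return [1.0, i, d, d * i, d * d, d * d * i]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(records):
    """records: list of (day, is_microarray, y). Returns (beta, r_squared)."""
    X = [design_row(day, flag) for day, flag, _ in records]
    y = [value for _, _, value in records]
    p = 6
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssto = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - sse / ssto


# Hypothetical noise-free data at the five sampled stages.
days = [5, 7, 9, 11, 13]
records = [(d, False, 1.0 + 0.5 * d - 0.02 * d * d) for d in days]   # protein
records += [(d, True, 2.0 + 0.1 * d + 0.01 * d * d) for d in days]   # transcript
beta, r_squared = fit(records)

# Microarray parameters are sums of the protein term and its difference term.
beta_01 = beta[0] + beta[1]
beta_23 = beta[2] + beta[3]
beta_45 = beta[4] + beta[5]
```

Coding the data type as an indicator lets a single fit return both curves, with the difference terms (β1, β3, β5) directly measuring how far the transcript curve departs from the protein curve.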
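The comparison of corresponding parameters can be sketched as a sign check. This is a simplified illustration, not the procedure used to produce Table II: it compares only the signs of the estimates, leaves out any significance testing, and the function names and the "undetermined" label for an estimate of exactly zero are hypothetical.

```python
def sign(value):
    """Return 1, -1, or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def classify_pair(protein_param, transcript_param):
    """Label one pair of corresponding parameter estimates by sign agreement."""
    sp, st = sign(protein_param), sign(transcript_param)
    if sp == 0 or st == 0:
        return "undetermined"
    return "concordant" if sp == st else "discordant"


def summarize(protein_params, transcript_params):
    """Classify the intercept, linear, and quadratic terms of one spot/probe pair.

    protein_params is (b0, b2, b4); transcript_params is (b01, b23, b45).
    """
    names = ("intercept", "linear", "quadratic")
    labels = {name: classify_pair(p, t)
              for name, p, t in zip(names, protein_params, transcript_params)}
    n_concordant = sum(1 for label in labels.values() if label == "concordant")
    return labels, n_concordant


# Hypothetical estimates for one protein/transcript pair.
labels, n_concordant = summarize((1.0, 0.5, -0.02), (2.0, 0.1, 0.01))
```

In this toy case the intercept and linear terms agree in sign while the quadratic terms do not, so the pair would be counted as concordant for two of the three parameters.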
Supplemental Data
The following materials are available in the online version of this article.
- Supplemental Figure S1. 2-DE analysis of proteins (50 _μ_g) isolated from immature seeds of 5, 7, 9, 11, or 13 DAF and labeled with _n_-hydroxysuccinimide-activated Cy5.
- Supplemental Figure S2. Microarray analysis of RNA isolated from developing Arabidopsis seeds at 5, 7, 9, 11, or 13 DAF.
- Supplemental Table S1. FA composition of developing Arabidopsis seeds.
- Supplemental Table S2. Expression profile data for 1,025 protein spot groups from two-dimensional gels.
- Supplemental Table S3. Master table of MS/MS protein identification and GLM data.
- Supplemental Table S4. Summary and distribution of GLM concurrence sorted according to protein functional classes.
ACKNOWLEDGMENTS
We thank Dr. Huachun Wang for Arabidopsis seed imaging and the two anonymous reviewers for helpful comments.
LITERATURE CITED
(2008) In-depth investigation of the soybean seed-filling proteome and comparison with a parallel study of rapeseed. Plant Physiol 148: 504–518
(2007) A heteromeric plastidic pyruvate kinase complex involved in seed oil biosynthesis in Arabidopsis. Plant Cell 19: 2006–2022
(2010) Plastidial glycolysis in developing Arabidopsis embryos. New Phytol 185: 649–662
(2007) Shape-to-string mapping: a novel approach to clustering time-index biomics data. Online J Bioinformatics 8: 139–153
(2008) Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320: 938–941
(2009) Role of WRINKLED1 in the transcriptional regulation of glycolytic and fatty acid biosynthetic genes in Arabidopsis. Plant J 60: 933–947
(1964) An analysis of transformations. J R Stat Soc Series B Stat Methodol 26: 211–252
et al. (2002) Discordant protein and mRNA expression in lung adenocarcinomas. Mol Cell Proteomics 4: 304–313
(2009) Nonparametric Statistics for Non-Statisticians: A Step-by-Step Approach. Wiley, Hoboken, NJ
et al. (2007) Integrated proteomic and transcriptomic profiling of mouse lung development and Nmyc target genes. Mol Syst Biol 3: 1–15
(1982) Ordered binary trees constructed through an application of Kendall's tau. Psychometrika 47: 523–527
(1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Series B Stat Methodol 39: 1–38
(1994) An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom 5: 976–989
(2008) Hierarchical models in the brain. PLoS Comput Biol 4: e1000211
(1994) Plant embryogenesis: zygote to seed. Science 266: 605–614
(2002) Complementary profiling of gene expression at the transcriptome and proteome levels in Saccharomyces cerevisiae. Mol Cell Proteomics 1: 323–333
(1999) Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 19: 1720–1730
(2006) Proteomic analysis of seed filling in Brassica napus: developmental characterization of metabolic isozymes using high-resolution two-dimensional gel electrophoresis. Plant Physiol 141: 32–46
(2005) A systematic proteomic study of seed filling in soybean: establishment of high-resolution two-dimensional reference maps, expression profiles, and an interactive proteome database. Plant Physiol 137: 1397–1419
(2009) Concordant regulation of translation and mRNA abundance for hundreds of targets of a human microRNA. PLoS Biol 7: e1000238
(2004) Control of storage-product synthesis in seeds. Curr Opin Plant Biol 7: 302–308
(2009) Transcriptomic and proteomic profiling of two porcine tissues using high-throughput technologies. BMC Genomics 10: 30
(2009) Know your limits: assumptions, constraints and interpretations in systems biology. Biochim Biophys Acta 1794: 1280–1287
(2008) Uncovering genes with divergent mRNA-protein dynamics in Streptomyces coelicolor. PLoS One 3: e2097
(2007) Using genomics to study legume seed development. Plant Physiol 144: 562–574
et al. (2004) Global analysis of transcript and protein levels across the Plasmodium falciparum life cycle. Genome Res 14: 2308–2318
(2009) Data mining, a tool for systems biology or a systems biology tool. J Comput Sci Syst Biol 2: 216–218
(2007) Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications. Crit Rev Biotechnol 27: 63–75
(2002) Genome-wide study of gene copy numbers, transcripts, and protein levels in pairs of non-invasive and invasive human transitional cell carcinomas. Mol Cell Proteomics 1: 37–45
(2008) Correlation of mRNA and protein levels: cell type-specific gene expression of cluster designation antigens in the prostate. BMC Genomics 9: 246
(2009) Ribosome and transcript copy numbers, polysome occupancy and enzyme dynamics in Arabidopsis. Mol Syst Biol 5: 314
et al. (2008) A joint transcriptomic, proteomic and metabolic analysis of maize endosperm development and starch filling. Plant Biotechnol J 6: 855–869
(1988) Thirteen ways to look at the correlation coefficient. Am Stat 42: 59–66
(2002) Contrapuntal networks of gene expression during Arabidopsis seed filling. Plant Cell 14: 1191–1206
(2004) Discordance of UPR signaling by ATF6 and Ire1p-XBP1 with levels of target transcripts. Biochem Biophys Res Commun 317: 390–396
et al. (2007) Transcript and protein expression profiles of the NCI-60 cancer cell panel: an integromic microarray study. Mol Cancer Ther 6: 820–832
(2008) The beginning of the end for microarrays? Nat Methods 5: 585–587
(2009) Transcript and proteomic analysis of developing white lupin (Lupinus albus L.) roots. BMC Plant Biol 9: 1
(2006) Inferring transcriptional networks by mining omics data. Curr Bioinform 1: 301–313
(2009) Robust and unbiased variance of GLM coefficients for misspecified autocorrelation and hemodynamic response models in fMRI. Int J Biomed Imaging 2009: 723912
(2008) Integrative analyses of posttranscriptional regulation in the yeast Saccharomyces cerevisiae using transcriptomic and proteomic data. Curr Microbiol 57: 18–22
Author notes
1 This work was supported by the National Science Foundation-Plant Genome Research Program Young Investigator Award (grant no. DBI–0332418).
2 Present address: Institute of Plant Genetics and Biotechnology, Slovak Academy of Sciences, 950 07 Nitra, Slovak Republic.
3 Present address: Research Laboratory for Biotechnology and Biochemistry, GPO Box 8207, Kathmandu, Nepal.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Jay J. Thelen (thelenj@missouri.edu).
[C] Some figures in this article are displayed in color online but in black and white in the print edition.
[W] The online version of this article contains Web-only data.
[OA] Open Access articles can be viewed online without a subscription.
© 2010 American Society of Plant Biologists
© The Author(s) 2010. Published by Oxford University Press on behalf of American Society of Plant Biologists. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.