Proteomics of the Chloroplast: Systematic Identification and Targeting Analysis of Lumenal and Peripheral Thylakoid Proteins

Plant Cell. 2000 Mar; 12(3): 319–342.

Jean-Benoît Peltier

aDepartment of Biochemistry, Arrhenius Laboratories, Stockholm University, S-10691 Stockholm, Sweden

Giulia Friso

bDepartment of Cellular and Molecular Pharmacology, AstraZeneca Novum, S-14157 Huddinge, Sweden

Dário Eluan Kalume

cDepartment of Molecular Biology, Odense University, DK-5230 Odense M, Denmark

Peter Roepstorff

cDepartment of Molecular Biology, Odense University, DK-5230 Odense M, Denmark

Frederik Nilsson

dDepartment of Bioanalytical Chemistry, AstraZeneca R&D Mölndal, S-43183 Mölndal, Sweden

Iwona Adamska

aDepartment of Biochemistry, Arrhenius Laboratories, Stockholm University, S-10691 Stockholm, Sweden

Klaas J. van Wijk

aDepartment of Biochemistry, Arrhenius Laboratories, Stockholm University, S-10691 Stockholm, Sweden

1To whom correspondence should be addressed. E-mail klaas@biokemi.su.se; fax 46-8-153679.

Received 1999 Oct 15; Accepted 1999 Dec 23.

Copyright © 2000, American Society of Plant Physiologists

Abstract

The soluble and peripheral proteins in the thylakoids of pea were systematically analyzed by using two-dimensional electrophoresis, mass spectrometry, and N-terminal Edman sequencing, followed by database searching. After correcting to eliminate possible isoforms and post-translational modifications, we estimated that there are at least 200 to 230 different lumenal and peripheral proteins. Sixty-one proteins were identified; for 33 of these proteins, a clear function or functional domain could be identified, whereas for 10 proteins, no function could be assigned. For 18 proteins, no expressed sequence tag or full-length gene could be identified in the databases, despite experimental determination of a significant amount of amino acid sequence. Nine previously unidentified proteins with lumenal transit peptides are presented along with their full-length genes; seven of these proteins possess the twin arginine motif that is characteristic for substrates of the TAT pathway. Logoplots were used to provide a detailed analysis of the lumenal targeting signals, and all nuclear-encoded proteins identified on the two-dimensional gels were used to test predictions for chloroplast localization and transit peptides made by the software programs ChloroP, PSORT, and SignalP. A combination of these three programs was found to provide a useful tool for evaluating chloroplast localization and transit peptides and also could reveal possible alternative processing sites and dual targeting. The potential of proteomics for plant biology and homology-based searching with mass spectrometry data is discussed.

INTRODUCTION

Chloroplasts in green algae and higher plants contain photosynthetic thylakoid membranes. Four multisubunit protein complexes, photosystem I (PSI), PSII, the ATP-synthase complex, and the cytochrome b6f complex, which together comprise 75 to 100 proteins, perform the photosynthetic reactions (see, e.g., Ort and Yocum, 1996). It can be postulated that the thylakoid membranes contain a large number of other proteins that are involved in the biogenesis and regulation of the four multisubunit complexes (Wollman et al., 1999). These additional proteins would be involved in processes such as biosynthesis and ligation of cofactors, insertion of proteins into membranes, and folding and degradation of proteins. Several such proteins have been identified in the thylakoid membrane. They include the proteases DegP and FtsH (reviewed in Adam, 1996); protein translocation components, such as SecY, SecE, SecA, Alb3, and Hcf106 (reviewed in Settles and Martienssen, 1998; Dalbey and Robinson, 1999; Keegstra and Cline, 1999); the PSII assembly factor Hcf136 (Meurer et al., 1998); and a lumenal isomerase TLP40 (Fulgosi et al., 1998).

To regulate biogenesis and several thylakoid functions, kinases (Snyders and Kohorn, 1999), phosphatases (Vener et al., 1998), and possibly other signal transducers are present in the thylakoid membrane. Several proteins without obvious function have been identified in the intrathylakoid space, the thylakoid lumen (Kieselbach et al., 1998). Based on preliminary information and postulated functions, we expect that at least 100 proteins are involved in such processes and that many of them are present in low abundance in chloroplasts.

Although best known for their role in photosynthesis, chloroplasts also synthesize many essential compounds, including plant hormones (Marin et al., 1996; Lange, 1998), fatty acids and lipids (Miquel and Browse, 1992; Essigmann et al., 1998), amino acids (Ho et al., 1999), vitamins (B1, K1, and E; Belanger et al., 1995), purine and pyrimidine nucleotides (Doremus, 1986; Smith et al., 1998), and secondary metabolites such as alkaloids and isoprenoids (Keller et al., 1998). Chloroplasts are also required for nitrate and sulfur assimilation (Heldt, 1997). Several of the enzymes in these biosynthetic pathways have been identified by analyzing mutants of Arabidopsis and other plant species. However, more proteins and possibly new pathways remain to be discovered, and it is plausible that several of these components are located in or at the surface of the thylakoid membrane.

The 120- to 160-kb circular chloroplast genome encodes only ∼120 proteins and RNA molecules that are involved in chloroplast transcription and translation or encode subunits of the four complexes involved in photosynthesis or the NADH dehydrogenase complex (Sugita and Sugiura, 1996). It is estimated, however, that the chloroplast contains between 2000 and 5000 different proteins; thus, the majority of chloroplast proteins are encoded by the nuclear genome. Those nuclear-encoded proteins are synthesized as precursors on cytosolic ribosomes and subsequently are targeted to the chloroplast via an N-terminal transit peptide, which is proteolytically removed after import into the chloroplast (reviewed in Keegstra and Cline, 1999). Once inside the chloroplast, at least four pathways operate to target the proteins to the thylakoid membrane or into the thylakoid lumen (Dalbey and Robinson, 1999; Keegstra and Cline, 1999). The presequences of these nuclear-encoded chloroplast proteins share common features, which can be used to predict localization with moderate confidence (Emanuelsson et al., 1999; Nakai and Horton, 1999). However, the degeneracy of these targeting sequences precludes a systematic polymerase chain reaction–based screening for all chloroplast-localized proteins.

The improvement of two-dimensional electrophoresis by the development of immobilized pH gradients (IPGs; Görg et al., 1988) together with improved solubilization techniques (Rabilloud et al., 1997; Molloy et al., 1998) now permits the reproducible separation of up to 2000 proteins on a single two-dimensional electrophoresis gel. Such gel-separated proteins can be identified rapidly by mass spectrometry (MS), and if genomic information is also available, such analyses permit the systematic identification of the protein complement of a genome, the proteome (Shevchenko et al., 1996; Dainese et al., 1997; Roepstorff, 1997; Yates, 1998). In addition, MS is a powerful tool for analysis of isoforms, secondary modifications of proteins (such as glycosylation, phosphorylation, or isoprenylation), and proteolysis requiring only low amounts (picomoles to attomoles) of proteins (Burlingame et al., 1998; Kuster and Mann, 1998; McLafferty et al., 1999; Wilkins et al., 1999). Such systematic analysis of protein populations is summarized by the term proteomics. Thus, proteomics bridges the gap between genomic sequence information and the actual protein population in a specific tissue, cell, or cellular compartment.

To identify novel components involved in thylakoid biogenesis, we have started to identify systematically the thylakoid proteins by two-dimensional electrophoresis, matrix-assisted laser desorption/ionization–time of flight (MALDI-TOF) MS, electrospray ionization tandem MS (ESI-MS/MS), and N-terminal Edman sequencing. MS has been used previously to analyze purified PSII complexes (Whitelegge et al., 1998; Zheleva et al., 1998), but no systematic analysis of chloroplast proteins has been conducted. In this study, we present detailed reproducible two-dimensional electrophoresis maps of the lumenal and peripheral proteins of the thylakoids from pea. Functional domain analysis was performed for the newly discovered proteins or translated open reading frames by different programs available on the Internet. Localization prediction and presequences were analyzed by using the (noncommercial) software programs PSORT (Nakai and Horton, 1999), ChloroP (Emanuelsson et al., 1999), and SignalP (Nielsen et al., 1997). After correction for spots resulting from post-translational modifications and proteolysis, we estimated that there are in total at least 200 proteins in the lumenal space and in the periphery of the thylakoid membrane.

RESULTS

Isolation of Protein Populations and Reproducibility of Preparations and Two-Dimensional Electrophoresis Gels

Two-dimensional electrophoresis maps were constructed for lumenal and peripheral proteins of the thylakoid membrane. To avoid cross-contamination with nonchloroplast proteins, we made all preparations from intact pea chloroplasts that had been purified on linear Percoll gradients. Thylakoids then were liberated from the intact chloroplasts by osmotic shock and carefully washed to remove stromal proteins (Figures 1A and 1B). The lumenal proteins subsequently were liberated from the thylakoids by sonication, and the peripheral proteins were extracted from the thylakoid membranes by incubation under high-salt conditions (Figure 1A). The enrichment for lumenal plastocyanin, two proteins of the oxygen evolving complex (OEC33 and OEC23), and the peripheral coupling factor protein CF1α, as well as the partitioning of the stromal ribulose bisphosphate carboxylase small subunit (RbcS) and the abundant integral membrane protein light-harvesting complex IIb (LhcIIb), was verified by protein gel blot analysis (Figure 1B). LhcIIb and RbcS clearly partitioned away from the lumenal (Figure 1B, lane 6) and peripheral (Figure 1B, lane 8) fractions: no immunoresponse was found in either fraction, even after overexposure of the blots, indicating that contamination of the lumenal and peripheral fractions by these proteins was <1%.

Figure 1.

Protein Gel Blot Analysis and Scheme of the Protein Purification Process.

(A) To purify thylakoid protein fractions enriched in lumenal and peripheral proteins, intact and purified chloroplasts (1) were lysed and separated into crude thylakoids (2) and soluble stromal proteins and envelope proteins (3). Subsequently, the thylakoids were washed extensively to remove stromal proteins and envelopes. These washed thylakoids (4) then were sonicated to liberate soluble lumenal proteins (6), and the sonicated thylakoid membranes (5) were collected by centrifugation. The sonicated thylakoid membranes were incubated in high salt to liberate the peripheral thylakoid proteins (8), and the remaining membranes, containing only integral membrane proteins (7), were removed by centrifugation. Fractions 6 and 8 were used for the proteomics analysis.

(B) The partitioning of a set of stromal, peripheral, integral, and lumenal proteins during the purification of the lumenal and peripheral protein-enriched fractions was followed by protein gel blotting. All samples were loaded on an equal volume basis. Polyclonal antisera were used against the ribulose bisphosphate carboxylase small subunit (RbcS), which is one of the most abundant stromal proteins, peripheral protein CF1α on the stromal side, the integral membrane protein LhcIIb, two extrinsic subunits of the oxygen-evolving complex (OEC33 and OEC23) of PSII on the lumenal side of the membrane, and the soluble lumenal protein plastocyanin. The lane numbers in the protein gel blots correspond to the numbers in the purification scheme shown in (A).

The peripheral protein CF1α was found in both the lumenal and the peripheral protein fractions. Because CF1α is located at the stromal side of the thylakoid membrane, its presence in the lumenal fraction indicates that sonication released a substantial fraction of this protein from the membrane (see Discussion). The lumenal protein plastocyanin partitioned completely to the lumen, as detected by protein gel blot analysis, indicating that any plastocyanin in the peripheral fractions represents <1% of total plastocyanin content. The peripheral proteins OEC33 and OEC23 at the lumenal side were found in both the lumenal fraction and the peripheral fraction; however, as expected from the structure of the OEC complex, OEC33 partitioned more to the peripheral protein population, whereas OEC23 partitioned more to the lumen. The sonication and extraction completely removed the peripheral and lumenal marker proteins from the remaining integral membrane protein fraction (Figure 1B, lane 7).

To improve the resolution of the two-dimensional electrophoresis maps, we made separate maps for the low pH range (4.0 to 7.0) and the high pH range (6.0 to 11.0). Subfractionation of the thylakoid proteins helped to improve the resolution of these maps because the different physical-chemical properties of soluble versus peripheral membrane proteins necessitate the use of different detergent mixtures for optimal separation and migration in the first dimension (Rabilloud et al., 1997; Herbert, 1999). The subfractionation and the use of separate IPGs for low and high pH ranges also tend to increase the probability that proteins of low abundance will be identified, especially if the protein population is dominated by a number of very abundant proteins, as is the case for the thylakoid membrane.

The two-dimensional electrophoresis maps of the lumenal and peripheral proteins separated in the first dimension between pH 4.0 to 7.0 (acidic map) and pH 7.0 to 11.0 (basic map) are shown in Figures 2A to 2D. For the basic maps, only the region from pH 7.0 to 11.0 is shown, to avoid overlap with the acidic maps, although the IPG strips were from 6.0 to 11.0. Between 360 and 400 protein spots were detected by silver staining (detection limit below 1 ng) in each acidic map, and ∼50 protein spots were detected in each basic map. Computer-aided image analysis indicated a 39% overlap between the two acidic two-dimensional electrophoresis maps and a 75% overlap between the two basic two-dimensional electrophoresis maps. The maps were made in triplicate from independent chloroplast preparations by using different batches of pea leaves, and we observed an excellent reproducibility (data not shown).

Figure 2.

Silver-Stained Two-Dimensional Electrophoresis Maps of Lumenal and Peripheral Proteins.

Proteins were separated by two-dimensional gel electrophoresis with denaturing isoelectric focusing in the first dimension and SDS-PAGE in the second dimension. Gels were calibrated for molecular mass (in kilodaltons) and pI (in pH units) by internal (pH and mass) and external (mass) standards, which are indicated. Numbers indicate protein spots listed in Tables 1 to 4. For a selected number of spots, the identity (in addition to the number) has been listed on the two-dimensional electrophoresis map. Spots on the acidic maps (pH 4.0 to 7.0) for the peripheral and the lumenal proteins are, respectively, numbered from 1 to 99 and 100 to 199. Spots on the basic maps (pH 7.0 to 11.0) of the peripheral and lumenal proteins are numbered from 200 to 249 and 250 to 300, respectively. The same numbers are used on the lumenal and peripheral maps if spots could be matched by image analysis and confirmed by mass fingerprints or sequence tags. If proteins were identified on both maps, the number was chosen for the map in which the spot was most abundant.

(A) and (C) Peripheral proteins were separated in the first dimension on IPGs between pH 4.0 and 7.0 (A) and between pH 7.0 and 11.0 (C).

(B) and (D) Lumenal proteins were separated in the first dimension on IPGs between 4.0 and 7.0 (B) and between 7.0 and 11.0 (D).

Strategies for Identification of Peripheral and Lumenal Proteins

The principles of our proteomics approach are shown schematically in Figure 3. After two-dimensional electrophoresis, protein spots were selected and analyzed by MALDI-TOF. To denote a protein as unambiguously identified, we set the following criteria (Parker et al., 1998): coverage of the mature protein (i.e., excluding cleavable presequences) by the matching peptides must reach a minimum of 15%, and at least four independent (i.e., with no sequence overlap) peptides should match within a stringent 15 ppm maximum deviation of mass accuracy (i.e., a maximum difference of 0.0015% between the experimental and theoretical masses of the MALDI peptides). If a protein could not be identified unambiguously by using MALDI-TOF, peptide sequence tags were obtained by ESI-MS/MS or Edman sequencing and were used for protein identification. Three to five precursor ions from each sample were selected for ESI-MS/MS analysis. The peptide masses and obtained sequence tags were used to search the public databases with FASTA and with the freely available software program MS-Tag, developed at the University of California, San Francisco MS Facility (http://prospector.ucsf.edu). To obtain sequence tags by Edman sequencing, we stained gels with Coomassie Brilliant Blue R 250 before blotting to increase the staining sensitivity and to facilitate matching with other gels. If information about the length of cleavable transit peptides was obtained, the theoretical pI and molecular masses were calculated after removal of the presequences and compared with the experimental pI and molecular masses on the two-dimensional electrophoresis maps.
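The three identification criteria stated above can be illustrated with a minimal sketch. It is not the software used in this study; the function names, the input format, and the example peptide list are hypothetical and serve only to show how a ppm deviation, the count of non-overlapping peptides, and the coverage of the mature protein might be combined.

```python
def ppm_error(measured, theoretical):
    """Signed mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def evaluate_match(matches, mature_length,
                   max_ppm=15.0, min_peptides=4, min_coverage=15.0):
    """Apply the three criteria to one candidate database entry.

    matches: list of (start, end, measured_mass, theoretical_mass); start and
    end are 1-based inclusive positions in the mature protein (presequence
    already removed). Returns (identified, n_independent, percent_coverage).
    """
    # Keep only peptides whose mass deviation is within the ppm window.
    within = [(s, e) for s, e, m, t in matches
              if abs(ppm_error(m, t)) <= max_ppm]
    # Greedy selection by end position keeps a largest set of
    # peptides that share no sequence overlap.
    within.sort(key=lambda p: p[1])
    independent, last_end = [], 0
    for s, e in within:
        if s > last_end:
            independent.append((s, e))
            last_end = e
    covered = sum(e - s + 1 for s, e in independent)
    coverage = 100.0 * covered / mature_length
    ok = len(independent) >= min_peptides and coverage >= min_coverage
    return ok, len(independent), coverage


# Hypothetical peptide matches for a 200-residue mature protein.
example = [
    (1, 10, 1000.010, 1000.000),
    (11, 20, 1200.012, 1200.000),
    (15, 25, 1300.013, 1300.000),  # overlaps the previous peptide
    (30, 40, 1500.015, 1500.000),
    (50, 60, 2000.020, 2000.000),
    (70, 80, 1000.050, 1000.000),  # 50 ppm off: outside the 15 ppm window
]
print(evaluate_match(example, mature_length=200))
```

In this made-up example, four independent peptides survive the 15 ppm filter and cover 21% of the mature protein, so the entry would pass all three thresholds.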

Figure 3.

Schematic Explanation of the Proteomics Strategy for Systematic Analysis of the Lumenal and Peripheral Thylakoid Proteins.

The proteins were separated according to their isoelectric point (pI) and then according to their molecular mass, resulting in a two-dimensional gel. The spots were then visualized by Coomassie blue or silver staining, and the gels were scanned for image analysis. Individual protein spots then were selected (exemplified by the encircled spot), excised from the gel, and digested with the site-specific protease trypsin (cleavage C-terminal of either a K residue or an R residue), resulting in a set of tryptic peptides. The peptides were extracted, and their masses were measured by MALDI-TOF MS. The list of measured peptide masses was compared with the masses of the predicted tryptic peptides for each entry in the sequence databases (NCBI, SWISS-PROT, and PIR). Multiple search rounds were performed as described in Methods. In case the protein was not unambiguously identified by MALDI-TOF MS, peptide sequence tags were obtained by ESI-MS/MS or Edman sequencing. The peptide masses and obtained sequence tags were used to search the public databases with the program MS-Tag and FASTA. To obtain sequence tags by Edman sequencing, we stained gels with Coomassie blue before blotting to increase the sensitivity and to allow easier matching of the gels. Spots containing 10 to 15 pmol or more were selected.
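The digestion and mass-list steps of this scheme can be sketched in a few lines. The example below is only an illustration under stated assumptions: it cleaves after every K or R exactly as the legend describes (the commonly used exception for a following proline is not applied), allows no missed cleavages, uses standard monoisotopic residue masses, and digests a hypothetical sequence that does not come from this study.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added to the summed residue masses
PROTON = 1.00728   # charge carrier for the singly protonated ion


def tryptic_peptides(sequence):
    """Cleave C-terminal of every K or R (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, (M+H)+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


# Hypothetical sequence, for illustration only.
for pep in tryptic_peptides("AGKTVRLEDSFK"):
    print(pep, round(mh_plus(pep), 4))
```

A peptide mass fingerprint search compares a list like this, computed for each database entry, with the masses measured by MALDI-TOF MS.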

General Comments on the Two-Dimensional Electrophoresis Maps

Four hundred spots were analyzed by using MALDI-TOF, 20 of which were further analyzed by ESI-MS/MS and 55 of which were analyzed by N-terminal Edman sequencing. It is likely that none of the spots analyzed by Edman sequencing was blocked at the N terminus (see Discussion). The protein spots on the maps (Figure 2) were only numbered if the protein was identified or if sufficient amino acid sequence tags were obtained to identify the corresponding gene if it were present in the public databases. Unidentified proteins that were analyzed only by MALDI-TOF are not numbered.

Information about these numbered spots is summarized in Tables 1 to 4. Spots from the acidic maps (pH 4.0 to 7.0) of peripheral and lumenal proteins are numbered 1 to 99 (Figure 2A) and 100 to 199 (Figure 2B), respectively. Spots from the basic maps (pH 7.0 to 11.0) of peripheral (Figure 2C) and lumenal proteins (Figure 2D) are numbered from 200 to 249 and 250 to 300, respectively. The same numbers are used on the lumenal and peripheral maps if spots could be matched by image analysis and confirmed by mass fingerprints or sequence tags. If proteins were identified on both maps, the number was chosen for the map in which the spot was most abundant. Spots for which no significant information was obtained were not numbered. These were mostly high (>60 kD) or low (<10 kD) molecular mass proteins of low abundance or spots located very close to an abundant photosynthetic protein on the two-dimensional electrophoresis maps.

Table 1.

Proteins Involved in Photosynthetic Electron and Carbon Metabolism Identified from the Two-Dimensional Electrophoresis Gels of the Lumenal and Peripheral Fractions (pI 4.0 to 7.0 and pI 7.0 to 11.0) Shown in Figures 2A to 2Da

MALDI-TOF Edmanf or ESI-MS/MSg Localization and Cleavage Site Prediction
Spot No. Masses (kD) pI Identityb (Theoretical Mass in kD) Accession Numberc Cover % 15 ppmd Cover % 50 ppme Sequence N Terminus of the Proteinh PSORTi ChloroPj SignalPk
203 17.1 9.3 OEC16b(25.4) AF026400 YYAI/LAVSTgI/LNDVLSK 84-EAKPI Lumen: 0.96 54–55 AVL-AE82–83AEA-KPI85–86
11–14 21.9 5.7–6.4 OEC23 (28.0) P16059 23–37 27–46 74-AYGEA Mito: 0.75Tk: 0.28 22–23 ADA-AY73–74
30–33 29.8–30.2 5.5–5.8 OEC33 (34.9) P14226 27–35 32–42 82-EGAPK Lumen: 0.94 44–45 ASA-EG
10l 21.5 6.3 OEC33 (34.9) 31 31 81–82
100 9.8 4.7 Plastocyanin (17.1) P16002 35 35 VEVLLGASDg 70-VEVLL Tk: 0.66 55–56 ALA-VE
116 26.9 4.7 Plastocyanin (17.1) Lumen: 0.52 69–70
38 38.8 5.7 Ferredoxinb,mNADH red (40.6) 729479 14 15 53 ER: 0.55 51–52 None
117 27.9 5.1 Ferredoxinb (14.4) P27789 ATYNIKLITPELf 39-ATYNV Mito: 0.81Stroma: 0.57 25–26 AQA-TV39–40
34, 35, 37 37.3–39.9 6.1–6.3 CF1γ (41.3) 114640 15–24 22–24 53 Stroma: 0.94Tk: 0.75 35–36 None
7, 8 17.8–19.5 6.0–6.1 CF1δ (27.6) 399082 41 42 65 Cyto: 0.45Tk: 0.20 72–73 ALA-DL77–78
202 9.7 8.5 PsaEb (16.2) 1217601 16 23 Not clear ER: 0.55 51–52 EEA-AP56–57
201 7.2 9.0 PsaNb (15.5) P31093 SVFDEYLEKSKANKf (61)-SVFDE Lumen: 0.86 52–53 ARA-SV60–61
120, 121 37.5–40.6 6.2–6.5 Aldolasen (39.2) 399024 23–27 26–33 (39) Cyto: 0.65Tk: 0.28 37–38 None
102 11.9 6.4 RbcSn (20.2) P00869 32 45 (57) Stroma: 0.91 56–57 None
129 54.3 6.6 RbcLn (47.3) P04717 18 21 1 Chloroplast encoded
48–53 55–55.5 6.0–6.7 CF1α (55.1) 114522 21–44 26–51 1 Chloroplast encoded
5l 12.6 5.2 CF1α (55.1) 19 24
43–46 51.5–52.5 5.6–5.8 CF1β (53.1) 114560 37–45 46–59 1 Chloroplast encoded
41l 37.3 5.6 CF1β (53.1) 24 34

Table 4.

Identification of Proteins of Unknown Function (Hypothetical Proteins) from the Two-Dimensional Electrophoresis Gels of the Lumenal and Peripheral Fractions (pI 4.0 to 7.0 and pI 7.0 to 11.0) Shown in Figures 2A to 2D by ESI-MS/MS or N-Terminal Edman Degradationad

Edmanb or MS/MSc
Spot No. Masses (kD) pI Sequence Precursor Ion (M+H)+d Suggestions/Remarks
118 28.0 5.7 EEQEQEQEQDTKMAb RecA like prot.?
106 18.2 5.3 AKAGVNKPELLPb
114 24.7 5.7 EQQQQQQP(QN)RRF(R)Eb
125 45.8 5.8 (F)AEIE(A)EQNIEb
117 28.0 5.1 VXVKVXDXDXDb
128 52.3 4.4 (A)EDLGAEKPTSb(A)SXTGAEKPGb
251 11.9 8.6 (AG)EVAP(E)IL(D)VXQ(F)b
109 21.5 6.1 (F)SI/LFEI/LVc 1394.44
105 16.8 6.7 SVVAAYMV(EM)c 1706.74
203 17.1 9.3 NK/QPI/L(YK/Q)c 1531.9
9 21.5 6.0 I/LDSFPDFKcTI/LYI/LWI/L(T)c 1115.601239.68
104 17.1 6.2 GYI/LK/QDWEc 1314.66
3 9.4 5.0 K/QRWYAK/QAI/Lc 1315.65
4 11.3 6.2 (PyroE)-SSPA-443.35-Kc262.05-YTI/LI/LK/QSK/QI/LPGKc 1043.481396.66 See spots 2, 3, and 6 (Table 2)
23 27.1 6.3 STSI/LI/LE-321.18-Rc 1126.64
20 25.8 5.8 201.96-(NG)HSK/QI/LPPI/LEVc(K/QP)I/L-635.45-Rc316.06-FEVTY(I/LDWTR)c 2532.551645.78
21 25.7 5.6 Sequences found in spot 14226.11-PI/LTI/LP-462.29-RcI/LI/LYS-624.40c241.93-EDAGGI/LV-443.24-DKc 1384.861130.861816.92
22 27.5 5.5 VNVI/LK/QK/QI/L-406.3-RcPTTSP-446.3-RcVPI/LSG(S)-275.2-RcFK/QE(NG)EI/L(VDI/L)-290.2-(N)-Kc(DP)FENFP-287.2-A(NI/L)SKcYSSAA(PI/LS)-282.2-Rc 1873.221104.72990.661941.101665.921233.76 See spot 115 (Table 3)

Examples of MALDI-TOF and ESI-MS/MS

Two examples in which proteins were identified successfully have been selected to demonstrate the utility of our proteomics approach, the quality of the MS spectra, and the potential for homology-based searching with MS data that has been realized in this study. Homology-based searching is an important issue, because no plant genome has been sequenced completely and because sequencing of plant genomes from different species is in progress. It is also an important issue for those experimental plant systems for which no genomic sequences or expressed sequence tags are expected to be available in the near future.

Figure 4B shows the MALDI-TOF spectrum from protein spot number 123 from the peripheral map (pH 4.0 to 7.0), containing a mixture of two proteins. Five of the measured peptides matched (i.e., no miscleavage; within 50 ppm; no oxidation) the recently identified gene product Hcf136 from Arabidopsis; these five peptides are indicated in the protein sequence shown in Figure 4A. Hcf136 also can be seen on the lumenal map (Figure 2B, spot 123), as identified by N-terminal Edman sequencing (Table 2). This protein was determined earlier to be on the lumenal side of the thylakoid membrane, where it is involved in the assembly of PSII (Meurer et al., 1998). Thus, homology-based searching with MALDI-TOF data from a pea protein resulted in the successful identification of a protein that previously had been sequenced only in Arabidopsis.

Table 2.

Proteins Involved in Nonphotosynthetic Functions, Identified from the Two-Dimensional Electrophoresis Gels of the Lumenal and Peripheral Fractions (pI 4.0 to 7.0 and pI 7.0 to 11.0) Shown in Figures 2A to 2Da

MALDI-TOF Edmane or ESI-MS/MSf Localization and Cleavage Site Prediction
Spot No. Masses (kD) pI Identityb (Theoretical Mass in kD) Accession Numberc Cover % 50 ppmd Sequence % of Identityg N Terminus of the Proteinh PSORTi ChloroPj SignalPk
123 37.3 5.6 Hcf136b(44.1) O82660 24 EETLSE-ERVYLe 67 79-DE ER: 0.60Tk: 0.49 60–61 ARA-DE78–79
127 46.0 5.3 DegPb (46.2) 2565436 15 104-FV Tk: 0.91Lumen: 0.80 42–43 AVE-SA99–100VES-AS100–101
119 29.9 4.9 RNA binding protein (32.0) PSY14557 AAQEGETLTVEETVe 86 61-AA Mito: 0.61Stroma: 0.50 60–61 LFA-AQ61–62
124 39.1 4.6 Plastoglobule ass. prot. PG1 (38.4) 4105180 25 48-AG Stroma: 0.52 46–47 ISA-AG47–48
113 24.2 5.8 Cpn21 (26.9) O65282 ATVVAPKYTAIKe 83 52-AS Stroma: 0.89Lumen: 0.58 50–51(25–26) AQS-KP93–94VKA-AS51–52
130 57.5 5.3 Cpn60α (14.4) P08926 26 Stroma: 0.92Lumen: 0.68 45–46(23–24) AAA-KD49–50
131, 132 74.5 5.3 Hsp70 (41.3) 399942 27–35 68-KV Stroma: 0.94Lumen: 0.74 66–67(46–47) AVA-AM83–84
2, 3, 4, 6 9.3–15.3 4.4–6.9 Histone H4-likeb (27.6) 1806283 39–40 I/LSGI/LI/LYEETRf 100 Nucleus: 0.95 None None
40 41.2 5.6 Brittle-1b (16.2) 231654 16 76-DN Mito: 0.83Stroma: 0.52 44–45 None
42 49.5 5.8 Stearoyl-ACP desat.b (15.5) 2290402 17 Mito: 0.71 33–34 AMA-ST60–61
39 37.3 5.7 ACC oxidaseb (39.2) 4090533 19 Cyto: 0.45 None None
112 22.9 6.0 Ferritinb (15.5) AI443623 ATKGSSDNRVLTGVe 64 60-AT Cyto: 0.65 59–60 None
23–28, 205, 206126 27.1–28.546.0 5.3–6.3 Putative ascorbate perox/partial cDNAbPutative ascorbate perox/partial cDNAb AI490846 XDLIERRQRX(Y/E)Fe 77–93
5.8 AW156024AW185405AW185405AI490846 TDYEVDI/LI/LTTFTKf243.12-FSAVGI/LGPRfI/LNYEAYTYPRfADLIERRQRSEFQe 92788093 89-AD ER: 0.55 45–46 ANA-AD88–89
110 21.3 6.0 FKBP isomerase isolog/partial cDNAb AU070407AW092542 AGLPTEEKPPLLe 64 117-AG Tk: 0.88Lumen: 0.70 57–58 ALA-AG117–118

Figure 4.

MALDI-TOF MS Peptide Map of Spot Number 123 from the Peripheral Map (pH 4.0 to 7.0).

(A) The precursor protein sequence of Hcf136 from Arabidopsis. The N-terminal part of the sequence in italics is the predicted presequence; the lumenal cleavage site is indicated by an asterisk. Five peptides were identified by MALDI-TOF MS (B) and are indicated in the protein sequence (underlined).

(B) The MALDI-TOF MS spectrum of the peptides generated by tryptic digestion of protein spot 123. The trypsin autodigested peptide ions 842.51 and 2211.11 (not labeled) were used for internal calibration. The MALDI-TOF MS spectrum of the peptides generated by tryptic digestion of the protein spot 123 matched (no miscleavage allowed; within 50 ppm; no oxidations) Hcf136 from Arabidopsis. Hcf136 can be seen on the lumenal map (spot 123) and was confirmed by N-terminal Edman sequencing. The second protein in this spot is CF1β, determined by the matching of six peptides (no miscleavage; four within 15 ppm; two within 40 ppm).

The second protein in this spot is CF1β, as determined by the matching of six peptides (without miscleavage) to the pea sequence, illustrating the ease with which proteins in mixtures can be identified by MALDI-TOF. Three peptides in the spectrum originate from autodigestion of trypsin (at m/z ratios of 842.51, 2211.11 [peak not labeled], and 2807.29), and the first two peptides were used for internal calibration (Figure 4B). The other nonmatching peptides most likely result from domains of pea Hcf136 that are not 100% conserved with the Arabidopsis homolog; if there is a single amino acid residue mismatch between the pea peptide and the Arabidopsis sequence, then this peptide often will not match (at a <50-ppm mass tolerance). We also observed that for a number of pea proteins present in the public databases, sequence conservation among different pea cultivars is incomplete. Therefore, it is likely that some nonmatching peptides in the spectrum are derived from CF1β.
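The effect of a single residue mismatch on peptide mass matching can be made concrete with a short calculation. The sketch below is an illustration only: the peptide mass of 1500 D and the chosen substitutions are hypothetical, and the residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses (Da) for the residues compared below.
RESIDUE = {
    "A": 71.03711, "S": 87.03203,
    "I": 113.08406, "L": 113.08406,
    "K": 128.09496, "Q": 128.05858,
}


def substitution_shift_ppm(old, new, peptide_mass):
    """Mass shift, in ppm of the peptide mass, caused by one substitution."""
    delta = RESIDUE[new] - RESIDUE[old]
    return abs(delta) / peptide_mass * 1e6


PEPTIDE_MASS = 1500.0  # hypothetical tryptic peptide mass (Da)
TOLERANCE_PPM = 50.0   # search window discussed in the text

for old, new in [("A", "S"), ("K", "Q"), ("I", "L")]:
    shift = substitution_shift_ppm(old, new, PEPTIDE_MASS)
    verdict = "still matches" if shift <= TOLERANCE_PPM else "no longer matches"
    print(f"{old}->{new}: {shift:.1f} ppm, {verdict}")
```

Under these assumptions, an Ala-to-Ser exchange shifts the peptide by roughly 10,000 ppm, far outside a 50-ppm window, whereas Ile/Leu are isobaric and Lys/Gln differ by only about 24 ppm at this mass. This is in line with the statement that a mismatched peptide "often" (not always) fails to match, and with the I/L and K/Q ambiguity codes used for the MS/MS sequence tags in the tables.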

Figure 5 shows an example of the identification of a protein with unknown function from Arabidopsis by ESI-MS/MS (spot 104). With ESI-MS/MS, a peptide from the protein digest (the precursor ion) is selected within the mass spectrometer and further fragmented along the peptide backbone by additional energy. Several precursor ions can be selected from the same sample and are measured consecutively. In this study, we typically selected three to five ions per protein digest for MS/MS analysis. The ESI-MS/MS spectrum of the doubly charged precursor ion at m/z 620.92 ([M+2H]2+) is shown. The complete y-ion series (from y1 to y11) could be assigned unambiguously, as indicated. The y ions are the C-terminal fragment ions of the precursor ion and are generally the predominant ions (for an overview, see Chapman, 1996; Burlingame et al., 1998). The experimentally determined pea peptide sequence tag (obtained by reading the sequence from y11 to y1) matched a hypothetical Arabidopsis protein for 10 of 11 amino acid residues. This identification was further confirmed by two other MS/MS sequence tags and an N-terminal Edman tag, as indicated in Figure 5 (see also Table 3, spot 104).
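How a sequence tag is read from a y-ion ladder can be sketched as follows. This is a simplified illustration, not the procedure used to interpret Figure 5: the peptide SAMPLEK is hypothetical, the 0.05-D matching tolerance is an arbitrary assumption, and only the precursor value m/z 620.92 is taken from the text.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I": 113.08406,
    "L": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def to_singly_charged(mz, charge):
    """Convert the m/z of an [M+zH]z+ ion to the equivalent (M+H)+ value."""
    return mz * charge - (charge - 1) * PROTON


def y_ions(peptide):
    """Singly charged y-ion m/z values, ordered y1, y2, ..., yn."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide):  # y ions grow from the C terminus
        running += MONO[aa]
        ions.append(running)
    return ions


def read_ladder(ions, tol=0.05):
    """Read residues from consecutive y-ion mass differences.

    Returns the calls from the N-terminal side to the C-terminal side,
    i.e. read from the largest y ion down to y1. The residue contained in
    y1 itself is not resolved by mass differences and is not returned.
    """
    calls = []
    for low, high in zip(ions, ions[1:]):
        delta = high - low
        hits = [aa for aa, mass in MONO.items() if abs(mass - delta) <= tol]
        calls.append("/".join(hits) if hits else "?")
    return calls[::-1]


print(round(to_singly_charged(620.92, 2), 2))  # precursor from the text
print(read_ladder(y_ions("SAMPLEK")))          # hypothetical peptide
```

Because leucine and isoleucine have identical masses, the ladder reader reports them as I/L, which mirrors the notation used for the MS/MS sequence tags in Tables 3 and 4.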

Table 3.

Proteins without Assigned Functions, Identified from the Two-Dimensional Electrophoresis Gels of the Lumenal and Peripheral Fractions (pI 4.0 to 7.0 and pI 7.0 to 11.0) Shown in Figures 2A to 2D[a]

The last three columns (PSORT, ChloroP, and SignalP) give the localization and cleavage site predictions.

| Spot No. | Masses (kD) | pI | Identity (Theoretical Mass in kD) | Accession Number[b] | Sequence (Edman[c] or MS/MS[d]) | % of Identity[e] | N Terminus of the Protein[f] | Domain Prediction[g] | BLAST Search[h] | PSORT[i] | ChloroP[j] | SignalP[k] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 107 | 18.3 | 6.0 | Unknown function (15.0) | T21992 | ATQRLPPLSTEPNR[c] | 93 | 76-AX | Pentapeptide repeat and many others | Similar to A. thal. DNA clone AB015476 and with hyp. prot. Syn. sp. BAA17756 | Stroma: 0.86; Lumen: 0.45 | No cTP | VIA-AX 75–76 |
| 104 | 16.0 | 6.2 | Unknown function (46.2) | AAC78263 (W43350/T45153) | AILEADDDVELLE[c]; AFVSSAAAFEK[d]; I/LEADDDVEI/LI/LEK/Q[d]; GYI/LK/QDWE[d] | 100; 91; 100; 71 | 80-AI | Many but nothing obvious | | Lumen: 0.80 | 64–65 | LVA-IG 64–65 |
| 103 | 14.3 | 5.7 | Unknown function (32.0) | 2344892 (AAC31832) | FKGGGPYGQGVTRG[c] | 100 | 48-FK | Pentapeptide repeat and many others | Hyp. prot. Syn sp./D90917 | PM: 0.68 | 34–35 | ALA-FK 47–48 |
| 108 | 21.3 | 6.3 | Unknown function (38.4) | AAC00624 | VVKQGLLAGRIPGL[c] | 93 | 71-VV | Many but nothing obvious | | Cyto: 0.45 | None | ALA-FP 63–64 |
| 115 | 27.4 | 5.5 | Unknown function (26.9) | AAC28768 | YSSAAPI/L(I/L)[d]; SPTEQ/KP[d] | 72; 67 | ? | 2 main rhodopsin GPCR domains? and many others | Other A. thal. clones | Tk: 0.95 | 44–45 | None |
| 36 | 39.6 | 6.0 | Hyp. protein (14.4) | 3395429 | 17% of coverage at 50 ppm with MALDI-TOF | | ? | Many but nothing obvious | | Perox: 0.80 | 35–36 | None |
| 19 | 24.0 | 5.6 | Hyp. protein (41.3) | Z97339 | VI/LNK/QYLTE-482.2-R[d]; I/LYYK/QVEANNK[d]; SYASNNEI/LAVFPDQR[d] | 89; 80; 87 | ? | Eukar. Mo-pterin redox prot or euk RNA pol. heptapep. repeat? and many others | Hyp. prot. Syn. sp./BAA18019 | Stroma: 0.94; Tk: 0.74; Lumen: 0.72 | 75–76 | AFA-ST 104–105 |
| 22 | 27.6 | 5.4 | Hyp. protein (16.2) | AI600799 | I/LYSLSAS(TI/LS)-401.2-K[d]; GPI/LFK/QAVSSFR[d] | 90; 67 | 12–22[l]; 18–29 | Type-1 copper prot? and many others | Hyp. prot. Oryza. sat. AA754382 partial cDNA | Partial cDNA | | |
| 111 | 24.1 | 6.5 | Hyp. protein | AW201127 | RDVAVGSFLPPS[c] | 83 | (54)-RD[m] | Nothing obvious | | Stroma: 0.84; Lumen: 0.34 | (14–15) | SHA-RE (54–55) |
| 204 | 16.9 | 8.5 | Hyp. protein | D47525; AI855536 | AESGFQPVVDRKGD[c] | 64 | (68)-AE[m] | Nothing obvious | Similar to A. thal. DNA clone AL132954 | Perox. 0.64 | (33–34) | SFA-AE (68–69) |

Figure 5.

Identification of a Thylakoid Protein in Spot 104 by ESI-MS/MS.

(A) Protein sequence of a hypothetical precursor protein from Arabidopsis identified in spot 104. The presequence is in italics, and the lumenal cleavage site is indicated by an asterisk. Protein spot 104 was identified by three experimental sequence tags determined by ESI-MS/MS and an N-terminal Edman tag. The four sequence tags are indicated in the protein sequence in boldface for ESI-MS/MS (I/LEADDDVELI/LEK; AFVSSAGAFEK; GYI/LK/QD) or underlined for Edman (AILEADDDEELLEK). Determination of the sequence tag AFVSSAAAFEK by ESI-MS/MS is shown in (B).

(B) Typical ESI-MS/MS mass spectrum of a peptide recovered after in-gel tryptic digestion of protein spot 104. Fragmentation of the doubly charged precursor ion at an m/z ratio of 620.92 yielded the y-ion series (y1 to y11) for which the sequence is indicated. The experimental sequence tag from the pea protein matched (for 10 of 11 amino acid residues) a hypothetical protein of Arabidopsis as indicated in (A). Note that the sequence tag should be read backward from y11 to y1.

Proteins Involved in Photosynthetic Electron Transport

Table 1 lists 15 abundant proteins that are known to be involved in photosynthetic electron transport reactions and carbon metabolism, as identified on the two-dimensional electrophoresis maps and as shown in Figures 2A to 2D. These proteins were all unambiguously identified by MALDI-TOF or N-terminal Edman degradation as indicated in Table 1. With MALDI-TOF, a large number (six to 20) of matching peptides were found at high mass accuracy (15 ppm), with coverage of the protein by the matching peptides ranging from 15 to 45% at 15 ppm and from 21 to 59% at 50 ppm as indicated (calculated for the precursor protein).
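Sequence coverage by matching peptides can be computed as the percentage of residues in the protein that fall inside at least one matched peptide. The sketch below is a minimal illustration under that assumption; the function name `sequence_coverage` and the toy protein and peptides are hypothetical and are not taken from Table 1.

```python
def sequence_coverage(protein, peptides):
    """Percentage of residues in protein covered by at least one peptide.

    Overlapping peptides are counted once, and a peptide that occurs at
    several positions marks every occurrence.
    """
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Toy data: a 22-residue hypothetical protein and two matched peptides.
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
matched = ["TAYIAK", "SHFSR"]
print(sequence_coverage(toy_protein, matched))
```

Because the values in the text are calculated for the precursor protein, the transit peptide counts toward the denominator; passing the mature sequence instead would raise the reported percentage.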

The most abundant proteins are OEC16, OEC23, OEC33, and the soluble electron transporter plastocyanin. In addition, four proteins from the ATP–synthase complex (CF1α, CF1β, CF1δ, and CF1γ) were identified. Three other stromal-side peripheral proteins, ferredoxin, ferredoxin-NADPH-reductase, and PsaE, and the lumenal-side peripheral protein PsaN also were identified. PsaN was found on the basic map (spot 201), at approximately threefold lower abundance than OEC16 (spot 203 on the same basic map), in approximate agreement with the stoichiometry between PSII and PSI. These 12 proteins (Table 1) represent nearly all expected photosynthetic proteins (only CF1ε and some of the peripheral PSI proteins have not been identified), indicating that our maps give a good representation of the proteins present in the lumen and periphery of the thylakoid.

In addition, three stromal soluble proteins involved in carbon metabolism (i.e., RbcS, RbcL, and aldolase) were found on the maps. On the peripheral maps, only a very small amount of aldolase was observed, especially considering the high abundance of this protein. On the lumenal map, only a very small amount of RbcS was detected (in agreement with the protein gel analysis in Figure 1), whereas somewhat more RbcL and aldolase were present. In contrast, none of the very abundant soluble kinases involved in carbon metabolism (e.g., phosphoribulose kinase) was discovered on any of our two-dimensional electrophoresis gels. This could indicate that a specific subset of stromal proteins (RbcL and aldolase) has a strong affinity for the thylakoid membrane. In this respect, it is important to realize that a very small amount of proteins (i.e., in the femtomole range) can be detected by using two-dimensional electrophoresis and state-of-the-art MS instruments; thus, minor cross-contamination can be detected easily.

As is clear from Tables 1 (OEC33, OEC23, CF1α, and CF1β) and 2 (heat shock protein Hsp70 and chaperone Cpn60), several proteins were present at different pI values at a similar molecular mass, forming trains or beads of protein spots, indicating post-translational modifications, different isoforms, or RNA editing (see Discussion). We currently are investigating these modifications in detail.

Several breakdown products of OEC33, CF1α, and CF1β were identified, totaling <0.1% for each protein. These breakdown products also were present when two-dimensional electrophoresis maps were made from preparations in which the protease inhibitor cocktail was present during the complete isolation procedure (data not shown). Considering the broad specificity of this inhibitor cocktail, it is likely that these breakdown products reflect the proteolytic process naturally occurring in the thylakoid.

Proteins with a Nonphotosynthetic Function or No Obvious Functional Domains

In addition to the abundant proteins involved in photosynthetic electron transport and carbon metabolism, 18 proteins with clear functions (Table 2) and 10 proteins without obvious functional domains (Table 3) were identified. The identified proteins with a clear function were involved in DNA binding and transcription (i.e., four histone H4-like proteins and an RNA binding protein), oxygen radical scavenging (two ascorbate peroxidases), ADP–glucose transport (Brittle-1), proteolysis (DegP), chaperone or isomerase activity (Hsp70, Cpn60, Cpn21, and FKBP), protein assembly (Hcf136), Fe2+ binding (ferritin), lipid storage (plastoglobule-associated PG1 protein), and the biosynthetic pathway of fatty acids (stearoyl-acyl carrier protein desaturase). Amino-cyclopropane carboxylate oxidase, which catalyzes the last step of ethylene biosynthesis, also was provisionally identified, but this result needs to be further confirmed by sequence tags.

The histone-like proteins were only found on the two-dimensional electrophoresis maps from the peripheral fraction and not from the lumenal fraction. For these histone-like proteins, only highly conserved homologous genes, and no genes for chloroplast-localized proteins, were found in the database. Interestingly, the ascorbate peroxidase is most likely a lumenal protein with a twin arginine motif and 40% identical to other peroxidases. The sequence tags matched to expressed sequence tags from tomato and cotton and interestingly also to the moss Physcomitrella patens. These peroxidases were present in several spots in a wide pI range (pI 5.8 to 8.3) and at two different molecular masses, and they matched partial cDNAs.

A set of 10 proteins without assigned function but with identified full-length genes is listed in Table 3, and these proteins were analyzed for the presence of functional domains. Three of the identified proteins (spots 19, 103, and 107) are homologous with Synechocystis proteins (Table 3). Spots 103 and 107 are related to each other and were assigned to be part of a pentapeptide repeat family involved in lipid transport or assembly, but this assignment is based only on very weak similarity (23%) to a protein family in the cyanobacteria (Table 3). One of these postulated pentapeptide repeat proteins (spot 107) was identified earlier (Kieselbach et al., 1998), and the corresponding full-length Arabidopsis gene was identified in this study. An N-terminal sequence for spot 104 was found earlier in a two-dimensional electrophoresis map from total leaf extracts (Tsugita et al., 1996) as well as on a one-dimensional SDS electrophoretic gel of a lumenal preparation (Kieselbach et al., 1998). This protein was experimentally placed in the TAT pathway (Mant et al., 1999).

Proteins without Matching Expressed Sequence Tags or Genomic Sequence in the Public Database

Table 4 lists 18 protein spots from the two-dimensional electrophoresis maps (Figures 2A to 2D) for which a significant amount of amino acid sequence information was obtained. However, at the time this article was accepted for publication, the corresponding genes could not be identified by MALDI-TOF mass fingerprinting (at least 10 tryptic peptides after filtering for frequently recurring peptides, contamination, etc.), followed by database searching with sequence tags obtained by ESI-MS/MS or N-terminal Edman sequencing. The sequence tags are listed in Table 4.

Localization Prediction and Determination of Cleavage Sites

All identified proteins were analyzed by using the programs PSORT, ChloroP, and SignalP to verify the predicted location as well as the expected cleavage site by stromal processing peptidase(s) or thylakoid-bound lumenal peptidase (Tables 1 to 3). This analysis was conducted to verify the predictive strength of these programs. If the programs indeed are predicting the localization and cleavage site of newly identified proteins with a high degree of confidence, then they can help to analyze ambiguous hits in a proteomics screen and could be used to perform a genome-wide search for thylakoid membrane and lumenal proteins. Our set of newly identified proteins provided us with an excellent sample on which we could test these programs.

The ChloroP program predicted that all of the nuclear-encoded photosynthetic proteins listed in Table 1 would be located in the chloroplast, which is not surprising because most of these proteins or their homologs were part of the training set for the development of the program (Emanuelsson et al., 1999). The chloroplast transit peptide was correctly predicted for those proteins localized on the stromal side, assuming that in all cases an additional amino acid was removed after processing, as described by Emanuelsson et al. (1999). Approximately 94% of the nonphotosynthetic proteins in Tables 2 and 3 (excluding the histone-like proteins because we did not find the corresponding genes) were predicted to be localized in the chloroplast by ChloroP.

The program PSORT makes localization predictions for proteins in any of the plant organelles (i.e., nucleus, endoplasmic reticulum, mitochondria, peroxisomes, and chloroplasts; Nakai and Horton, 1999). However, only 52% of the proteins in Tables 1 to 3 were predicted by PSORT to be in the chloroplast.

Lumenal Transit Peptides

The lumenal transit peptides were analyzed by alignment of 26 nonredundant proteins in a so-called logoplot (Figure 6). In a logoplot, the sequence alignment is represented by a sequence of stacked letters in which the total height of the stack at each position shows the amount of information (conservation), whereas the relative height of each letter shows the relative abundance of the corresponding amino acid (Schneider and Stephens, 1990). Such logoplots have been used successfully to analyze different signal peptides of Gram-positive and Gram-negative bacteria (Nielsen, 1999). Seventeen nonredundant proteins that had been shown experimentally to possess lumenal transit peptides or lumenal localization were found in the public databases, whereas nine were identified in this study. The N termini of six of the nine newly identified lumenal proteins were obtained by Edman sequencing.

Lumenal transit peptides have been well studied during the past decade. The semiconserved consensus sequence for the cleavage site of the transit peptide is AXA↓X, with the arrow representing the cleavage site and X representing any amino acid. In addition, a high frequency of alanine residues as well as leucine residues upstream of this consensus sequence has been observed (reviewed in Settles and Martienssen, 1998; Keegstra and Cline, 1999). A similar structure has been reported for signal peptides of Gram-negative and Gram-positive bacteria (Nielsen et al., 1997; Nielsen, 1999).

After alignment of the 26 sequences according to their cleavage site in a logoplot (Figure 6), we observed a nearly complete conservation for the −1 position (alanine), in agreement with site-directed mutagenesis studies (Shackleton and Robinson, 1991). At the −3 position, there is a preference for alanine as well as valine, serine (small and neutral residues), and, unexpectedly, aspartic acid. At the +1 position, there is a preference for alanine, valine (small, neutral), glutamic acid, and aspartic acid (negatively charged). After the alanine/leucine–rich hydrophobic region, a fairly high frequency of prolines at the −6, −5, and −4 positions can be seen. This is likely to stimulate helix breaking of the hydrophobic region and possibly ensures interaction with the thylakoid processing peptidase. Further downstream (+2 and +4), a preference for the negatively charged glutamic acid is observed, which is not found in Gram-positive and Gram-negative bacteria (Nielsen et al., 1997; Cristobal et al., 1999).
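The stack heights in a logoplot follow from the entropy at each alignment position. The sketch below shows the basic calculation, with information defined as the maximal entropy (log2 of 20 for amino acids) minus the observed entropy. It is a simplified illustration: the small-sample correction described by Schneider and Stephens (1990) is omitted, and the four aligned windows are invented toy data, not sequences from Figure 6.

```python
import math
from collections import Counter


def information_content(column, alphabet_size=20):
    """Information (bits) at one position: maximal minus observed entropy."""
    total = len(column)
    entropy = -sum(
        (n / total) * math.log2(n / total) for n in Counter(column).values()
    )
    return math.log2(alphabet_size) - entropy


def logo_heights(sequences):
    """Per position: (stack height, {residue: letter height})."""
    result = []
    for column in zip(*sequences):
        info = information_content(column)
        total = len(column)
        letters = {aa: info * n / total for aa, n in Counter(column).items()}
        result.append((info, letters))
    return result


# Toy alignment of positions -3, -2, -1, +1 around a cleavage site.
toy_windows = ["AKAE", "VLAA", "AQAD", "SHAV"]
for position, (info, letters) in zip([-3, -2, -1, 1], logo_heights(toy_windows)):
    print(position, round(info, 2), sorted(letters))
```

In this toy input the −1 position is invariant and therefore reaches the maximal stack height of log2(20), about 4.32 bits, whereas a position with four different residues drops to about 2.32 bits.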

Figure 6.

Logoplot of Thylakoid Proteins with Lumenal Transit Peptides Aligned According to the Predicted Cleavage Site (between −1 and +1) of Their Lumenal Transit Peptide.

The main figure shows the logoplot of 26 different proteins with lumenal transit peptides, without any redundancy. The top inset shows the logoplot of a subset of 13 proteins targeted via the ΔpH/TAT pathway. The bottom inset shows the logoplot for the remaining 13 proteins. The height of the stack of letters at each position shows the amount of information, defined as the difference between the maximal and actual entropy (Schneider and Stephens, 1990), whereas the relative height of each letter shows the relative abundance of the corresponding amino acid. Positively and negatively charged amino acids are shown in black and red, respectively; external polar residues (N and Q) are shown in yellow, internal apolar (F, L, I, M, and V) in green, and ambivalent (P, T, S, C, A, G, Y, and W) in blue. Proteins used in the logoplots are for the ΔpH/TAT pathway (top inset): OEC16 (P12301); OEC23 (P16059); PsaN (P49107); Hcf136 (O82660); polyphenol oxidase (Q08303); PsbT (Q39195); the new ascorbate peroxidases in spots 23 to 28, 205, 206, and 125 in Table 2; and spots 19, 104, 108, 110, 111, and 204 in Table 3. The remaining proteins (bottom inset) are as follows: plastocyanin (P16002); OEC33 (P14226); DegP (AAC39436); CFoII (BAA09134); violaxanthin-deepoxidase (AAC50032); rotamase (CAA72792); PsbY1 (P80470); PsbY2 (P80470); CtpA (BAA09134); PsbX (AAD25151); PsaF (P13192); and spots 107 and 103 in Table 3.

Thirteen of these 26 lumenal proteins contain a twin arginine motif (R-R-X-o-o, with o representing a hydrophobic residue) and are predicted to translocate through the thylakoid membrane via the ΔpH/TAT pathway (Settles and Martienssen, 1998; Dalbey and Robinson, 1999; Mori et al., 1999; legend to Figure 6). Seven of the newly identified proteins contain this twin arginine motif. Alignment of the signal sequences by the twin arginines points to a very strong preference for a leucine or a methionine residue at the third position after the two arginines (data not shown), which would favor R-R-X-o-L/M as the general consensus motif for substrates of the TAT pathway. The 13 proteins were aligned in a separate logoplot (Figure 6, top inset). A striking feature is that the hydrophobic region in these TAT-dependent proteins is less hydrophobic than it is in the other 13 proteins (Figure 6, bottom inset). The TAT pathway proteins show a strong preference for a serine or alanine at the −8 position. No conservation of other “Sec avoidance” signals was found, such as a postulated basic residue directly after the hydrophobic region (Dalbey and Robinson, 1999).
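A search for the twin arginine motif in a presequence can be expressed as a simple pattern match. The sketch below is illustrative only: it assumes that "o" means one of F, L, I, M, or V (the residues grouped as internal apolar in the legend to Figure 6), and the two presequences are invented examples rather than sequences from this study.

```python
import re

HYDROPHOBIC = "FLIMV"  # assumed definition of the hydrophobic class "o"

# Lookahead patterns so that overlapping matches are all reported.
GENERAL = re.compile(rf"(?=RR.[{HYDROPHOBIC}][{HYDROPHOBIC}])")  # R-R-X-o-o
STRICT = re.compile(rf"(?=RR.[{HYDROPHOBIC}][LM])")              # R-R-X-o-L/M


def find_motif(presequence, pattern=GENERAL):
    """0-based start positions of the twin arginine motif."""
    return [m.start() for m in pattern.finditer(presequence)]


# Invented presequences used only to exercise the patterns.
with_motif = "MASSLQTSRRDFLKAAGLA"
without_motif = "MAATLSSKAVLAAGLS"

print(find_motif(with_motif))             # general R-R-X-o-o motif
print(find_motif(with_motif, STRICT))     # stricter R-R-X-o-L/M motif
print(find_motif(without_motif))
```

A pattern match of this kind only flags candidates. As the text notes, the hydrophobicity of the downstream region also differs between TAT-dependent proteins and the other lumenal proteins, and that feature is not captured by the motif alone.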

We decided to systematically test, based on this alignment, whether the program SignalP could predict the correct cleavage sites of lumenal transit peptides (as listed in Tables 1 to 3) in either the Gram-negative or Gram-positive mode. In all cases in which we knew that the protein was located in the lumen, a lumenal cleavage site was found by SignalP. In most of these cases, the SignalP prediction corresponded to the experimentally determined N terminus (Tables 1 to 3). In a few other cases (e.g., OEC16 and DegP), the predicted site was shifted a number of residues with respect to the experimentally determined cleavage site. A possible reason is that in bacteria, the −1 position is less strongly biased toward an alanine residue than it is in the chloroplast lumenal transit peptides. Removal of the stromal transit peptide generally did not influence the predicted lumenal cleavage site. This is important because it allows genome-wide screening of the Arabidopsis genome for lumenal transit peptides. However, SignalP also identified one or more lumenal cleavage sites for a number of proteins that are located at the stromal side of the thylakoid membrane (i.e., ferredoxin, PsaE, CF1δ, and an RNA binding protein). The selectivity might be improved if SignalP were to be equipped with a specific option for chloroplast lumenal transit peptides.

DISCUSSION

Proteomics already has become an important tool for drug discovery and the analysis of yeast and Escherichia coli protein expression patterns; however, it has not been widely applied in plant biology. This study demonstrates some of the potential of proteomics that can be realized for plant sciences. This potential will become more evident once the full genome sequences of Arabidopsis, rice, and other plants are available. The total proteome of higher plants is estimated to consist of ∼21,000 to 25,000 proteins (see, e.g., Bouchez and Hofte, 1998; Meinke et al., 1998), and chloroplasts contain 10 to 25% of this total, an illustration of the central importance of the chloroplast for the plant cell. Our long-term goals are to analyze systematically the proteome of the thylakoids and to understand thylakoid biogenesis and maintenance through further functional analysis.

Total Number of Peripheral and Lumenal Thylakoid Proteins

It was estimated from image analysis that the two acidic maps of lumenal and peripheral proteins each contain at least 360 to 400 protein spots, whereas each basic map contained at least 50 to 60 spots. Thus, there is a total of ∼820 to 920 protein spots that can be detected easily by silver staining. In most cases, the apparent molecular mass corresponded with the theoretical mass of the mature protein. In a few cases (e.g., ferredoxin, plastocyanin, and CF1γ), the same protein was observed at a molecular mass higher than the theoretical molecular mass. In case of plastocyanin, this is likely due to aggregation during focusing in the first dimension because of the very high abundance of the protein. The 11-kD ferredoxin was identified in a small protein spot (spot 117) at 30 kD and pI 5.0 and therefore must be in an aggregate or oligomeric form, modifying both molecular weight and experimental pI value (the theoretical pI value of monomeric ferredoxin is below 4.0 and thus was outside the range of the two-dimensional electrophoresis maps).

To estimate the total number of proteins, it is necessary to conduct a correction for the 39% overlap between the lumenal and peripheral maps, calculated by computer-aided image analysis. This strong overlap between the lumenal and peripheral proteins was expected because proteins can be transiently bound to the thylakoid membrane as part of their function (e.g., plastocyanin) or during thylakoid biogenesis and protein complex assembly (e.g., the OEC proteins). In addition, some of the thylakoid-bound proteins are released during the sonication procedure. However, through image analysis, one can clearly distinguish between proteins that are mostly soluble and mostly membrane bound. Naturally, protein partitioning into lumenal or peripheral fractions can be influenced by altering the purification procedure, for example, by using shorter sonication times (data not shown) or French or Yeda presses.

To further calculate the total number of functionally different proteins from the observed number of protein spots, it is necessary to perform a correction for different isoforms, post-translational modifications, and proteolysis. Isoforms can be expected for several of the photosynthetic proteins, because multigene families have been reported in pea, spinach, and tobacco for OEC23 (e.g., Hua et al., 1992), OEC33 (e.g., Wales et al., 1989), and ferredoxin–NADPH reductase (e.g., Gozzer et al., 1997). Post-translational modifications are likely to occur not only through phosphorylation but also by post-translational methylation (e.g., of RbcS; Grimm et al., 1997), carbamylation (e.g., for RbcS and RbcL; Smith et al., 1988), glycosylation (e.g., for CF1; Maione and Jagendorf, 1984), and palmitoylation (e.g., for the D1 protein; Mattoo and Edelman, 1987). In addition, alternative splicing produces both stromal and thylakoid-bound forms of ascorbate peroxidase (Mano et al., 1997), and editing of mRNA leading to heterogeneous proteins has been reported (Sugita and Sugiura, 1996). As discussed earlier (Table 1), proteolytic fragments also were observed. Finally, beads of spots can result from carbamylation induced by sample preparation, even when appropriate precautions are taken.

After taking these multiple forms into account, we estimate that there is a total of at least 200 to 230 proteins in the lumen and periphery of the thylakoid membrane. However, this number is conservative because it is likely that several proteins of lower abundance were masked on the acidic maps by the predominant OEC23 and OEC33 proteins. Visualization of these masked proteins would be possible after removal of the abundant OEC proteins, for instance, by affinity chromatography and/or by two-dimensional electrophoretic analysis using isoelectric focusing strips with narrower pI ranges.

MS- and Homology-Based Searches

In this study, ∼400 spots were analyzed by MALDI-TOF MS, followed by analysis of 20 spots by ESI-MS/MS and 55 by N-terminal Edman sequencing. As a result, 61 different proteins were identified. Breakdown products and modified forms of several of these proteins also were identified (Tables 1 to 3). For ∼100 spots, insufficient information was obtained for positive identification, and ∼30 of those will be analyzed by ESI-MS/MS. Because Edman degradation sequencing resulted in good N-terminal sequence from all 55 protein spots, with the quality and size of the peaks on the chromatograms proportional to the intensity of the spots, it is likely that none of the spots analyzed by Edman degradation was blocked at the N terminus. Note also that for several spots, two or even three amino acid sequences could be determined by Edman sequencing. Thus, it is likely that the N termini of most mature proteins in the chloroplast are not modified in vivo. This is in contrast to a two-dimensional electrophoresis study of rice and Arabidopsis total cellular proteins, in which 40 to 60% of the measured proteins were blocked (Tsugita et al., 1996). We should point out that all nuclear-encoded chloroplast proteins are N-terminally processed after import to remove the transit peptide; thus, N-terminal modifications that occurred in the cytosol are removed.

With the completion of the sequencing of the different plant genomes, it is expected that most (if not all) of the proteins listed in Table 4 will be identified. It was, however, possible to identify a number of the proteins by their similarity to proteins from other species based only on the mass fingerprints. This was possible because many plant proteins are well conserved, an encouraging prospect for future high throughput plant proteomics once a number of plant genomes have been fully sequenced. Digestion of a protein spot with a second protease (such as V8 or chymotrypsin) in parallel to digestion with trypsin also could help to identify proteins with mass fingerprints (MALDI) alone, because the two different proteases generate different peptides, thereby increasing the total number of peptides available for identification by database searching.
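The benefit of a second protease can be seen in a simple in silico digestion. The sketch below uses simplified cleavage rules that are assumptions for illustration: trypsin cleaves after K or R unless the next residue is P, and V8 protease is taken to cleave after E only (cleavage after D, which depends on buffer conditions, is ignored). The function name `digest` and the toy sequence are hypothetical.

```python
def digest(protein, cleave_after, no_cleave_before=""):
    """Split protein after each residue in cleave_after.

    A site is skipped when the following residue is in no_cleave_before.
    Missed cleavages are not modeled.
    """
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in cleave_after and nxt and nxt not in no_cleave_before:
            peptides.append(protein[start:i + 1])
            start = i + 1
    peptides.append(protein[start:])
    return peptides


# Toy sequence assembled for illustration only.
toy = "AILEADDDVELLEKAFVSSAAAFEKGYLKDWE"

tryptic = digest(toy, cleave_after="KR", no_cleave_before="P")
v8 = digest(toy, cleave_after="E")

print(tryptic)
print(v8)
print(len(set(tryptic) | set(v8)), "distinct peptides from both digests")
```

For this toy input the two digests share no peptides, so running both in parallel more than doubles the number of distinct masses available for database searching compared with trypsin alone.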

To avoid contamination with nonchloroplast proteins, we used intact, purified chloroplasts as the starting material for the two-dimensional electrophoresis maps in this study. Pea (or spinach) is preferred over other species, such as Arabidopsis, because of the ease with which intact chloroplasts can be purified. The disadvantage is that homology-based searching with mass fingerprints must be conducted. This type of analysis is more demanding and often requires confirmation by ESI-MS/MS or Edman sequencing. However, with the combination of these three techniques, it was possible to identify the corresponding genes in other plant species (Tables 1 to 4).

Functional Role of Lumenal and Peripheral Thylakoid Proteins

The 58 proteins (excluding the three stromal proteins in Table 1) listed in Tables 1 to 4 were classified in a pie diagram according to their function (Figure 7), following the categories used in the analysis of a 1.9-Mb contiguous sequence of chromosome IV of Arabidopsis (Bevan et al., 1998). This is a helpful tool to visualize the more general role of the lumenal and peripheral membrane proteins. For 17% of the proteins, no clear function could be predicted, whereas for 31% of the proteins, no expressed sequence tag or full-length gene could be assigned. The remaining 52% could be classified according to their function.

Figure 7.

Assignment of the Identified Proteins to Functional Categories by Using Classifications as Described by Bevan et al. (1998).

In total, 58 proteins were classified. The categories are as follows: energy (the 12 nonstromal proteins in Table 1), transcription/translation (spot 119 in Table 2), metabolism (spot 42 in Table 2), growth and division (spots 2, 3, 4, 6, 39, and 124 in Table 2), protein destination and storage (spots 110, 112, 113, 123, 127, and 131 to 133 in Table 2), transport (spot 40 in Table 2), defense (spots 23 to 28, 126, 205, and 206 in Table 2), no assigned function (the 10 proteins in Table 3), and no identified gene or homolog (the 18 proteins in Table 4).

Naturally, a significant fraction of the proteins is involved in energy production (21%), either in photosynthetic electron transport or in ATP production. So far, only one transporter, Brittle-1, was found on the two-dimensional electrophoresis maps of both the lumenal and peripheral fractions. Brittle-1 has an adenylate translocator function in the transfer of ADP glucose in amyloplasts. Brittle-1 was cloned using a transposon-tagged maize mutant (Sullivan et al., 1991) and was found to be localized in amyloplast membranes (Sullivan and Kaneko, 1995). Because starch also accumulates in chloroplasts, Brittle-1 is likely to have a similar function as in amyloplasts. Indeed, in vitro import assays showed that Brittle-1 could be targeted to the chloroplast inner envelope membranes, but the protein had not been found previously in chloroplasts in vivo (Li et al., 1992). Transmembrane prediction programs (such as TopPred2, DAS, and Tmpred; see http://www.expasy.ch/tools/#transmem) indicate that Brittle-1 could be an integral membrane protein, but the predicted transmembrane domains are shorter and less hydrophobic than are observed usually. We have not detected any integral membrane protein on the two-dimensional electrophoresis maps, which is expected because the protein fractions were centrifuged at high gravity values to remove any membranes. Thus, it was surprising to find Brittle-1 on both the lumenal and peripheral maps, an observation that calls into question the suggestion that Brittle-1 is an integral membrane protein. Because the protein had not been detected previously in chloroplasts, further experimentation is needed to more precisely define the localization of Brittle-1.

The two-dimensional electrophoresis maps also revealed a lipid storage protein (PG1), which was classified in the cell growth and division category (Figure 7). PG1 recently was identified in plastoglobules extracted from pea chloroplast membranes, and in vitro import assays confirmed its localization in the chloroplast (Kessler et al., 1999). Plastoglobules have been observed in different types of plastids and are thought to serve as lipid reservoirs for thylakoid membranes. In addition to PG1, plastoglobules possess several other proteins associated with the outer surface, such as the 30-kD plastid lipid–associated protein (Pozueta-Romero et al., 1997). It is quite likely that several of those proteins are present on the two-dimensional electrophoresis maps, and they should be identified in the near future.

Other proteins involved in cell growth include a set of four histone H4-like proteins in the 10- to 14-kD size range. A large fragment of histone H4 is extremely well conserved among many eukaryotes, varying little in monocotyledons and dicotyledons, green algae, insects, and mammals. The identified peptides of the four proteins all corresponded to this conserved region. Two chloroplast proteins that are electrophoretically similar to histones H2A, H2B, and H3 of pea cell nuclei were isolated earlier from chloroplast nucleoids. The amino acid composition of these proteins demonstrates high similarity with the HU proteins of E. coli (Yurina et al., 1995), but the identified peptides of the histone-like proteins in this study did not match those HU proteins. In total, ∼30 polypeptides from 94 to 12 kD in size were detected in nucleoids. These nucleoids have been found on the thylakoid surface (Liu and Rose, 1992) and chloroplast inner envelopes (e.g., Sato et al., 1993). The histone-like proteins were found exclusively in our two-dimensional electrophoresis maps of the peripheral proteins, which would be in agreement with a fairly tight binding (i.e., sonication resistant) of plastid DNA to the thylakoid surface. An alternative explanation for our finding of these histone-like proteins is that a minor contamination with nuclei has occurred. To avoid such a contamination, we purified the intact chloroplasts on linear Percoll gradients, in which nuclei are known to sediment to the bottom, whereas intact chloroplasts can be collected between ∼50 to 70% Percoll (Gallagher and Ellis, 1982).

The finding in this study of a known chloroplast-localized RNA binding protein points to the role of the thylakoid as a surface for transcription and translation, in agreement with observations of membrane-associated polysomes (Hattorie and Margulies, 1986; Jagendorf and Michaels, 1990) and plastid DNA.

One enzyme involved in fatty acid biosynthesis was identified and was the only protein classified under metabolism (Figure 7). It is indeed well known that fatty acid and lipid metabolism takes place both in the plastid and in the endoplasmic reticulum (Miquel and Browse, 1992). Further studies are required to determine where in the chloroplast these enzymes are localized.

Approximately 12% of the proteins were classified under protein destination and storage (Figure 7). They include the protease DegP, the assembly factor Hcf136, ferritin, three chaperones, and a cis-trans isomerase that possesses a lumenal transit peptide. Ferritin concentrates and stores cellular iron to ∼10¹¹ times the solubility of the free ion and earlier had been determined to be a plastid protein located at the stromal side of thylakoid membranes (e.g., Waldo et al., 1995). Ferritin accumulation is positively correlated to iron loading of the plant, is regulated at a transcriptional level (Gaymard et al., 1996), and is under developmental control (Lobreaux and Briat, 1991). The FKBP isomerase is only the second isomerase identified in the thylakoid; the other one was named TLP40 (Fulgosi et al., 1998).

The presence of fairly high amounts of the chaperones Hsp70, Cpn60, and Cpn21 (Table 2) on the lumen map (Figure 2B), but not on the peripheral map (Figure 2A), is intriguing. These ATP-dependent chaperones have been demonstrated to be functionally important for protein import across the envelope and for assembly of protein complexes (e.g., Lubeck et al., 1997; Keegstra and Cline, 1999). The MALDI-TOF or Edman sequencing data matched best to the genes for stromal chaperones. We therefore conclude that either (1) the genes for the lumenal chaperones have not been sequenced or (2) the gene products have a dual location due either to dual targeting to the stroma and lumen or to alternative splicing, as observed for stromal and thylakoid ascorbate peroxidases (Mano et al., 1997; Yoshimura et al., 1999). ChloroP predicts alternate stromal cleavage sites for the three chaperones if the program is provided with only the N-terminal sequences truncated before the experimentally determined (Hsp70) or postulated N terminus of the mature protein (Cpn60 and Cpn21; see Table 2; prediction for alternate cleavage site). This often is not the case for other substrates (data not shown). When these alternatively cleaved precursors then are analyzed by SignalP, lumenal cleavage sites are detected within the first 100 amino acid residues of the precursor proteins (Table 2). Thus, if the actual stromal transit peptide is cleaved more toward the N terminus, it might be possible that these chaperones are targeted to the lumen. At this point, note that different stromal-processing peptidases may exist (Su and Boschetti, 1994; Koussevitzky et al., 1998) and that a similar alternative cleavage behavior and dual targeting have been proposed for polyphenol oxidase (Koussevitzky et al., 1998).

A third possible explanation for the concentration of chaperones on the lumen map is that the proteins are bound to the stromal side of the thylakoid membrane. They might be released together with the lumenal proteins upon sonication, just like CF1α and CF1β (Figure 1). It is likely that the stromal chaperones are involved in protein assembly at the thylakoid surface. Schlichter and Soll (1996) searched for lumenal chaperones and protease-treated thylakoids before release of the lumenal content; they found shorter isoforms of the stromal chaperones, based on immunodetection with antisera generated against the stromal chaperones. Their results might be explained by only partial digestion of the stromal chaperones when they are assembled in membrane-bound complexes at the thylakoid surface. Clearly, additional experiments are required to draw any final conclusions.

Consensus Sequences and Prediction of Localization and Cleavage Sites

The neural network program ChloroP was most successful in localizing the newly discovered chloroplast proteins, whereas PSORT assigned 52% of the identified proteins to the chloroplast. What does this mean for the use of these programs to confirm localization of chloroplast proteins? If the proteins indeed are localized in the chloroplast, ChloroP recognizes ∼94% of them, whereas PSORT probably identifies ∼52%. ChloroP was trained with a positive test set of 75 known transit peptide containing chloroplast proteins (excluding lumenal proteins) and with a negative test set (75 proteins from other nonchloroplast localizations; Emanuelsson et al., 1999). When the authors tested 715 Arabidopsis entries in the SWISS-Prot database, they observed that 96% of those annotated in the database as being chloroplast localized were indeed predicted to be in the chloroplast. By contrast, 11% of proteins annotated as nonchloroplast proteins were predicted to be in the chloroplast, which would result in ∼2500 false positives for the full genome. In addition, the program is somewhat biased toward the known chloroplast proteins, as exemplified by the 100% score for the 12 well-known photosynthetic proteins on our maps. The PSORT program is much more ambitious in that it predicts proteins to 17 different locations (three within the chloroplast) in the plant cell. Judging from information available on the PSORT WWW server (http://psort.nibb.ac.jp:8800/), the program has not been as rigorously tested as ChloroP has been (Nakai and Horton, 1999). It is also possible that PSORT is more conservative in its prediction of proteins to be localized in the chloroplast. Testing newly identified proteins by both programs is probably the best option at the moment to get additional support/confirmation for chloroplast localization. 
It has been noted that in many cases, an additional amino acid residue needs to be removed to obtain the mature protein sequence (Richter and Lamppa, 1998; Emanuelsson et al., 1999).
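The false-positive estimate quoted above can be retraced with a few lines of arithmetic. The sketch below is only illustrative: it back-calculates the number of nonchloroplast proteins implied by the figures in the text (11% false-positive rate, ∼2500 false positives), and the count of 2500 true chloroplast proteins is a hypothetical value chosen for the example, not a number from this study.

```python
def implied_nonchloroplast_count(false_positives, false_positive_rate):
    """Number of nonchloroplast proteins implied by a false-positive count."""
    return false_positives / false_positive_rate


def positive_predictive_value(n_chloroplast, n_nonchloroplast,
                              sensitivity, false_positive_rate):
    """Fraction of 'chloroplast' predictions that are expected to be correct."""
    true_pos = n_chloroplast * sensitivity
    false_pos = n_nonchloroplast * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Figures from the text: 11% of nonchloroplast proteins are predicted to be
# chloroplast localized, giving ~2500 false positives genome-wide.
n_non = implied_nonchloroplast_count(2500, 0.11)

# HYPOTHETICAL assumption: 2500 genuine chloroplast proteins in the genome.
# The 96% recognition rate is the value reported for annotated entries.
ppv = positive_predictive_value(2500, n_non, 0.96, 0.11)
```

Under these assumptions, roughly half of all positive predictions would be false, which is one way to see why support from a second program is worthwhile.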

The lumenal transit peptides of a set of 26 nonredundant proteins were analyzed by aligning the sequences according to the experimentally determined cleavage sites. As expected, the presequence and cleavage sites had features similar to those of signal peptides in Gram-negative and Gram-positive bacteria. However, several of these features are more pronounced in lumenal transit peptides. These include the presence of the prolines at the end of the hydrophobic domain, the nearly complete conservation (25 out of 26) of the alanine at the −1 position, and the predominance of glutamic acid at the +2 and +4 positions. Aside from the twin arginine motif, the subset of 13 ΔpH/TAT proteins has a less A/L–rich hydrophobic region than the other proteins and a striking preference for a serine or alanine at the −8 position. The Sec avoidance motif (Dalbey and Robinson, 1999; Keegstra and Cline, 1999) does not seem to be a basic residue directly after the hydrophobic domain but is more likely to be the overall hydrophobicity, as was recently discussed for E. coli (Cristobal et al., 1999). A higher hydrophobicity (i.e., more leucines) in the signal peptide is likely to favor targeting via the Sec pathway. Alignment of the signal sequences by the twin arginines suggests a strong preference for a leucine or a methionine residue at the third residue after RR (data not shown). The twin arginines are positioned between −21 and −29 with respect to the lumenal cleavage site. These features taken together should make it possible to predict, with high confidence, lumenal transit peptides of plant proteins on a genome-wide scale. Adaptation of SignalP for chloroplast proteins might make this possible. It is important to note that for these programs to work correctly, the initiating methionine needs to be correctly assigned in the database; we observed several cases (spots 104 and 110) in which the assignment was incorrect, resulting in a negative (incorrect) prediction.
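The positional features described above lend themselves to a simple rule-based check. The following is a minimal sketch, not the logoplot analysis used in this study: the function name, the toy precursor sequence, and the convention that the reported position refers to the first of the two arginines are all assumptions made for illustration.

```python
def scan_twin_arginine(precursor, cleavage_index, window=(-29, -21)):
    """List RR motifs in the presequence with simple positional checks.

    cleavage_index is the length of the presequence, so residue -1 is
    precursor[cleavage_index - 1] and residue +1 is precursor[cleavage_index].
    Positions refer to the first arginine of each RR pair (an assumption).
    """
    hits = []
    for i in range(cleavage_index - 1):
        if precursor[i:i + 2] != "RR":
            continue
        position = i - cleavage_index
        # Third residue after RR: the two arginines sit at i and i+1,
        # so the residues after them are i+2, i+3, and i+4.
        third = precursor[i + 4] if i + 4 < len(precursor) else ""
        hits.append({
            "position": position,
            "in_window": window[0] <= position <= window[1],
            "third_residue": third,
            "third_is_leu_or_met": third in ("L", "M"),
            "ala_at_minus1": precursor[cleavage_index - 1] == "A",
        })
    return hits


# HYPOTHETICAL toy precursor: 34-residue presequence followed by a short
# mature N terminus. It is not a sequence from this study.
PRESEQUENCE = "MASSTTLLSA" + "SRRQLLAGAA" + "LAALSLPPAA" + "SAHA"
MATURE_START = "EETEKLF"
hits = scan_twin_arginine(PRESEQUENCE + MATURE_START, len(PRESEQUENCE))
```

A rule set like this can only flag candidates; the text itself suggests that an adapted neural network such as SignalP would be needed for confident genome-wide prediction.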

Future Perspectives and Conclusions

In this study, we have presented high-resolution two-dimensional electrophoresis maps from thylakoids of higher plants. Forty-five percent of the visible proteins were analyzed, and in total, 61 different proteins were identified. For 18 of those, no corresponding full-length gene could be found, but we expect to identify most of these genes once the sequencing of complete plant genomes is completed. A reverse genetics approach using tagged Arabidopsis mutants is now in progress to identify the function of the newly identified proteins. The two-dimensional electrophoresis maps in this study, their updates, and accession numbers in SWISS-Prot will become available via our website located at http://www.biokemi.su.se/chloroplast.

METHODS

Chemicals and Materials for Two-Dimensional Electrophoresis Analysis

Pharmalyte, pH 3.0 to 10.0, immobilized pH gradient (IPG) buffer 6 to 11, reswelling tray, and equipment for running the IPG gels (Multiphor II and dry strip kit) came from Pharmacia Biotech (Uppsala, Sweden). 3-([3-Cholamidopropyl]dimethylammonio)-1-propane-sulfonate (CHAPS), caprylyl sulfobetaine, Tris-HCl, and Triton X-100 were purchased from Sigma, and tributylphosphine (TBP) was from Fluka (Buchs, Switzerland). Acrylamide was obtained from BDH (Poole, UK) or Bio-Rad. Piperazine diacrylamide (PDA) was obtained from Bio-Rad. Urea, thiourea, and glycine came from Labasco (Stockholm, Sweden). Ammonium persulfate, N,N,N′,N′-tetramethylethylenediamine (TEMED), and Tricine came from Bio-Mol (Hamburg, Germany) and Bio-Rad. DTT was produced by Kodak (Rochester, NY).

Growth of Plants

Pea (Pisum sativum var De Grace) plants were grown for 12 to 14 days in a growth chamber at 25/21°C day/night temperatures with 12 hr of artificial light of ∼100 μmol of photons m⁻² sec⁻¹. The first expanded leaves were collected and used for chloroplast isolation.

Isolation and Fractionation of Intact Chloroplasts

Intact chloroplasts were isolated and purified on Percoll gradients according to Cline (1986). For preparation of thylakoid lumen, chloroplasts (equivalent to 20 to 40 mg of chlorophyll) were ruptured by osmotic shock in 50 mM Tris-HCl, pH 8.0, and 5 mM MgCl2 at a chlorophyll concentration of 1 mg mL⁻¹ for 10 min at 4°C. This lysis medium and all solutions used in subsequent steps contained a protease inhibitor cocktail (50 μg mL⁻¹ Pefabloc [Bio-Mol], and 1 μg μL⁻¹ antipain, leupeptin, and phosphoramidon). Thylakoids were recovered by centrifugation at 10,000g for 10 min at 4°C, washed three times with 10 mM Tris-HCl, pH 8.0, and resuspended in 10 mM Tris-HCl, pH 8.0, and 5 mM MgCl2, at a chlorophyll concentration of 0.5 mg mL⁻¹. To liberate the soluble lumenal proteins, we then sonicated the thylakoids 10 times for 30 sec each at 4°C (power 10.0; Misonix Inc., Farmingdale, NY). The thylakoid membranes were separated from the soluble lumenal proteins by centrifugation for 1 hr at 145,000g at 4°C. The clear supernatant containing the lumenal proteins was concentrated in an Amicon (Beverly, MA) cell (3-kD cut-off filter) to a protein concentration of 5 to 7 mg mL⁻¹.

For isolation of the peripheral thylakoid proteins, the sonicated and pelleted thylakoid membranes were washed once with 10 mM Tris-HCl, pH 8.0, centrifuged at 145,000g for 30 min at 4°C, and resuspended in 25 mM Mes, pH 6.5, and 0.5 M CaCl2 at a chlorophyll concentration of 0.1 mg mL⁻¹. This thylakoid suspension was gently stirred for 30 min at 4°C and centrifuged for 1 hr at 220,000g at 4°C to separate extracted peripheral thylakoid proteins from the remaining thylakoid membranes. The clear supernatant, containing the peripheral proteins, was concentrated to 12 to 15 mg protein mL⁻¹ in an Amicon cell (3-kD cut-off filter) while reverting to 10 mM Tris-HCl, pH 8.0. The yield of lumenal and peripheral proteins was 0.01 to 0.03 and 0.02 to 0.06 mg protein mg⁻¹ chlorophyll of isolated intact chloroplasts, respectively.
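To give a feel for the scale of a preparation, the yields quoted above can be combined with the starting amounts of chlorophyll. This is a back-of-the-envelope sketch only; the helper names are hypothetical, and the calculation assumes that the reported yield ranges apply across the full 20 to 40 mg chlorophyll range.

```python
def protein_yield_mg(chlorophyll_mg, yield_mg_per_mg_chl):
    """Expected protein amount (mg) from a given amount of chlorophyll."""
    return chlorophyll_mg * yield_mg_per_mg_chl


def concentrate_volume_ml(protein_mg, target_mg_per_ml):
    """Final volume (mL) needed to reach a target protein concentration."""
    return protein_mg / target_mg_per_ml


# Lumenal fraction: 0.01 to 0.03 mg protein per mg chlorophyll,
# starting from 20 to 40 mg chlorophyll (values from the text).
lumen_low = protein_yield_mg(20, 0.01)    # lower bound, mg
lumen_high = protein_yield_mg(40, 0.03)   # upper bound, mg

# Volume after concentrating the upper-bound yield to 5 mg/mL.
lumen_volume = concentrate_volume_ml(lumen_high, 5)
```

With these numbers, a single preparation yields on the order of 0.2 to 1.2 mg of lumenal protein, that is, a final concentrated volume well below 1 mL.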

Two-Dimensional Gel Electrophoresis

Two solutions, A and B, were used to solubilize the samples for isoelectric focusing. Solution A contained 9 M urea, 4% CHAPS, 2 mM TBP, and 2% pharmalyte, pH 3.0 to 10.0 (for the pH 4.0 to 7.0 two-dimensional electrophoresis map), or 2% IPG buffer 6.0 to 11.0 (for the two-dimensional electrophoresis map pH 7.0 to 11.0), and 0.5% Triton X-100. Solution B contained 5 M urea, 2 M thiourea, 2 mM TBP, 2% CHAPS, 2% SB 3-10, 0.5% Triton X-100, and 2% pharmalyte, pH 3.0 to 10.0 (pH 4.0 to 7.0 map) or 2% IPG buffer 6.0 to 11.0 (pH 7.0 to 11.0 map) (Rabilloud, 1998). For the analytical and preparative gels, individual 13-cm IPG strips (pH 4.0 to 7.0 or 6.0 to 11.0) were rehydrated overnight with 250 μL of protein sample in solution A (for lumen) or B (for peripheral) in a reswelling tray at room temperature. The isoelectric focusing was conducted at 18°C by using a Pharmacia Multiphor II with a DryStrip kit and a Pharmacia 3500XL power supply, following the running conditions in Rouquié et al. (1997). Isoelectric focusing strips were focused for ∼80 kVhr.

The focused strips were equilibrated in a solution containing 6 M urea, 30% glycerol, 50 mM Tris-HCl, pH 6.8, 5 mM TBP, and 2% SDS (w/v) for 20 min (Rabilloud et al., 1997; Herbert et al., 1998). Separation in the second dimension was conducted at room temperature on gradient Tricine-SDS gels (8 to 16% acrylamide) (Schägger and von Jagow, 1987). After equilibration, IPGs were embedded in an agarose solution at the top of the Tricine-SDS gel as described by Rabilloud et al. (1994a). The protein spots in the analytical gels were visualized by staining with silver nitrate (pH 4.0 to 7.0 map; Rabilloud et al., 1994b) or silver ammonia (pH 7.0 to 11.0 map; Hochstrasser and Merril, 1988). Preparative gels were stained with Coomassie Brilliant Blue R 250. The pI and molecular mass scales of the two-dimensional electrophoresis maps were internally calibrated by mixing carbamylated standards (Pharmacia Biotech) with the lumenal and peripheral samples before two-dimensional electrophoresis analysis. For external calibrations, molecular mass markers were loaded onto the second dimension.

Image Analysis of Two-Dimensional Electrophoresis Gels

After staining, gels were scanned using a flatbed scanner, and the data were analyzed using Melanie II software (Bio-Rad). After selecting so-called landmarks and the assignment of all features, two-dimensional electrophoresis images were aligned and matched.

Matrix-Assisted Laser Desorption Ionization–Time of Flight Mass Spectrometry and Electrospray Tandem Mass Spectrometry

Coomassie Brilliant Blue R 250–stained protein spots were excised from the gel and prepared for mass spectrometry (MS) analysis (Edvardsson et al., 1999). The peptide extract (1 μL) from each tryptic digest was crystallized in 0.5 μL of matrix solution (α-cyano-4-hydroxycinnamic acid in methanol; Hewlett-Packard, Böblingen, Germany) on the matrix-assisted laser desorption/ionization–time of flight (MALDI-TOF) target plate. Molecular mass information of the peptides was obtained by using a MALDI-TOF mass spectrometer, equipped with a nitrogen laser and operating in reflector/delay extraction mode (Voyager-DE-STR; Perseptive Biosystems Inc.). All MALDI-TOF spectra were internally calibrated using either trypsin autodigestion peptides (842.51 D and 2211.11 D) or ACTH (18 to 39) and bradykinin.
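The idea behind internal calibration with two reference peaks can be sketched as a two-point linear correction. This is a simplification introduced here for illustration: instrument software normally calibrates in the time-of-flight domain, and the observed peak values below are hypothetical. Only the two reference masses (842.51 and 2211.11 D) come from the text.

```python
def two_point_calibration(observed, reference):
    """Return a function mapping observed masses onto a corrected scale.

    observed and reference are (low, high) pairs for the two calibrant
    peaks. A straight line through both points is an assumption made for
    this sketch, not the calibration model of the instrument software.
    """
    (obs_low, obs_high), (ref_low, ref_high) = observed, reference
    slope = (ref_high - ref_low) / (obs_high - obs_low)
    intercept = ref_low - slope * obs_low
    return lambda mass: slope * mass + intercept


# Reference masses of the trypsin autodigestion peptides (from the text).
TRYPSIN_REFERENCE = (842.51, 2211.11)

# HYPOTHETICAL observed positions of the same two peaks in a raw spectrum.
observed_calibrants = (842.60, 2211.35)

calibrate = two_point_calibration(observed_calibrants, TRYPSIN_REFERENCE)
corrected = calibrate(1500.80)  # any other peak in the same spectrum
```

By construction the two calibrant peaks map exactly onto their reference values, and peaks between them are shifted by an interpolated amount.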

To obtain sequence information by electrospray ionization tandem MS (ESI-MS/MS) (Q-TOF, Micromass, and SCIEX API-365; Perkin-Elmer), we purified the remainder of each peptide extract by using Poros™ 50 R2 beads (Perseptive Biosystems), as described in Gobom et al. (1998) and Edvardsson et al. (1999). The peptides were eluted from the Poros beads with 8 μL of 50% (v/v) methanol and 5% (v/v) formic acid, and the solution was loaded into a nanoelectrospray needle (Au/Pd-coated glass capillaries; Protana A/S, Odense, Denmark). The instrument was calibrated with polypropylene glycol, according to the manufacturer's specifications.

Protein Gel Blotting, Edman Sequencing, and Antisera

For N-terminal Edman sequencing, Coomassie Brilliant Blue–stained gels were equilibrated in 100 mM boric acid, pH 8.5 (NaOH), and 0.12% SDS twice for ∼1 hr, principally according to Bauw et al. (1987, 1989). Protein spots then were electroblotted (16 hr at 20 V) onto polyvinylidene difluoride membrane (0.2 μm from Bio-Rad), by using 50 mM Tris-HCl and 50 mM boric acid, pH 8.5, plus 0.01% SDS as transfer buffer. After matching the protein patterns with the reference gels by computer-aided image analysis, we excised the spots from the dried blots and stored them at −20°C for Edman sequencing. To analyze the purity of the thylakoid preparations, we conducted a protein gel blot analysis with different polyclonal antisera, according to standard procedures, using chemiluminescence for detection.

Database Searching

For the more abundant spots (>25 kD), usually >25 peptide masses were obtained by MALDI-TOF, and a very good coverage of the full-length proteins was typically found (30 to 60%; Table 1) within the specified 50-ppm mass accuracy. The more abundant ions in the MALDI spectra were used directly for database searches using the software MS-Fit, developed at the University of California at San Francisco MS Facility (http://prospector.ucsf.edu), to match known proteins or translated open reading frames in databases at the National Center for Biotechnology Information (NCBI) and SWISS-Prot. Database searches with MS-Fit were set up and performed on the basis of accumulating experience as well as suggestions from Parker et al. (1998) and were performed as follows: (1) In the first round of database searching, the maximal molecular mass was restricted to 120 kD to avoid hits of polyproteins or very large proteins, and no miscleavage was allowed. Mass accuracy was set at 15 ppm (thus, for a 1-kD peptide, the maximum allowed difference between the measured and theoretical peptide masses was defined as 0.015 D), and minimally four matching peptides were required. Oxidation of methionines was allowed, and cysteines could be modified by carbamidomethylation. Contamination (i.e., from keratin, the matrix, and/or the instrument), trypsin fragments, and systematically recurring ions were removed from the data set. No strict molecular mass and pI filters were applied to find possible breakdown products or unexpected splicing or incorrect annotations and to account for the expected variable lengths of the presequences. (2) If a plant protein was found, the accuracy was set at 50 ppm and one missed cleavage was allowed. This step was added to permit the identification of all possible peptide masses derived from an individual protein. (3) In the third round, we investigated whether the gel spot contained a second protein, after eliminating all peptide masses matching to the protein identified in the first and second rounds. If at least eight peptides remained, another search with MS-Fit was conducted.
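The matching criteria of the first search round (15 ppm, no missed cleavage, at least four matching peptides) can be illustrated with a small peptide mass fingerprinting sketch. This is not MS-Fit and makes no claim about its internals: the digestion rule (cleavage after K or R, but not before P), the use of neutral monoisotopic masses, the toy protein sequence, and all function names are assumptions for illustration. The residue masses are standard monoisotopic values rather than values from this paper.

```python
# Standard monoisotopic residue masses in D (not taken from the paper).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence, missed_cleavages=0):
    """In silico digest: cleave after K or R unless the next residue is P."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def within_ppm(measured, theoretical, ppm):
    """True if the mass difference is inside the ppm tolerance."""
    return abs(measured - theoretical) <= theoretical * ppm * 1e-6


def count_matches(measured_masses, sequence, ppm, missed_cleavages=0):
    """Count measured masses that match at least one theoretical peptide."""
    theoretical = [peptide_mass(p)
                   for p in tryptic_peptides(sequence, missed_cleavages)]
    return sum(any(within_ppm(m, t, ppm) for t in theoretical)
               for m in measured_masses)


# HYPOTHETICAL toy protein and peak list: each theoretical mass shifted by
# 5 ppm, plus one unrelated peak at 5000 D.
TOY_PROTEIN = "MAAKGGRPLKSSVERTTLK"
measured = [peptide_mass(p) * (1 + 5e-6)
            for p in tryptic_peptides(TOY_PROTEIN)] + [5000.0]

round1_hits = count_matches(measured, TOY_PROTEIN, ppm=15, missed_cleavages=0)
passes_round1 = round1_hits >= 4  # first-round threshold from the text
```

The second round described above would correspond to calling `count_matches` with `ppm=50` and `missed_cleavages=1`, which widens the set of theoretical peptides that can be matched for a protein already found.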

A number of protein spots with uncertain identities were selected and analyzed by nano-ESI-MS/MS to yield fragment ion tag data. Searches with MS-Tag (http://prospector.ucsf.edu) were performed in nonerror mode, using the following values: all species; protein molecular mass range of 5 to 250 kD; precursor ion mass tolerance of 1 D; allowed fragment ion types of a, b, y, a-NH3, b-NH3, y-NH3, b-H2O, and internal ions; and trypsin digest (only one missed cleavage allowed). Alternatively, sequence tags were interpreted from the ESI-MS/MS spectra manually and were used in FASTA to search through the protein and genome databases (PIR, Atdb, SWISS-Prot, and NCBI). Database searching with N-terminal sequence tags from Edman degradation sequencing was also performed with FASTA. Analysis of hypothetical proteins was conducted using software at servers accessible on the Internet (Blast, Pfam, Prosite, Blocks, Prints, Prodom, and Proclass).

Predictions for chloroplast localization and chloroplast and lumenal transit peptides were made using the software programs PSORT (http://psort.nibb.ac.jp:8800/), ChloroP (http://www.cbs.dtu.dk/services/ChloroP/), and SignalP (http://www.cbs.dtu.dk/services/SignalP/).

Miscellaneous

Protein determination was conducted according to Bradford (1976). Chlorophyll concentrations were spectroscopically determined in 80% acetone (Porra et al., 1989).
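For orientation, the spectroscopic chlorophyll determination can be written out as a short calculation. The coefficients below are those commonly quoted from Porra et al. (1989) for buffered 80% aqueous acetone, with absorbances read at 663.6 and 646.6 nm and results in μg mL⁻¹; they are reproduced here from general knowledge rather than from this paper and should be checked against the original reference before use. The absorbance readings in the example are hypothetical.

```python
def chlorophyll_ug_per_ml(a663_6, a646_6, dilution_factor=1.0):
    """Chlorophyll a, b, and total (ug/mL) in 80% acetone extracts.

    Coefficients as commonly quoted from Porra et al. (1989); treat them
    as an assumption and verify against the original publication.
    Absorbances are assumed to be corrected for background (e.g., 750 nm).
    """
    chl_a = 12.25 * a663_6 - 2.55 * a646_6
    chl_b = 20.31 * a646_6 - 4.91 * a663_6
    total = 17.76 * a646_6 + 7.34 * a663_6
    return (chl_a * dilution_factor,
            chl_b * dilution_factor,
            total * dilution_factor)


# HYPOTHETICAL readings for a diluted thylakoid extract.
chl_a, chl_b, chl_total = chlorophyll_ug_per_ml(0.5, 0.2)
```

The total is the sum of the two individual equations, which gives a quick internal consistency check on the coefficients.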

Acknowledgments

Mass spectrometry was conducted at the Department of Bioanalytical Chemistry at AstraZeneca R & D Mölndal (MALDI-TOF and ESI-MS/MS) and at the Department of Molecular Biology, Odense University (ESI-MS/MS). We thank Ann-Christine Nyström for performing the MS/MS analysis at AstraZeneca and Helena Brockenuus von Löwenhielm for her advice regarding MALDI-TOF analysis. Dr. Per-Ingvar Ohlsson at Umeå University is gratefully acknowledged for his excellent Edman sequencing analysis. We thank Drs. Thierry Rabilloud and Véronique Santoni for their advice and stimulating discussions, Jimmy Ytterberg for critically reading the manuscript, and Olof Emanuelsson and Jacob Halaska for their help with the logoplot analysis and discussions. M.-Amin Bakali Haraiki is acknowledged for his help with database searching and organizing the MS data in the initial stage of this study.

This study was supported by a postdoctoral fellowship to J.-B.P. from the Wenner-Grenska Samfundet; by the Nordisk Kontaktorgan för Jordsbrukforskning and the Swedish Foundation for Strategic Research (SSF), which provided general support to K.J.v.W.; and by the Swedish National Research Council, which provided financial support for the purchase of two-dimensional electrophoresis equipment to K.J.v.W. P.R. and D.E.K. are members of the Center for Experimental Bioinformatics, which is sponsored by the Danish National Research Foundation, and D.E.K. was also supported by a grant from the Brazilian Postgraduate Federal Agency.

References


Articles from The Plant Cell are provided here courtesy of Oxford University Press