Automated comparative sequence analysis by base-specific cleavage and mass spectrometry for nucleic acid-based microbial typing

Proc Natl Acad Sci U S A. 2007 Jun 19; 104(25): 10649–10654.

Christiane Honisch,* Yong Chen,* Chloe Mortimer,‡ Catherine Arnold,‡ Oliver Schmidt,‡ Dirk van den Boom,* Charles R. Cantor,* Haroun N. Shah,‡ and Saheer E. Gharbia‡

*SEQUENOM, Inc., 3595 John Hopkins Court, San Diego, CA 92121; and

‡Special and Reference Microbiology Division, Health Protection Agency, 61 Colindale Avenue, London NW9 5HT, United Kingdom

Contributed by Charles R. Cantor, May 4, 2007

Author contributions: C.H., C.A., H.N.S., and S.E.G. designed research; C.H., C.M., and O.S. performed research; Y.C. contributed analytic tools; C.H. analyzed data; and C.H., C.A., D.v.d.B., C.R.C., H.N.S., and S.E.G. wrote the paper.

Freely available online through the PNAS open access option.

Abstract

Traditional microbial typing technologies for the characterization of pathogenic microorganisms and monitoring of their global spread are often difficult to standardize and poorly portable, and they lack sufficient ease of use, throughput, and automation. To overcome these problems, we introduce the use of comparative sequencing by MALDI-TOF MS for automated high-throughput microbial DNA sequence analysis. Data derived from the public multilocus sequence typing (MLST) database (http://pubmlst.org/neisseria) established a reference set of expected peak patterns. A model pathogen, Neisseria meningitidis, was used to validate the technology and explore its applicability as an alternative to dideoxy sequencing. One hundred N. meningitidis samples were typed by comparing MALDI-TOF MS fingerprints of the standard MLST loci to reference sequences available in the public MLST database. Identification results can be obtained in 2 working days. Results were in concordance with classical dideoxy sequencing with 98% correct automatic identification. Sequence types (STs) of 89 samples were represented in the database, seven samples revealed new STs, including three new alleles, and four samples contained mixed populations of multiple STs. The approach shows interlaboratory reproducibility and allows for the exchange of mass spectrometric fingerprints to study the geographic spread of epidemic N. meningitidis strains or other microbes of clinical importance.

Keywords: MALDI-TOF MS, microbial identification, multilocus sequence typing, Neisseria meningitidis

The accurate characterization of infectious disease agents is essential to epidemiological surveillance and public health decisions. This includes outbreak recognition, the detection of cross-transmission of pathogens, the determination of the source of infection, recognition of particularly virulent strains, and monitoring of vaccination programs. Although phenotypic characters such as morphology and physiological properties have traditionally been used to characterize microbes, nucleic acid analysis technologies paved the way for modern typing approaches. Phenotypic markers are subject to genetic regulation and respond to environmental stimuli like culture, subculture, and storage conditions, whereas suitable nucleic acid-based characterization methods deliver a stable fingerprint of the sample important for global comparability and phylogenetic analysis.

Recently, microbial DNA-based identification and typing have significantly increased. Applications are often high-throughput in nature, and appropriate typing methods require accuracy, reproducibility, and laboratory automation (1).

The most common nucleic acid-based tools currently in use are based on gel electrophoresis or fingerprinting and rely on electrophoretic mobility. Pulsed-field gel electrophoresis is the most widely used method. Standardized protocols and reference databases have been established worldwide, but, as for classic fingerprinting, technical problems such as ambiguous bands, background noise of the electrophoretic profile, and distortion between gels remain. Digital formats of the results and data portability are challenging. Processing times of up to 3 days reduce the ability to analyze large numbers of samples (2).

New technologies for whole-genome comparative sequencing like whole-genome DNA microarrays are still prohibitively expensive and lack ease of use. Future utility requires the reduction of costs per reaction, robust and simplified formats focused on established regions of genetic variance, and an adequate evaluation in comparison with other molecular methods. Ambiguities in the interpretation of the ratios of hybridization and cross-hybridization to paralogous genes are important limitations of the technique. In addition, PCR product microarrays generally do not have the resolution to detect minor deletions and point mutations (3). Nucleotide composition analysis of short PCR products by electrospray MS has been described, where the detected mass of the product is used to determine a constrained list of nucleotide compositions for identification (4). Sequence variations can be detected but not localized or converted to a new sequence. Therefore, typing methods based on PCR-amplified DNA marker regions (e.g., 500–800 bp in length) and nucleotide sequence analysis like dideoxy sequencing or comparative sequence analysis by MALDI-TOF MS are important alternatives. Probing large collections of microbial isolates using a partial genetic signature provides the framework for these sequence-based typing approaches (5). PCR techniques make the analysis of molecular marker regions easily achievable even for trace amounts of material, uncultured species, or clinical samples. The resulting DNA sequences allow for the construction of electronically accessible genetic databases suitable for prospective epidemiologic surveillance efforts and data transfer between centers (6).

Multilocus sequence typing (MLST) was introduced in 1998 as a comparative sequence-based typing approach to assess the population structure of bacterial isolates. MLST elucidates genomic relatedness at the inter- and intraspecies level using PCR and dideoxy sequencing of a restricted number of housekeeping genes. The use of multiple loci is essential to achieve the resolution required to provide meaningful relationships among strains (7–9). Data can easily be compared with those in a large central database via the Internet. The continuously expanding MLST database currently covers 18 species.

MLST by MALDI-TOF MS is an alternative automated comparative sequencing method. PCR-amplified signature sequences are subject to in vitro transcription and base-specific RNA cleavage (10). Mass signal patterns of the resulting cleavage products, a mixture of RNA fragments called compomers, are acquired and provide a fingerprint of the sample. Each RNA compomer is defined by its nucleotide composition, with the cleavage base terminating its 3′ end, and thus by its mass in the resulting mass spectrum. The list of detected experimental compomer masses is compared with a calculated list of molecular weights derived from an in silico digest of a set of reference sequences in the system database. These simulated patterns of the reference set are used to identify the sample by its best matching reference sequence. Microheterogeneities between the best matching reference and the sample sequence, such as single base deviations, show up as a deviation between the in silico and the detected sample spectrum. They can be used to identify and localize sequence differences down to a single base pair change (11) and to identify novel sequences.
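The in silico digest described above can be illustrated with a minimal sketch. It assumes cleavage on the 3′ side of every occurrence of one base and reduces each fragment to its base composition; the sequence and the function names `cleave` and `compomer` are hypothetical, and the conversion of compositions to masses is deliberately left out because it depends on the chemistry of the cleavage reaction.

```python
from collections import Counter

def cleave(rna: str, base: str) -> list:
    """Cut the transcript after every occurrence of `base` (3' side)."""
    fragments, current = [], ""
    for nt in rna:
        current += nt
        if nt == base:
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in the cleavage base
        fragments.append(current)
    return fragments

def compomer(fragment: str) -> str:
    """Reduce a fragment to its base composition, e.g. 'GGGAU' -> 'A1G3U1'."""
    counts = Counter(fragment)
    return "".join(f"{b}{counts[b]}" for b in "ACGU" if counts[b])

# Hypothetical transcript, not an MLST allele.
transcript = "GGGAUCCGAUAGCU"
u_fragments = cleave(transcript, "U")
u_pattern = sorted(compomer(f) for f in u_fragments)
print(u_fragments)  # ['GGGAU', 'CCGAU', 'AGCU']
print(u_pattern)    # ['A1C1G1U1', 'A1C2G1U1', 'A1G3U1']
```

Because only the composition determines the mass, two fragments with the same composition but different base order collapse onto a single signal, which is why some signals carry no positional information.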

MALDI-TOF MS sequence-based typing for high-level discrimination of individual microbial taxa based on signatures within variable regions in the 16S rDNA gene region has previously been applied to discriminate mycobacteria and Bordetella species (12, 13).

In contrast to 16S rDNA-based typing, MLST is based on characterizing variations in the sequence of several loci, which are accumulating slowly within a microbial population. MLST thus requires differentiation of reference sequences based on single nucleotide deviations, which provides a new challenge for the comparative sequencing approach by MALDI-TOF MS. Here we demonstrate the power of MALDI-TOF MS-based MLST to identify lineages of the bacterial pathogen Neisseria meningitidis.

Results

N. meningitidis often causes severe meningococcal meningitis and septicemia, most frequently in young children. Epidemic outbreaks of varying scale, up to global pandemics, require intricate genetic typing to identify case clusters. MLST is the most powerful and, simultaneously, the most portable approach to keep track of the epidemic spread and has identified clones with apparent increased virulence (14, 15). It can now be considered the gold standard for genotyping N. meningitidis.

MLST of N. meningitidis summarizes the nature of sequence variations detected in 450- to 500-bp sequences of internal regions of seven housekeeping genes (abcZ, adk, aroE, fumC, gdh, pdhC, and pgm). Different sequences present within the species are assigned as distinct alleles. For each sample, alleles at each of the seven loci are identified and define its allelic profile or sequence type (ST). Major clonal complexes group STs differing in only one or two alleles (16, 17).
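The relationship between alleles, allelic profile, and ST can be sketched as a simple lookup. The allele numbers and ST numbers below are invented for illustration; real profiles come from the public MLST database, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Tuple

LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")
Profile = Tuple[int, ...]

# Hypothetical profiles and ST numbers, for illustration only.
KNOWN_PROFILES: Dict[Profile, int] = {
    (2, 3, 4, 3, 8, 4, 6): 9001,
    (2, 3, 4, 3, 8, 4, 7): 9002,
    (5, 1, 9, 17, 2, 11, 3): 9003,
}

def to_profile(alleles: Dict[str, int]) -> Profile:
    """Order the per-locus allele numbers into the fixed seven-locus profile."""
    return tuple(alleles[locus] for locus in LOCI)

def sequence_type(alleles: Dict[str, int]) -> Optional[int]:
    """Return the ST for a known profile, or None for a profile not yet listed."""
    return KNOWN_PROFILES.get(to_profile(alleles))

def loci_differing(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles carry different alleles."""
    return sum(x != y for x, y in zip(a, b))

sample = {"abcZ": 2, "adk": 3, "aroE": 4, "fumC": 3, "gdh": 8, "pdhC": 4, "pgm": 7}
print(sequence_type(sample))                                  # 9002
print(loci_differing(to_profile(sample), (2, 3, 4, 3, 8, 4, 6)))  # 1
```

A profile that returns `None` corresponds to a candidate new ST, either because it contains a new allele or because it is a previously unlisted combination of known alleles.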

MLST by Base-Specific Cleavage and MALDI-TOF MS.

To evaluate automated microbial typing by MALDI-TOF MS, MLST of N. meningitidis was performed as a blind study of 100 isolates in reference to the http://pubmlst.org/neisseria database. The database contains a collection of isolates that represent the total known diversity of N. meningitidis species, currently ≈5,300 different STs, with ongoing compilation.

Between 209 and 344 published alleles per locus served as reference sequence sets for MALDI-TOF MS-based typing. The concept of reference sequence-based peak pattern analysis is, however, applicable to nucleic acid-based typing and comparative sequence analysis of haploid organisms in general.

The four steps of automated MALDI-TOF MS-based typing are shown in Fig. 1. Reference sequence sets, including the gene-specific primer sequences, are imported into the system database to generate simulated peak patterns (Fig. 1, step 1). DNA sample processing follows the standard MLST protocol (http://pubmlst.org/neisseria) using the sequencing primer set to amplify the internal regions of the seven housekeeping genes. Each sequencing primer set is tagged with a T7 promoter sequence and a 10-nt tag resulting in two sets of PCR primers. PCR products of the T7-tagged forward primer and the T7-tagged reverse primer allow for in vitro transcription of the sense and antisense strands. The resulting two RNAs are subject to base-specific cleavage at C and U generating representative compomer mixtures for cleavage reactions equivalent to all four monobase-specific cleavages of a single strand (Fig. 1, step 2).
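The statement that C- and U-specific cleavage of both transcripts is equivalent to all four base-specific cleavages of one strand follows from base pairing, and a short sketch makes this concrete. The sequence is hypothetical and the function names are illustrative; only the complementarity rule is assumed.

```python
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}

def reverse_transcript(forward: str) -> str:
    """Antisense RNA: reverse complement of the forward transcript."""
    return "".join(COMPLEMENT[nt] for nt in reversed(forward))

def cleavage_sites(rna: str, base: str) -> list:
    """Zero-based positions at which a base-specific reaction cuts."""
    return [i for i, nt in enumerate(rna) if nt == base]

def sites_on_forward(forward: str, base: str) -> list:
    """Cleavage sites of the reverse transcript, in forward-strand coordinates."""
    reverse = reverse_transcript(forward)
    n = len(forward)
    return sorted(n - 1 - i for i in cleavage_sites(reverse, base))

forward = "GGGACUUAGCCAUG"  # hypothetical transcript
covered = {
    "U": cleavage_sites(forward, "U"),    # U reaction, forward transcript
    "C": cleavage_sites(forward, "C"),    # C reaction, forward transcript
    "A": sites_on_forward(forward, "U"),  # U reaction, reverse transcript
    "G": sites_on_forward(forward, "C"),  # C reaction, reverse transcript
}
for base, sites in covered.items():
    print(base, sites)
```

Each entry of `covered` lists exactly the forward-strand positions that hold the named base, so the four reactions together interrogate every position of the amplicon.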

Fig. 1. Procedural steps involved in microbial typing by MALDI-TOF MS. Step 1, import of reference sequences into the system database; step 2, PCR and post-PCR biochemistry; step 3, MALDI-TOF MS spectrum and peak pattern comparison; step 4, tabulated typing results.

Because this process uses PCR amplification, it can theoretically be applied to clinical cases with minimal or no culturing of the infectious agent, which can be important both for rapidly progressing infections and for slow-growing or difficult-to-culture organisms such as mycobacteria.

For DNA samples the sensitivity can be as high as one genome equivalent present in the reaction vial (18). As proven in limited dilution experiments (data not shown) the amplification provided by PCR and transcription is sufficient to produce a measurable product.

Before MALDI-TOF MS measurements, samples are desalted by anion exchange resin treatment and dispensed on a matrix-coated chip (Fig. 1, step 3). Typing results and sequence deviations are automatically assigned by the Signature Sequence Identification software tool (SEQUENOM) (Fig. 1, step 4).

Of the 100 N. meningitidis isolates analyzed by base-specific cleavage and MALDI-TOF MS, 89 samples were automatically assigned to alleles and resulted in STs existing in the database. Three samples resulted in STs with new sequences for one of the alleles; an additional two STs were defined by known alleles but not listed in the database, and four samples revealed untypeable mixed populations. Alleles, STs, and clonal complexes of all samples are listed in supporting information (SI) Table 1. The 96 typeable samples represent 38 known STs of 11 clonal complexes and five new STs.

The concordance between MALDI-TOF MS and dideoxy sequencing-based MLST of the 96 × 7 = 672 typeable alleles was 98.9%, representing 665 identically identified alleles. Detailed analysis of the differences revealed that the gdh alleles of four samples were misidentified by the analysis software because of the failure of two transcription and cleavage reactions but were flagged for manual analysis and recovered by manual assignments of the best matching reference allele (user calls). Three new alleles, including an abcZ, an aroE, and a pdhC allele, in three different samples were identified by MALDI-TOF MS and confirmed by dideoxy sequencing. The sequences showed 99.4%, 99.8%, and 99.6% identity with their corresponding best matching database references abcZ_285, aroE_9, and pdhC_207, corresponding to deviations of 3, 1, and 2 bp, respectively.

MLST MALDI-TOF MS data acquisition of the whole set of 100 samples was accomplished in a total of 4 h. Operator variables are mostly removed by liquid handling and automated data acquisition. Samples and loci can be processed in sets of 96 within 7 h or staggered to increase the throughput and provide sufficient speed to track an ongoing epidemic. The data acquisition and analysis of a complete set of seven loci per sample can be obtained on 28 matrix patches of a 384 chip in 2.5 min. One 384 chip allows for the analysis of the seven loci in 12 samples and a negative control. Considering the analysis of four cleavage reactions per locus and an average amplicon length of 500–800 bp, a single mass spectrometer with a data acquisition speed of 4.5 sec per reaction can scan ≈2 million bp per day, which compares favorably with standard dideoxy sequencing equipment (19).
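The throughput figures in this paragraph can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch that assumes uninterrupted acquisition around the clock; the variable names are illustrative.

```python
SECONDS_PER_REACTION = 4.5
REACTIONS_PER_LOCUS = 4
LOCI_PER_SAMPLE = 7
PATCHES_PER_CHIP = 384

patches_per_sample = REACTIONS_PER_LOCUS * LOCI_PER_SAMPLE        # 28 patches
acquisition_min = patches_per_sample * SECONDS_PER_REACTION / 60  # 2.1 min of pure acquisition
patches_used = (12 + 1) * patches_per_sample                      # 12 samples + 1 control = 364

reactions_per_day = 24 * 3600 / SECONDS_PER_REACTION              # 19,200 reactions
amplicons_per_day = reactions_per_day / REACTIONS_PER_LOCUS       # 4,800 amplicons
bp_per_day_low = amplicons_per_day * 500                          # 2.4 million bp
bp_per_day_high = amplicons_per_day * 800                         # 3.84 million bp

print(patches_per_sample, acquisition_min, patches_used)
print(bp_per_day_low, bp_per_day_high)
```

Pure acquisition of 28 patches takes about 2.1 min, which fits within the 2.5 min quoted for acquisition plus analysis, and 13 sets of 28 patches occupy 364 of the 384 positions on a chip. The lower bound of 2.4 million bp per day is of the same order as the ≈2 million bp stated in the text.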

Signature Sequence Identification Software Tool.

Data processing was performed with the Signature Sequence Identification software (SEQUENOM) specifically developed to analyze base-specific cleavage patterns in comparison to a given set of reference sequences. In brief, the simulation module of the software performs in silico cleavage reactions for the imported set of reference sequences. The resulting simulated cleavage patterns are clustered based on their distinctive peak pattern in a way that resulting clusters can be uniquely identified and distinguished from one another. For N. meningitidis all sequences within the seven reference sequence sets were differentiable in this simulation. This demonstrates a comparable discriminatory power of MLST by MALDI-TOF MS with the dideoxy sequencing gold standard.

Spectra for four cleavage reactions per sample were acquired and mass recalibrated against a set of unique calibration peaks derived from the reference sequence set. In theory, samples can be identified by simply finding the best match of the detected peak pattern with the simulated pattern of a reference sequence set. However, because of various factors, such as intensity variations in the sample spectra, peak pattern matching requires additional scoring, particularly for large and often closely related reference sequence sets such as the one used in this study. Judgment of the peak pattern match is therefore a dynamic combination of three scores: the basic pattern matching score, a discriminating peak matching score, and a distance score. The discriminating peak matching score is calculated by evaluating only a subset of simulation-derived unique reference-specific identifier signals, whereas the distance score is determined based on Euclidean distances.

To increase the robustness, identification is performed by iteration. Initially, scores are calculated for all reference sequences, and a set of best matching reference sequences is selected. Detected peak patterns are reevaluated against this subset, and scores are recalculated to reevaluate the subset and to find an even smaller set of best matching sequences. This process continues until one sequence or several sequences with close scores that are considerably better than the rest are found for each of the samples. Finally, the top matching reference sequence is evaluated for potential mutations, and a confidence is assigned based on spectra quality and missing and additional signals, as well as unknown signals, which are inconsistent with any compomer or adduct assignment.
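The iterative narrowing can be illustrated with a simplified sketch. It is not the Signature Sequence Identification algorithm: peak patterns are reduced to sets of hypothetical peak labels, only two of the three scores are modeled (the distance score is omitted), and the halving rule and all names are assumptions made for illustration.

```python
def basic_score(detected: set, expected: set) -> float:
    """Fraction of the expected reference peaks that were detected."""
    return len(detected & expected) / len(expected)

def discriminating_score(detected: set, name: str, candidates: dict) -> float:
    """Fraction of detected peaks among those unique to `name` within the candidate set."""
    others = set().union(*(p for n, p in candidates.items() if n != name))
    unique = candidates[name] - others
    return len(detected & unique) / len(unique) if unique else 0.0

def identify(detected: set, references: dict, keep: float = 0.5) -> str:
    """Repeatedly rescore and shrink the candidate set until one reference remains."""
    candidates = dict(references)
    while len(candidates) > 1:
        scores = {
            n: basic_score(detected, p) + discriminating_score(detected, n, candidates)
            for n, p in candidates.items()
        }
        ranked = sorted(candidates, key=scores.get, reverse=True)
        # Always drop at least one candidate so the loop terminates.
        n_keep = max(1, min(len(ranked) - 1, int(len(ranked) * keep)))
        candidates = {n: candidates[n] for n in ranked[:n_keep]}
    return next(iter(candidates))

# Hypothetical reference patterns; integers stand in for peak positions.
references = {
    "ref_1": {1, 2, 3, 4},
    "ref_2": {1, 2, 3, 5},
    "ref_3": {1, 6, 7, 8},
    "ref_4": {9, 10, 11, 12},
}
print(identify({1, 2, 3, 5}, references))  # ref_2
```

The point of rescoring after each reduction is that the set of discriminating peaks changes with the candidate set: a peak shared with a distant reference becomes informative once that reference has been eliminated.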

The typing statistics of the analysis software on the 96 typeable N. meningitidis samples are summarized in Fig. 2. For 97.6% of a total of 672 alleles the software automatically identified the correct top matching reference sequence in agreement with dideoxy sequencing. Among these, 0.4% were identified as new sequences extending the existing reference set. For 1.8% of the alleles the correct matching reference was listed among a group of homologous top matching references and typing required manual selection of the best match. This was mainly because of the failure of one of the four cleavage reactions. Only 0.6% of the alleles, four gdh alleles of a total of 672 alleles, were assigned to the wrong sequence but were correctly identified by user calls as described above.

Fig. 2. MALDI-TOF MS MLST typing statistics of 96 typeable N. meningitidis samples. For 97.6% of the sample alleles the software automatically assigned the correct top matching reference sequence, for 1.8% the correct matching reference was listed among a group of top matching references with equal score, and for 0.6% a wrong reference sequence was presented.

Single Base Pair Mutation Detection.

New alleles were identified by a combination of the identification algorithm with a MALDI-TOF MS-specific SNP Discovery algorithm (11).

Fig. 3 shows the detection of a novel aroE_9 modification with a C→T single base deviation at position 443. Band patterns derived from the reference sequence illustrate the difference between the in silico pattern of aroE_9 and the detected sample pattern. The T-specific reaction of the forward RNA transcript (Fig. 3A) shows a missing signal at 8,957.9 Da in comparison to the reference pattern. The signal represents a cleavage product that is localized at position 439 of the amplicon with a composition A8C10G9U1. A new signal appears at 7,343.5 Da with a composition of A8C8G6U1. The deviation between the missing and the additional compomer can be explained by a substitution of a C with a T at position 443 and the introduction of a cleavage base at this position, which leads to the detected compomer at 7,343.5 Da and a compomer C1G3U1 at 1,650.0 Da (data not shown). The latter is a silent noninformative signal because it is identical to two compomers of the same nucleotide composition derived from sequence stretches somewhere else in the reference. The T-cleavage reaction of the reverse RNA transcript confirms the observation (Fig. 3B). The corresponding compomer A1C5G3U1 at 3,136.0 Da is missing, while an additional signal at 3,120.0 Da with the composition A2C5G2U1 reflects the observed C to T change by the complementary event G to A. Additional confirmation is gained in the C-specific cleavage reaction of the forward RNA transcript from an additional signal at 2,010.0 Da of composition C1G4T1. The signal is the result of the loss of the C-cleavage site in compomer C1G3 at position 432 due to the C to T change. The corresponding missing signals of the two combined fragments are silent and below the mass range of detection. The C-specific cleavage reaction of the reverse RNA transcript does not add any additional information because the corresponding mass of the affected compomer GC is ≤1,000 Da and thus out of the mass range of detection. These low mass range signals provide no information content. They are the result of nucleic acid monomers, dimers, and trimers overlaid by matrix contamination and are therefore discarded. In conclusion, the C→T mismatch between the best matching reference sequence aroE_9 and the sequence of the sample was robustly and redundantly detected by MALDI-TOF MS with two missing and three additional signals.
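The compomer compositions quoted above can be checked for internal consistency with simple composition arithmetic. The sketch below uses only the compositions stated in the text; the helper names `parse` and `substitute` are illustrative, and masses are not modeled.

```python
import re
from collections import Counter

def parse(compomer: str) -> Counter:
    """Turn a composition string such as 'A8C10G9U1' into base counts."""
    return Counter({base: int(n) for base, n in re.findall(r"([ACGTU])(\d+)", compomer)})

def substitute(composition: Counter, old: str, new: str) -> Counter:
    """Composition after replacing one `old` base with one `new` base."""
    return composition - Counter(old) + Counter(new)

# Forward transcript, T-specific reaction: the C->U change creates a new
# cleavage site, so one compomer disappears and two shorter ones appear.
missing_fwd = parse("A8C10G9U1")
added_fwd = parse("A8C8G6U1") + parse("C1G3U1")
print(added_fwd == substitute(missing_fwd, "C", "U"))  # True

# Reverse transcript, T-specific reaction: the complementary G->A change
# alters the composition of a single compomer without changing its length.
missing_rev = parse("A1C5G3U1")
added_rev = parse("A2C5G2U1")
print(added_rev == substitute(missing_rev, "G", "A"))  # True
```

Both checks hold: the two new forward compomers together contain exactly the bases of the missing one with a single C exchanged for U, and the reverse compomer differs by a single G exchanged for A.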

Fig. 3. MALDI-TOF MS-based discovery of a mutation C→T in allele aroE_9 at position 443. (A) Mutation-specific signal changes at 7,343.5 and 8,957.9 Da in the T-specific cleavage reaction of the forward RNA transcript. (B) Mutation-specific signal changes at 3,120.0 and 3,136.0 Da in the T-specific cleavage reaction of the reverse RNA transcript. (C) Mutation-specific signal change at 2,010.0 Da in the C-specific cleavage reaction of the forward RNA transcript.

The SNP Discovery algorithm also identified deviations in consensus sequence stretches, substituting missing sequence information between the MLST sequencing primer and the available reference sequences. Unlike standard dideoxy-sequencing based MLST, where the first 5–10 bp after the primer region are not resolved and the sequence reads require trimming before database query, base-specific cleavage and MALDI-TOF MS MLST analyzes the full-length transcript starting at the ggg-transcription start of the T7-polymerase. SNP Discovery results were confirmed by dideoxy sequencing and are available in SI Table 2. Identified sequence deviations showed 100% homology within the alleles and maintained discrimination between alleles.

Simulation.

A computational simulation systematically introduced all possible single-nucleotide mutations into each sequence of the MLST reference sequence sets and assessed the ability to detect these variations using four base-specific cleavage reactions and the SNP Discovery algorithm. Mass signals in a range of 1,100–8,000 Da were considered, and a mass resolution (m/Δm) of 600 was assumed (values routinely achieved by MALDI-TOF MS). The results summarized in SI Table 3 demonstrate that for the seven reference sequence sets of this study 99.0% of all possible single-nucleotide changes are detectable by base-specific cleavage and MALDI-TOF MS. Slightly higher detection rates are obtained for substitutions (99.4%), which are more likely to occur in typing of housekeeping gene regions like MLST, when compared with detection rates for deletions (98.9%) and insertions (98.7%). This can be explained by the fact that substitutions can lead to up to 10 observations (five missing and five additional signals), whereas insertions/deletions can lead to a maximum of nine observations in the sample spectra.
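The structure of such a simulation can be sketched as follows. This is a simplified stand-in, not the SNP Discovery algorithm: a mutation counts as detectable if it changes the set of compomer compositions in any of the four reactions, and the mass window (1,100–8,000 Da) and finite mass resolution used in the study are ignored. The sequence and all names are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}

def compomers(rna: str, base: str) -> frozenset:
    """Set of fragment compositions after cleavage 3' of every `base`."""
    found, current = set(), ""
    for nt in rna:
        current += nt
        if nt == base:
            found.add("".join(sorted(current)))
            current = ""
    if current:
        found.add("".join(sorted(current)))
    return frozenset(found)

def fingerprint(forward: str) -> tuple:
    """U- and C-specific patterns of the forward and reverse transcripts."""
    reverse = "".join(COMPLEMENT[nt] for nt in reversed(forward))
    return tuple(compomers(s, b) for s in (forward, reverse) for b in "UC")

def single_nt_variants(seq: str):
    """Yield (kind, sequence) for every single-nucleotide change."""
    for i, nt in enumerate(seq):
        for b in "ACGU":
            if b != nt:
                yield "substitution", seq[:i] + b + seq[i + 1:]
        yield "deletion", seq[:i] + seq[i + 1:]
    for i in range(len(seq) + 1):
        for b in "ACGU":
            yield "insertion", seq[:i] + b + seq[i:]

def detection_rates(seq: str) -> dict:
    """Fraction of variants of each kind whose fingerprint differs from the reference."""
    reference = fingerprint(seq)
    total, detected = Counter(), Counter()
    for kind, variant in single_nt_variants(seq):
        total[kind] += 1
        detected[kind] += int(fingerprint(variant) != reference)
    return {kind: detected[kind] / total[kind] for kind in total}

rates = detection_rates("GGGACUUAGCCAUGGCAUUCGA")  # hypothetical sequence
for kind, rate in rates.items():
    print(kind, round(rate, 3))
```

Rates from a toy sequence like this are not comparable with the values in SI Table 3; the sketch only shows how every possible change is enumerated and tested against the reference pattern.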

Cluster Analysis.

Detected mass signals from the four cleavage reactions can be used to characterize a defined fingerprint of a sample as a matrix of peak positions in combination with the intensities of the signals converted to integers. Relationships between different samples can be analyzed by Euclidean distance and displayed as a dendrogram. A list of spectra that contain similar fingerprints and thus similar peak positions and intensities is described as a cluster that displays similarities among the objects of the set without the need for the assignment of a known reference sequence. Cluster analysis of mass peak patterns allows for the rapid high-throughput analysis of large sample sets, when only limited numbers of reference sequences are available, as needed for the identification of new informative marker sets.
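A reference-free comparison of fingerprints can be sketched with Euclidean distances and average-linkage clustering (the unweighted pair group method). The intensity vectors and sample labels below are invented; in practice each vector would hold integer intensities at a common list of peak positions across the four cleavage reactions.

```python
import math
from itertools import combinations

def euclidean(a, b) -> float:
    """Euclidean distance between two equally long intensity vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(fingerprints: dict) -> list:
    """Average-linkage clustering; returns the merges as (cluster, cluster, distance)."""
    def average_distance(c1, c2) -> float:
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(euclidean(fingerprints[a], fingerprints[b]) for a, b in pairs) / len(pairs)

    active = [(name,) for name in fingerprints]
    merges = []
    while len(active) > 1:
        c1, c2 = min(combinations(active, 2), key=lambda pair: average_distance(*pair))
        merges.append((c1, c2, average_distance(c1, c2)))
        active = [c for c in active if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical fingerprints labeled allele_sample.
fingerprints = {
    "1_a": [10, 0, 5, 0],
    "1_b": [10, 1, 5, 0],
    "3_a": [0, 10, 0, 5],
    "3_b": [0, 9, 0, 5],
}
for left, right, distance in upgma(fingerprints):
    print(left, right, round(distance, 2))
```

Samples carrying the same allele merge first at small distances, and the two alleles are joined only at a much larger distance, which is the behavior a dendrogram of such merges would display.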

A cluster analysis using the unweighted pair group method on MALDI-TOF MS fingerprints for the four cleavage reactions of 15 fumC alleles from 89 samples is demonstrated in Fig. 4A. This dendrogram demonstrates equal resolution of the sample set in comparison to the dendrogram produced by direct comparison of the primary DNA sequences (Fig. 4B). A Euclidean distance of 2.8 was found to be the similarity cutoff for samples with 100% sequence identity. All samples grouped within their corresponding alleles. The groupings are maintained in both dendrograms. Spectral patterns and primary sequences of the alleles fell into two major groups of identical clades with alleles fumC 1, 5, 8, 9, 13, 15, 40, 55, and 60 in the first clade and alleles fumC 3, 4, 17, 26, 90, and 124 forming the second clade. A total of 13 partitions (above the similarity cutoff) per tree were obtained. Between the two trees nine of these partitions connected the same groups of samples, whereas four of the partitions were present in one tree but not in the other. We thus observe a similarity difference of eight between the two trees. Differences were found within the first group of clades, whereas there were no differences in the second.
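The tree comparison in this paragraph amounts to a symmetric difference between two sets of partitions. The sketch below reproduces the counts from the text with placeholder partitions; the sample labels are hypothetical and only the counts (13 per tree, nine shared) come from the study.

```python
def compare_partitions(tree_a: set, tree_b: set) -> dict:
    """Count partitions shared by both trees and those found in only one."""
    return {
        "shared": len(tree_a & tree_b),
        "only_a": len(tree_a - tree_b),
        "only_b": len(tree_b - tree_a),
        "difference": len(tree_a ^ tree_b),
    }

# Placeholder partitions: each frozenset stands for the group of samples
# on one side of an internal branch.
shared = {frozenset({f"s{i}", f"s{i + 1}"}) for i in range(9)}
only_in_spectra_tree = {frozenset({f"m{i}", f"m{i + 1}"}) for i in range(4)}
only_in_sequence_tree = {frozenset({f"q{i}", f"q{i + 1}"}) for i in range(4)}

spectra_tree = shared | only_in_spectra_tree
sequence_tree = shared | only_in_sequence_tree
print(compare_partitions(spectra_tree, sequence_tree))
# {'shared': 9, 'only_a': 4, 'only_b': 4, 'difference': 8}
```

With 13 partitions per tree and nine in common, four partitions are unique to each tree, giving the difference of eight reported above.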

Fig. 4. Cluster analysis for the fumC allele. Shown is an unweighted pair group method analysis tree of base-specific cleavage and MALDI-TOF MS patterns (A) in comparison to an unweighted pair group method analysis tree derived from the primary sequences of the same sample set (B). Samples are labeled by allele and sample number (x_y). Alleles are color-coded, and partitions are numbered. Euclidean distance 2.8 is the cutoff for the degree of spectra similarity between identical samples. Clades defined by one tree but not by the other are highlighted by asterisks.

Overall cluster analysis of base-specific cleavage mass signal patterns shows clearly distinguishable clusters reflecting differences between alleles and their grouping by primary sequence analysis.

Reproducibility.

A random set of 23 samples representing 12 STs was chosen to assess the reproducibility of MALDI-TOF MS-based typing on two mass spectrometers at the collaborating centers. Samples were processed in four runs on different days according to the standard protocol. Data for three of the four runs were acquired at SEQUENOM, and data for one of the four runs were acquired at the Health Protection Agency. A total of 638 products were successfully amplified, transcribed, and cleaved. Six reactions failed PCR or post-PCR processing, with four dropouts on the second day of processing and one dropout each on days 3 and 4, leaving 99.1% of the data (638/644) for reproducibility analysis. Of these, 99.1% (632/638) were assigned to the correct allele. Six results were ambiguous and showed multiple matching alleles including the correct allele with the option for a correct user call. Among these, one sample was identified as a mixture of two abcZ alleles resulting in the assignment of both alleles for the four repeated data points.

Overall 98.1% (152/155) of the repeated typing events were reproducible. This reflects the stability of the molecular typing approach based on the specificity of the MALDI-TOF MS patterns.
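The reproducibility percentages follow directly from the counts given above, as this short check shows. It assumes one product per sample, locus, and run, which is what the stated total of 644 implies.

```python
SAMPLES, LOCI, RUNS = 23, 7, 4

expected_products = SAMPLES * LOCI * RUNS  # 644
dropouts = 4 + 1 + 1                       # day 2, day 3, day 4
processed = expected_products - dropouts   # 638
correct = processed - 6                    # six ambiguous results -> 632

def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

print(percent(processed, expected_products))  # 99.1
print(percent(correct, processed))            # 99.1
print(percent(152, 155))                      # 98.1
```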

Discussion

Reproducible large-scale monitoring of microbes, especially of human pathogens including virulent, emerging, and antibiotic-resistant strains, is increasingly important in today's world of global transport and requires technologies that are automated, less labor-intensive, and faster than traditional epidemiological typing methods.

The MALDI-TOF MS-based system enables automated reference sequence-based identification and characterization of DNA or RNA sequences and is suited to screen multiple loci in parallel as needed for MLST. The resulting digital data are both highly accurate and portable. Compared with traditional methods for analyzing PCR amplicons, including gel electrophoresis and dideoxy sequencing, MS combines 384-well liquid handling robotics for PCR and post-PCR processing with the mass accuracy and speed of a MALDI-TOF MS analyzer. Automated data analysis avoids time-consuming trace analysis and sequence alignments to deliver the best matching reference sequence and to identify and localize sequence variations. As opposed to dideoxy sequencing, band compression artifacts by repeats of single nucleotides in a sequence do not cause misreading of the sequence.

Validation of the system by processing and analysis of a stable set of MLST markers in 100 isolates of N. meningitidis has shown typeability, reproducibility, and concordance as well as a discriminatory power equal to standard dideoxy sequencing. Results for the full sample set were obtained in only 2 working days, with a data acquisition time of 8 h, a computational analysis time of 5–10 min on a standard desktop computer, and a correct automated identification rate of 98%. This level of speed, automation, and standardized scalable sample processing will enable efficient outbreak monitoring. The technology is generic and has the ability to type any pathogen or microbe to the genus-, species-, strain-, or subtype-specific level with ease of use and data interpretation, provided that at least one reference sequence or a set of reference sequences is available. This is of importance as microbial genome sequencing projects constantly increase the availability of whole genome sequences for clinically relevant microorganisms and trigger the comparisons of selected signature sequences to develop improved diagnostic typing assays.

Maintaining databases for the molecular characterization of microbes is an ongoing process. New isolates will develop over time, and isolates may be absent or poorly represented in the database. The better the species is represented by the corresponding database, the fewer manual steps are involved in the analysis, which clearly emphasizes the value of the system for automated sample characterization in a diagnostic reference laboratory.

The stability of the reaction plates allows for their storage and shipment to a central MALDI-TOF MS facility. The approach enables the comparison of processed plates and the portability of MALDI-TOF MS data or identified sequences between different reference laboratories without exchanging strains.

Pilot studies to monitor the successful applications in multiple organisms are needed, and a reference sequence-independent approach is envisioned to enable the monitoring of individual and multiple species in a sample by comparison of their characteristic mass spectral fingerprints to archived reference spectra.

Once the procedure is implemented at clinical reference centers, it will provide a tool for automated epidemiological evaluation and disease control.

Materials and Methods

Bacterial Strains.

N. meningitidis isolates from various serogroups were supplied by the National Meningitidis Reference Laboratory (Manchester, U.K.) and by the National Collection of Type Cultures (London, U.K.). Refer to SI Materials and Methods for details on DNA isolation.

MLST by Dideoxy Sequencing.

The MLST scheme for N. meningitidis uses internal regions of seven housekeeping genes: abcZ (putative ABC transporter), adk (adenylate kinase), aroE (shikimate dehydrogenase), fumC (fumarate hydratase), gdh (glucose-6-phosphate dehydrogenase), pdhC (pyruvate dehydrogenase subunit), and pgm (phosphoglucomutase). Loci were PCR-amplified and sequenced on both strands as described for the standard MLST sequencing protocol (http://pubmlst.org/neisseria/mlst-info/nmeningitidis/nmeningitidis-info.shtml) using a Beckman Coulter (Fullerton, CA) CEQ automated sequencer according to the manufacturer's protocol.
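
As a minimal illustration of how the seven loci combine into a sequence type (ST), the following Python sketch looks up an allelic profile in a profile table. The allele numbers and ST labels are invented placeholders rather than PubMLST entries, and the names `PROFILE_TABLE` and `assign_st` are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Locus order of the seven-gene N. meningitidis MLST scheme described above.
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

Profile = Tuple[int, ...]

# Hypothetical profile table: allele numbers and ST labels are invented
# placeholders, NOT real PubMLST entries.
PROFILE_TABLE: Dict[Profile, str] = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}


def assign_st(alleles: Dict[str, int]) -> Optional[str]:
    """Return the ST label for an allelic profile, or None if it is unknown."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        raise ValueError(f"missing loci: {missing}")
    # Build the profile in the fixed locus order before the lookup.
    profile = tuple(alleles[locus] for locus in LOCI)
    return PROFILE_TABLE.get(profile)
```

A profile that is absent from the table returns `None`, which corresponds to an isolate that would need manual review or a new database entry.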

MLST by MALDI-TOF MS: N. meningitidis Reference Sequence Sets.

Reference sequence sets were used as published (http://pubmlst.org/neisseria) to create import files for MALDI-TOF MS analysis. The sets were modified by the addition of the gene-specific primer regions of the forward primer as well as the reverse primer and a stretch of consensus sequence to fill the gap between the primer sequence and the trimmed published reference.

For aroE the corresponding sequence stretch of N. meningitidis serogroup B strain MC58 (GenBank accession no. NC_003112) was used, whereas the corresponding sequence region of the N. meningitidis serogroup A strain Z2491 (GenBank accession no. NC_003116) was used for the rest of the loci.
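
The assembly of such an extended reference can be sketched in Python as follows. This is an illustration under stated assumptions, not the import-file format used in the study: the function names, the argument layout, and the toy sequences in the usage note are all hypothetical.

```python
# Translation table for complementing DNA letters (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_reference(
    trimmed_ref: str,
    fwd_primer: str,
    rev_primer: str,
    gap_5p: str = "",
    gap_3p: str = "",
) -> str:
    """Rebuild an amplicon-length reference on the forward strand.

    Layout: forward primer + 5' gap consensus + trimmed published
    reference + 3' gap consensus + reverse complement of the reverse primer.
    """
    return (
        fwd_primer
        + gap_5p
        + trimmed_ref
        + gap_3p
        + reverse_complement(rev_primer)
    )
```

For example, `extend_reference("GGGG", "AAC", "TTG", "c", "t")` places the toy primers and lower-case gap filler around the trimmed sequence, so the added consensus stretches remain visible in the result.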

Amplicon Design.

Standard MLST sequencing primers were tagged with a T7-RNA promoter sequence as well as a unique 10-nt sequence tag (refer to SI Table 4 for details) and used for PCR.
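
The tagging step amounts to prepending a 5' tag to each gene-specific primer, as in the short Python sketch below. The tag sequences actually used are listed in SI Table 4 and are not reproduced here: `T7_PROMOTER` holds a commonly cited minimal T7 promoter sequence as an assumption, `TEN_NT_TAG` is a placeholder, and `tag_primer` is a hypothetical helper. Which tag goes on which primer is likewise not assumed.

```python
from typing import Optional

# Commonly cited minimal T7 promoter sequence; an assumption here, since the
# tags used in the study are given in SI Table 4 and are not reproduced.
T7_PROMOTER = "TAATACGACTCACTATAGGG"

# Placeholder for the 10-nt tag (hypothetical, not the published tag).
TEN_NT_TAG = "N" * 10


def tag_primer(gene_specific: str, tag: str, tag_len: Optional[int] = None) -> str:
    """Prepend a 5' tag to a gene-specific primer.

    If tag_len is given, the tag must have exactly that length.
    """
    if tag_len is not None and len(tag) != tag_len:
        raise ValueError(f"tag must be {tag_len} nt, got {len(tag)}")
    return tag + gene_specific
```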

PCR, Base-Specific Cleavage, and MALDI-TOF MS.

Samples were processed in parallel in 384-well microtiter plates using a 96-channel automated pipettor (SEQUENOM). Loci of interest were amplified in 5-μl PCRs as described in SI Materials and Methods. Post-PCR processing was performed according to the standard MassCLEAVE protocol (SEQUENOM). Target regions were cleaved in four reactions at positions corresponding to each of the four bases. After transfer onto 384 SpectroCHIPs (SEQUENOM), samples were analyzed on a MALDI linear time-of-flight mass spectrometer (Compact Analyser; SEQUENOM). Further information is provided in SI Materials and Methods.
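
The principle of base-specific cleavage can be illustrated in silico with the Python sketch below. It is a deliberate simplification and not the MassCLEAVE chemistry or the analysis software: it assumes cleavage immediately 3' of every occurrence of the chosen base, works on DNA letters rather than RNA transcripts, and uses fragment base composition as a stand-in for fragment mass. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def cleave_after(seq: str, base: str) -> List[str]:
    """Split seq after every occurrence of base.

    The cut base stays at the 3' end of its fragment; any trailing
    sequence without the base forms the last fragment.
    """
    fragments: List[str] = []
    current: List[str] = []
    for nt in seq.upper():
        current.append(nt)
        if nt == base:
            fragments.append("".join(current))
            current = []
    if current:
        fragments.append("".join(current))
    return fragments


def composition_fingerprint(seq: str, base: str) -> Counter:
    """Count fragments by base composition (letters sorted within a fragment).

    Fragments with the same composition have the same mass and would
    contribute to the same peak, so they are counted together.
    """
    return Counter("".join(sorted(frag)) for frag in cleave_after(seq, base))
```

In this toy model, a single base change alters the fingerprint of at least one of the four cleavage reactions: `composition_fingerprint("ACGTACGT", "T")` and `composition_fingerprint("ACGTAGGT", "T")` differ because one fragment changes composition.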

Signature Sequence Identification Software.

Data analysis was performed by using a proprietary software package (Signature Sequence Identification software, Prototype; SEQUENOM).

Cluster Analysis.

Cluster analysis by unweighted pair matching was performed by using PHYLIP (20).
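
For readers unfamiliar with this type of clustering, the following pure-Python sketch shows the idea. It is not PHYLIP code. It assumes that "unweighted pair matching" refers to the unweighted pair group method with arithmetic mean (UPGMA) and, as a further assumption, uses the number of differing loci between two allelic profiles as the distance. The function names and the isolate profiles in the usage note are invented.

```python
from itertools import combinations
from typing import Dict, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def upgma(profiles: Dict[str, Sequence[int]]):
    """Cluster isolates by UPGMA and return the tree as nested tuples."""
    if not profiles:
        raise ValueError("at least one profile is required")
    names = list(profiles)
    # Cluster id -> (tree, number of leaves in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    dist = {
        (i, j): float(hamming(profiles[names[i]], profiles[names[j]]))
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Closest pair; ties are resolved by insertion order.
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster is the size-weighted mean, which
        # equals the average over all leaf pairs.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

With three invented profiles, two of which differ at a single locus, the two close isolates are joined first and the distant one is attached last.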

Acknowledgments

We thank Rick Crawford, Brian Groff, Michael Mosko, and Mark Swaisgood for their significant contributions to the development of the technology.

Abbreviations

MLST	multilocus sequence typing
ST	sequence type.

Footnotes

Conflict of interest statement: As SEQUENOM employees and/or shareholders, C.H., Y.C., D.v.d.B., and C.R.C. declare a competing financial interest.

This article contains supporting information online at www.pnas.org/cgi/content/full/0704152104/DC1.

References

3. Garaizar J, Rementeria A, Porwollik S. FEMS Immunol Med Microbiol. 2006;47:178–189.

4. Sampath R, Hofstadler SA, Blyn LB, Eshoo MW, Hall TA, Massire C, Levene HM, Hannis JC, Harrell PM, Neuman B, et al. Emerg Infect Dis. 2005;11:373–379.

6. Pfaller MA. Arch Pathol Lab Med. 1999;123:1007–1010.

7. Maiden MC, Bygraves JA, Feil E, Morelli G, Russell JE, Urwin R, Zhang Q, Zhou J, Zurth K, Caugant DA, et al. Proc Natl Acad Sci USA. 1998;95:3140–3145.

8. Urwin R, Maiden MC. Trends Microbiol. 2003;11:479–487.

10. Stanssens P, Zabeau M, Meersseman G, Remes G, Gansemans Y, Storm N, Hartmer R, Honisch C, Rodi CP, Bocker S, van den Boom D. Genome Res. 2004;14:126–133.

11. Bocker S. Bioinformatics. 2003;19(Suppl 1):i44–i53.

12. Lefmann M, Honisch C, Bocker S, Storm N, von Wintzingerode F, Schlotelburg C, Moter A, van den Boom D, Gobel UB. J Clin Microbiol. 2004;42:339–346.

13. von Wintzingerode F, Bocker S, Schlotelburg C, Chiu NH, Storm N, Jurinke C, Cantor CR, Gobel UB, van den Boom D. Proc Natl Acad Sci USA. 2002;99:7039–7044.

14. Feavers IM, Gray SJ, Urwin R, Russell JE, Bygraves JA, Kaczmarski EB, Maiden MC. J Clin Microbiol. 1999;37:3883–3887.

15. Jolley KA, Kalmusova J, Feil EJ, Gupta S, Musilek M, Kriz P, Maiden MC. J Clin Microbiol. 2000;38:4492–4498.

16. Murphy KM, O'Donnell KA, Higgins AB, O'Neill C, Cafferkey MT. Br J Biomed Sci. 2003;60:204–209.

17. Enright MC, Spratt BG. Trends Microbiol. 1999;7:482–487.

20. Felsenstein J. PHYLIP: Phylogeny Inference Package. Seattle: Univ of Washington; 1993. Version 3.6.
