Ariadne: a database search engine for identification and chemical analysis of RNA using tandem mass spectrometry data

Hiroshi Nakayama et al. Nucleic Acids Res. 2009 Apr.

Abstract

We present here a method to correlate tandem mass spectra of sample RNA nucleolytic fragments with an RNA nucleotide sequence in a DNA/RNA sequence database, thereby allowing tandem mass spectrometry (MS/MS)-based identification of RNA in biological samples. Ariadne, a unique web-based database search engine, identifies RNA by two probability-based evaluation steps of MS/MS data. In the first step, the software evaluates the matches between the masses of product ions generated by MS/MS of an RNase digest of sample RNA and those calculated from a candidate nucleotide sequence in a DNA/RNA sequence database, which then predicts the nucleotide sequences of these RNase fragments. In the second step, the candidate sequences are mapped for all RNA entries in the database, and each entry is scored for a function of occurrences of the candidate sequences to identify a particular RNA. Ariadne can also predict post-transcriptional modifications of RNA, such as methylation of nucleotide bases and/or ribose, by estimating mass shifts from the theoretical mass values. The method was validated with MS/MS data of RNase T1 digests of in vitro transcripts. It was applied successfully to identify an unknown RNA component in a tRNA mixture and to analyze post-transcriptional modification in yeast tRNA(Phe-1).
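The two-step idea in the abstract can be illustrated with a small sketch. It is not Ariadne's implementation: the function names, the toy sequences and the scoring rule (a plain count of distinct candidate fragments found in each entry's digest) are assumptions made for illustration, whereas the actual engine uses probability-based scores. The only domain fact relied on is that RNase T1 cleaves on the 3′ side of G residues.

```python
from collections import Counter


def rnase_t1_digest(rna: str) -> list[str]:
    """In-silico RNase T1 digest: cut after every G (3' side of G)."""
    fragments, start = [], 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        # Trailing piece that does not end in G (3' end of the RNA).
        fragments.append(rna[start:])
    return fragments


def mapping_scores(candidates: list[str], database: dict[str, str]) -> dict[str, int]:
    """Hypothetical stand-in for the 'nucleotide mapping' step.

    Scores each database entry by how many distinct candidate fragment
    sequences occur in its theoretical digest. This is a simple count,
    not the probability-based function used by Ariadne.
    """
    scores = {}
    for name, sequence in database.items():
        digest = Counter(rnase_t1_digest(sequence))
        scores[name] = sum(1 for c in set(candidates) if digest[c] > 0)
    return scores


# Toy data (made up): candidate fragments as they might come out of step 1.
toy_database = {"rna_a": "AUGCCAGUUG", "rna_b": "CCCGAAAG"}
toy_candidates = ["CCAG", "UUG"]
toy_result = mapping_scores(toy_candidates, toy_database)
```

With these toy inputs, the entry whose digest contains both candidate fragments scores highest, which mirrors how a distinctly high mapping score singles out one RNA entry.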

Figures

Figure 1.

Schematic diagram of the Ariadne database search program. Ariadne evaluates tandem MS data of nucleolytic fragments of RNA using a unique two-step algorithm, ‘MS/MS ion search’ and ‘nucleotide mapping’. See text for details.

Figure 2.

Low-energy CID pattern of a deprotonated RNA ion. The fragmentation pattern is illustrated with an oligoribonucleotide of sequence 5′-OH-CUAG-cyclic-phosphate-3′. The nomenclature of sequence ions follows McLuckey et al. (26). The structures of c/y ions follow Tromp and Schuerch (37). The tentative structures of internal fragment ions produced by double-backbone cleavage are designated i(AU) and i(AU+p) for the structural variants shown in the figure. In this example, i(AU) comprises two isomers that cannot be distinguished by mass.

Figure 3.

Classification of MS/MS ion search results of RNase T1-digested xCyPA. The MS/MS data were searched by Ariadne against a ‘database’ consisting only of the xCyPA mRNA sequence (xCyPA) or a ‘database’ consisting of human refseq and xCyPA mRNA (merged), and each result was classified as a TP, FP type 1 (FP1) or type 2 (FP2), FN or TN. (a) A Venn diagram of the classification, and (b) the classified search results. Details are provided in the text. Numbers in parentheses in (b) indicate the number of unique sequences.

Figure 4.

Mapping score histogram of the xCyPA mRNA search results. Scores for all entries in the database are summarized in the histogram. The number of entries falling within each 10-point scoring range was counted, converted to the common logarithm of (frequency + 1) and plotted. Note that the histogram shows a ‘hit’ for the query, as indicated by a distinctly high score.
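The binning described in this caption can be sketched as follows. The function name and the example scores are hypothetical; only the 10-point bin width and the log10(frequency + 1) transform come from the caption. Adding 1 before taking the logarithm keeps empty bins at 0 rather than undefined, so a single high-scoring entry remains visible next to thousands of low-scoring ones.

```python
import math
from collections import Counter


def log_histogram(scores, bin_width=10):
    """Bin scores into fixed-width ranges and return log10(count + 1) per bin.

    Keys are the lower edge of each bin. Empty bins between the lowest and
    highest occupied bin are included with a value of 0.0.
    """
    counts = Counter(int(s // bin_width) for s in scores)
    if not counts:
        return {}
    lowest, highest = min(counts), max(counts)
    return {
        b * bin_width: math.log10(counts[b] + 1)
        for b in range(lowest, highest + 1)
    }


# Made-up scores: many low-scoring entries and one distinct 'hit'.
example_scores = [0, 3, 9, 12, 95]
example_hist = log_histogram(example_scores)
```

In this made-up example the bin starting at 90 holds a single entry, yet it plots at log10(2), clearly above the empty bins between it and the bulk of the distribution.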

Figure 5.

Nucleotide sequences of the mature tRNA(Phe-1) (Mature tRNA) and the equivalent transcript identified in the yeast tRNA database (Database). The identified RNase T1 fragments are underlined in the upper half of the figure and are listed in the lower half. Lower case letters indicate the methylated forms of the corresponding unmodified nucleotides (e.g. g for methylated G), and ‘D’ indicates dihydrouridine. Other symbols for modified nucleotides are according to Limbach et al. (47): ‘L’, 2-methylguanosine; ‘R’, N2,N2-dimethylguanosine; ‘B’, 2′-O-methylcytidine; ‘#’, 2′-O-methylguanosine; ‘Y’, wybutosine; ‘P’, pseudouridine; ‘?’, 5-methylcytidine; ‘7’, 7-methylguanosine; ‘T’, 5-methyluridine; and ‘"’, 1-methyladenosine. The boxed sequence indicates an intron.

Figure 6.

Identification of post-transcriptional modification in yeast tRNA(Phe-1) by MS/MS ion search. A typical search result obtained by the analysis of the doubly charged ion with m/z 1282.7, identified as the fragment 5′-OH-mAUCCACAG-2′,3′-cyclic phosphate-3′ spanning nucleotides 76–83 in the tRNA(Phe-1) sequence, is shown. The product ions are assigned as indicated in the MS/MS spectrum, and the masses of each ion are underlined in the table. Of 470 total signals detected, 15 hits among the 38 most intense signals gave the highest score. The result was visualized by Ariadne.

Figure 7.

Characterization of an unknown small RNA in a yeast tRNA preparation. (a) Isolation of an unknown small RNA by anion-exchange chromatography. A mixture of yeast tRNA (tRNA type X, Sigma R9001; 100 μg) was applied to a TSKgel DNA-NPR column (4.6 × 75 mm; Tosoh) and eluted with an 80-min gradient of NH4Cl (0.1–1.0 M) in 25 mM Tris–HCl buffer (pH 9.0) at a flow rate of 0.5 ml/min at 60°C. The fraction indicated by a double-headed arrow was collected, purified by reversed-phase chromatography (data not shown) and subjected to LC–MS analysis after digestion with RNase T1 (see Materials and Methods section). (b) Base peak chromatogram of the RNase T1 digest of the small RNA.

Figure 8.

Score histogram of the search results for an unknown small RNA. Scores for all sub-entries in the yeast genome database are summarized in the histogram. The number of entries falling within each 10-point scoring range was counted, converted to the common logarithm of (frequency + 1) and plotted. Note that the histogram shows a ‘hit’ for the query.

Figure 9.

Score distribution of the unknown RNA results using a search of the S. cerevisiae whole genome. Scores for all sub-entries in the yeast genome database were mapped on chromosomes I–XVI (Chr I–XVI) and on the mitochondrial genome (Mt). The scale bar in the figure indicates a score of 50. Note that six genomic loci clustered on chromosome XII were identified as the unknown RNA with significantly high scores (boxed in the figure).

Figure 10.

Nucleotide sequences of six chromosomal loci identified for the unknown small RNA. The genomic sequences are shown with their positions on yeast chromosome XII. The sequences found in common among all genomic regions are indicated in bold, and the sequences of RNase T1 fragments identified in the unknown RNA are underlined. A BLAST search of the sequence revealed that all of those chromosomal loci encode 5S rRNA.

References

    1. Hirota K, Miyoshi T, Kugou K, Hoffman CS, Shibata T, Ohta K. Stepwise chromatin remodelling by a cascade of transcription initiation of non-coding RNAs. Nature. 2008;456:130–134.
    2. Wang X, Arai S, Song X, Reichart D, Du K, Pascual G, Tempst P, Rosenfeld MG, Glass CK, Kurokawa R. Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit transcription. Nature. 2008;454:126–130.
    3. Fischer SE, Butler MD, Pan Q, Ruvkun G. Trans-splicing in C. elegans generates the negative RNAi regulator ERI-6/7. Nature. 2008;455:491–496.
    4. Zilberman D, Cao X, Jacobsen SE. ARGONAUTE4 control of locus-specific siRNA accumulation and DNA and histone methylation. Science. 2003;299:716–719.
    5. Cullen BR. Transcription and processing of human microRNA precursors. Mol. Cell. 2004;16:861–865.
