Evolutionary conservation of sequence and secondary structures in CRISPR repeats

Victor Kunin et al. Genome Biol. 2007.

Abstract

Background: Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has recently been shown that CRISPR provides acquired resistance against viruses in prokaryotes.

Results: Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation.
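Compensatory base changes are the evidence the Results rely on: both partners of a stem base pair differ between two repeats, yet the pair is preserved. A minimal sketch of how such changes could be counted for two aligned repeats, assuming an ungapped alignment of equal length and a known list of 0-based stem pair positions. The helper names and toy sequences are hypothetical illustrations, not the paper's procedure; G:U wobble pairs are accepted as valid pairs.

```python
# Hypothetical helper names; a sketch, not the paper's pipeline.
# Watson-Crick pairs plus G:U wobble pairs, in the RNA alphabet.
VALID_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def _rna(base: str) -> str:
    """Upper-case a base and map DNA T to RNA U."""
    return base.upper().replace("T", "U")


def can_pair(a: str, b: str) -> bool:
    """True if the two bases can form a stem base pair (including G:U)."""
    return (_rna(a), _rna(b)) in VALID_PAIRS


def compensatory_changes(reference: str, other: str,
                         stem_pairs: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the stem pairs (i, j) at which BOTH positions differ between
    the two aligned repeats while each repeat still forms a valid pair there."""
    if len(reference) != len(other):
        raise ValueError("repeats must be aligned to the same length")
    hits = []
    for i, j in stem_pairs:
        both_changed = (_rna(reference[i]) != _rna(other[i])
                        and _rna(reference[j]) != _rna(other[j]))
        if both_changed and can_pair(reference[i], reference[j]) and can_pair(other[i], other[j]):
            hits.append((i, j))
    return hits


# Made-up toy sequences: the outer G:C pair becomes C:G, a compensatory change.
print(compensatory_changes("GCCAAAAGGC", "CCCAAAAGGG", [(0, 9), (1, 8), (2, 7)]))  # [(0, 9)]
```

Changes on only one side of a pair that keep it pairable (for example G:C to G:U) are not counted by this sketch.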

Conclusion: We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.

Figures

Figure 1

Distributions of folding scores for (a) all CRISPR repeats and all spacers, compared with random sequences, and (b) individual repeat clusters. X-axis, negative folding scores; Y-axis, fraction (percent) of total.
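Figure 1 compares folding scores of real repeats with those of random sequences. The paper's scores are energy-based (plotted as negative values), presumably from a thermodynamic folder such as RNAfold, which Figure 2 cites. As a self-contained stand-in, the sketch below swaps in Nussinov base-pair maximization, a much cruder score, and compares a sequence with shuffles of itself that keep its base composition. All names, parameters, and the toy sequence are assumptions for illustration only.

```python
import random

# Watson-Crick and G:U wobble pairs (RNA alphabet).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov dynamic programming),
    requiring at least `min_loop` unpaired bases inside every hairpin.
    A crude stand-in for an energy-based folding score."""
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]          # dp[i][j]: best for s[i..j]
    for span in range(min_loop + 1, n):       # span = j - i
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]               # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (s[k], s[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    return dp[0][n - 1]


def shuffle_p_value(seq: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Fraction of composition-preserving shuffles of `seq` whose pair count
    is at least that of `seq`. Small values suggest more structure than
    base composition alone would produce."""
    rng = random.Random(seed)
    observed = nussinov_max_pairs(seq)
    letters = list(seq)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if nussinov_max_pairs("".join(letters)) >= observed:
            at_least += 1
    return at_least / n_shuffles


# Made-up hairpin-like sequences (not real CRISPR repeats).
print(nussinov_max_pairs("GGGAAATCC"))  # 3 (one of them a G:U pair after T -> U)
print(shuffle_p_value("GCGCGAAAAGCGCGC"))
```

Running the same comparison over every repeat and spacer, instead of one sequence, would give distributions analogous in spirit to panel (a).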

Figure 2

Evidence for secondary structure in cluster 3. (a) Multiple alignment of a subset of the repeats in cluster 3 (shown for clarity). Numbers 1 to 7 and 7 to 1 mark the residues involved in stem base-pairing; some compensatory mutations in the stem are circled. Note the G:U base pair at position 5 in Xanthomonas oryzae and the relaxed conservation of loop residues, which is typical of RNA secondary structures in which the structure, rather than the sequence, is functional. (b) Sequence logo for all repeats in cluster 3. (c) Secondary structure of the Syntrophus aciditrophicus repeat predicted with RNAfold. Stem positions are numbered as in the alignment.
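Panel (b) summarizes the cluster as a sequence logo, where each column's total height is its information content: 2 bits (log2 of 4 bases) minus the Shannon entropy of the base frequencies in that column. A minimal sketch of that per-column calculation, assuming an ungapped alignment and skipping the small-sample correction that logo tools commonly apply; the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(alignment: list[str]) -> list[float]:
    """Per-column information content, in bits, of an ungapped nucleotide
    alignment: log2(4) = 2 bits minus the Shannon entropy of the column.
    No gap handling and no small-sample correction (a simplification)."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same length")
    info = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in alignment)
        total = len(alignment)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Made-up toy alignment: three conserved columns, one fully variable column.
print(column_information(["ACGT", "ACGA", "ACGC", "ACGG"]))  # [2.0, 2.0, 2.0, 0.0]
```

In such a logo, loop columns with relaxed conservation would show up as short stacks, while conserved stem positions stand tall.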

Figure 3

The sequence similarity space of CRISPR repeats visualized with the BioLayout (Java) program [26]. Dots denote individual repeat sequences; connecting lines represent Smith-Waterman similarities, such that closer dots represent more similar sequences. Dot colors denote cluster membership as assigned by MCL (Markov cluster algorithm) clustering. The 12 largest clusters are circled and annotated with their sequence logos, coarse phylogenetic composition, and, where applicable, sample secondary structures.
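Figure 3 rests on two steps: pairwise Smith-Waterman similarities between repeats, then MCL clustering of the resulting similarity graph. The sketch below illustrates both under stated assumptions: a linear gap penalty with made-up scores, raw local-alignment scores used directly as edge weights, and a textbook expand/inflate/prune MCL loop with inflation 2.0. None of these parameters, nor the helper names, come from the paper.

```python
import numpy as np


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score with a linear gap penalty.
    The scoring values are illustrative, not those used in the paper."""
    h = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1, j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h[i, j] = max(0, diag, h[i - 1, j] + gap, h[i, j - 1] + gap)
    return int(h.max())


def mcl(similarity: np.ndarray, inflation: float = 2.0, prune: float = 1e-5,
        max_iter: int = 100, tol: float = 1e-9) -> list[list[int]]:
    """Markov clustering of a symmetric, non-negative similarity matrix.
    Returns clusters as sorted lists of node indices."""
    m = similarity.astype(float)
    # Self-loops: set each diagonal entry to the largest value in its column
    # (1.0 for an all-zero column, so no column sums to zero).
    col_max = m.max(axis=0)
    np.fill_diagonal(m, np.where(col_max > 0, col_max, 1.0))
    m /= m.sum(axis=0, keepdims=True)          # make columns stochastic
    for _ in range(max_iter):
        previous = m
        m = m @ m                              # expansion: two-step random walk
        m = m ** inflation                     # inflation: favour strong flow
        m[m < prune] = 0.0                     # drop negligible entries
        m /= m.sum(axis=0, keepdims=True)
        if np.abs(m - previous).max() < tol:
            break
    # Clusters = connected components of the remaining non-zero flow.
    linked = (m > 0) | (m > 0).T
    unseen = set(range(len(m)))
    clusters = []
    while unseen:
        stack = [unseen.pop()]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in np.flatnonzero(linked[node]):
                if int(nb) in unseen:
                    unseen.remove(int(nb))
                    stack.append(int(nb))
        clusters.append(sorted(component))
    return sorted(clusters)


if __name__ == "__main__":
    # Hypothetical toy "repeats": two families of near-identical sequences.
    repeats = ["ACGTACGTAC", "ACGTACGTAA", "TTGGCCAATT", "TTGGCCAATA"]
    sim = np.array([[smith_waterman_score(a, b) for b in repeats] for a in repeats])
    print(mcl(sim))
```

Raising `inflation` generally yields more, smaller clusters, and lowering it merges clusters, so in practice it is the main knob to tune.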

References

1. Mojica FJ, Ferrer C, Juez G, Rodriguez-Valera F. Long stretches of short tandem repeats are present in the largest replicons of the Archaea Haloferax mediterranei and Haloferax volcanii and could be involved in replicon partitioning. Mol Microbiol. 1995;17:85–93. doi: 10.1111/j.1365-2958.1995.mmi_17010085.x.
2. Jansen R, Embden JD, Gaastra W, Schouls LM. Identification of genes that are associated with DNA repeats in prokaryotes. Mol Microbiol. 2002;43:1565–1575. doi: 10.1046/j.1365-2958.2002.02839.x.
3. Pourcel C, Salvignol G, Vergnaud G. CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology. 2005;151:653–663. doi: 10.1099/mic.0.27437-0.
4. Bolotin A, Quinquis B, Renault P, Sorokin A, Ehrlich SD, Kulakauskas S, Lapidus A, Goltsman E, Mazur M, Pusch GD, et al. Complete sequence and comparative genome analysis of the dairy bacterium Streptococcus thermophilus. Nat Biotechnol. 2004;22:1554–1558. doi: 10.1038/nbt1034.
5. Haft DH, Selengut J, Mongodin EF, Nelson KE. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol. 2005;1:e60. doi: 10.1371/journal.pcbi.0010060.