A compendium of RNA-binding motifs for decoding gene regulation

Nature. 2013 Jul 11;499(7457):172-7.

doi: 10.1038/nature12311.

Debashish Ray, Hilal Kazan, Kate B Cook, Matthew T Weirauch, Hamed S Najafabadi, Xiao Li, Serge Gueroussov, Mihai Albu, Hong Zheng, Ally Yang, Hong Na, Manuel Irimia, Leah H Matzat, Ryan K Dale, Sarah A Smith, Christopher A Yarosh, Seth M Kelly, Behnam Nabet, Desirea Mecenas, Weimin Li, Rakesh S Laishram, Mei Qiao, Howard D Lipshitz, Fabio Piano, Anita H Corbett, Russ P Carstens, Brendan J Frey, Richard A Anderson, Kristen W Lynch, Luiz O F Penalva, Elissa P Lei, Andrew G Fraser, Benjamin J Blencowe, Quaid D Morris, Timothy R Hughes

Debashish Ray et al. Nature. 2013.

Abstract

RNA-binding proteins are key regulators of gene expression, yet only a small fraction have been functionally characterized. Here we report a systematic analysis of the RNA motifs recognized by RNA-binding proteins, encompassing 205 distinct genes from 24 diverse eukaryotes. The sequence specificities of RNA-binding proteins display deep evolutionary conservation, and the recognition preferences for a large fraction of metazoan RNA-binding proteins can thus be inferred from their RNA-binding domain sequence. The motifs that we identify in vitro correlate well with in vivo RNA-binding data. Moreover, we can associate them with distinct functional roles in diverse types of post-transcriptional regulation, enabling new insights into the functions of RNA-binding proteins both in normal physiology and in human disease. These data provide an unprecedented overview of RNA-binding proteins and their targets, and constitute an invaluable resource for determining post-transcriptional regulatory mechanisms in eukaryotes.

Figures

Figure 1. RNAcompete data for 207 RBPs

a, 7-mer Z scores and motifs for the two probe sets for ZC3H10. b, Two-dimensional hierarchical clustering analysis (Pearson correlation, average linkage) of E scores for 7-mers with E ≥ 0.4 in at least one experiment, with the two halves of the array kept as separate rows. Long systematic names have been shortened to species abbreviations and RNAcompete assay numbers. c, ROC curves showing discrimination of bound and unbound RNAs by the corresponding protein in vivo. The curve with the highest AUROC is shown if there are multiple in vivo data sets for a protein. FUS and TAF15 were excluded.
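The AUROC reported for panel c can be read as the probability that a randomly chosen bound RNA scores higher than a randomly chosen unbound RNA. The following is a minimal sketch of that calculation, not the authors' pipeline: the function name `auroc` and the example scores are hypothetical, and how motif scores were assigned to RNAs is not stated in this caption.

```python
def auroc(bound_scores, unbound_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (bound, unbound) pairs in which the bound RNA
    has the higher score; ties count as half a win.
    """
    if not bound_scores or not unbound_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for b in bound_scores:
        for u in unbound_scores:
            if b > u:
                wins += 1.0
            elif b == u:
                wins += 0.5
    return wins / (len(bound_scores) * len(unbound_scores))


# Hypothetical motif scores, for illustration only.
bound = [0.9, 0.8, 0.6, 0.4]
unbound = [0.7, 0.3, 0.2, 0.1]
print(auroc(bound, unbound))  # 0.875
```

An AUROC of 0.5 corresponds to no discrimination, and 1.0 to perfect separation of bound from unbound RNAs.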

Figure 2. Motifs obtained by RNAcompete for RRM (outer ring) and KH domain proteins (inner ring)

The dendrograms represent complete linkage hierarchical clustering of RBPs by amino acid sequence identity in their RBDs. Line colours indicate species of origin of each protein, and shading indicates clades in which all sequences are more than 70% (dark) or 50% (light) identical.

Figure 3. RBD sequence identity enables inference of RNA motifs

a, Motif similarity versus per cent amino acid sequence identity in all RBDs for pairs of proteins. Motif similarity scored using STAMP Pearson-based log10(E value), correlation between PFM affinity scores against 10,000 random-sequence 100-mers, or human 3′ UTRs (for human RBPs). Columns indicate average; error bars indicate standard deviation. Red points: new proteins analysed (see c). b, Stacked bars indicate proportion of each category of RBP encompassed by experimentally determined motifs or inferred motifs using stringent (RNAcompete motifs, ≥70% identity) or expanded criteria (RNAcompete and literature motifs, ≥50% identity) in 288 eukaryotes (Supplementary Data 9). ‘Multi-RBD’ and ‘All’ indicate proteins with >1 or >0 RBDs, respectively. c, Validation of motifs predicted for proteins at 61–96% amino acid identity (red text indicates validation motifs).
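One of the similarity measures in panel a compares two motifs by scoring the same set of random sequences with each PFM and correlating the two score lists. The sketch below illustrates that idea under stated assumptions: the scoring rule (sum over windows of the product of per-position base probabilities), the helper names, and the toy PFMs are all hypothetical, since the caption does not specify how affinity scores were computed. The default of 1,000 sequences is chosen only to keep the example fast; the caption states 10,000.

```python
import math
import random

BASES = "ACGU"


def column(base, p=0.91):
    """Toy PFM column: probability p for `base`, the rest split evenly."""
    rest = (1.0 - p) / 3
    return {b: (p if b == base else rest) for b in BASES}


def pfm_score(pfm, seq):
    """Assumed scoring rule: sum over windows of the product of base probabilities."""
    width = len(pfm)
    total = 0.0
    for start in range(len(seq) - width + 1):
        prob = 1.0
        for offset, col in enumerate(pfm):
            prob *= col[seq[start + offset]]
        total += prob
    return total


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("scores have zero variance")
    return cov / math.sqrt(vx * vy)


def motif_similarity(pfm_a, pfm_b, n_seqs=1000, length=100, seed=0):
    """Correlate the scores two PFMs assign to the same random RNA sequences."""
    rng = random.Random(seed)
    seqs = ["".join(rng.choice(BASES) for _ in range(length)) for _ in range(n_seqs)]
    scores_a = [pfm_score(pfm_a, s) for s in seqs]
    scores_b = [pfm_score(pfm_b, s) for s in seqs]
    return pearson(scores_a, scores_b)


# Toy motifs for illustration only; not taken from the RNAcompete data.
pfm_gcaug = [column(b) for b in "GCAUG"]
pfm_uuuuu = [column(b) for b in "UUUUU"]
print(motif_similarity(pfm_gcaug, pfm_gcaug))
print(motif_similarity(pfm_gcaug, pfm_uuuuu))
```

A motif compared with itself gives a correlation of 1 (up to rounding), while unrelated motifs are expected to give values near 0.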

Figure 4. Conservation of motif matches in human RNA regulatory regions

a, Heat map showing conservation in 50-nucleotide bins (columns) in regions indicated at the top of the panel. Rows represent the most significant motif for indicated protein family (see Supplementary Table 4). Box fill: conservation score of the most conserved position in the motif for each bin. Border colour: conservation score when the entire regulatory region is considered as a single bin. Asterisks indicate known splicing factors. b, Alignment of vertebrate sequences over the ESRP1/2 site in the USF1 3′ UTR. Sequence logos are shown for major branches of vertebrate taxonomy. Dashed box: motif derived from the full alignment. The RNAcompete motif for ESRP1/2 is shown to the right.

Figure 5. RBFOX1 is a putative regulator of RNA stability in autism

a, Significance (as rank-sum Z score) of bias that RBP motifs in 3′ UTRs of mRNAs confer towards correlated expression with the RBP’s mRNA (FDR <0.1). b, Scatter plot shows Z score (from a) versus rank-sum Z score of the same target set, with mRNAs ranked instead by decay rate in MDA-MB-231 cells, for expressed RBPs. c, Enrichment of predicted RBFOX1 stability targets (by ‘leading-edge’ analysis) among transcripts with conserved RBFOX1 motifs. d, Density plot showing that RBFOX1 targets are enriched among transcripts most affected by RBFOX1 RNAi. e, Relationship of mRNA expression levels in autism spectrum disorder brains to RBFOX1 expression and predicted RBFOX1 target status.
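Panels a and b express significance as a rank-sum Z score. The sketch below shows one common way to obtain such a score, the Wilcoxon rank-sum statistic with a normal approximation. It is an assumption that this matches the authors' procedure; the function names are hypothetical, and no tie or continuity correction is applied to the variance.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def rank_sum_z(target_values, background_values):
    """Z score of the target set's rank sum under the normal approximation."""
    n1, n2 = len(target_values), len(background_values)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    ranks = average_ranks(list(target_values) + list(background_values))
    w = sum(ranks[:n1])                           # rank sum of the target set
    mean = n1 * (n1 + n2 + 1) / 2                 # expected rank sum under the null
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    return (w - mean) / sd


# Hypothetical values: targets (mRNAs with a motif) versus all other mRNAs.
print(rank_sum_z([5, 6, 7], [1, 2, 3, 4]))
```

A positive Z indicates that the target set ranks higher than the background on the chosen measure, and a negative Z that it ranks lower.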
