A paired-end sequencing strategy to map the complex landscape of transcription initiation

Ting Ni et al. Nat Methods. 2010 Jul.

Abstract

Recent studies using high-throughput sequencing protocols have uncovered the complexity of mammalian transcription by RNA polymerase II, helping to define several initiation patterns in which transcription start sites (TSSs) cluster in both narrow and broad genomic windows. Here we describe a paired-end sequencing strategy that enables more robust mapping and characterization of capped transcripts. We used this strategy to explore the transcription initiation landscape in the Drosophila melanogaster embryo. Extending the previous findings in mammals, we found that fly promoters exhibited distinct initiation patterns, which were linked to specific promoter sequence motifs. Furthermore, we identified many 5′ capped transcripts originating from coding exons; our analyses suggest that these are unlikely to result from alternative TSSs and are instead the product of post-transcriptional modifications. We demonstrated that paired-end TSS analysis is a powerful method for uncovering the transcriptional complexity of eukaryotic genomes.

Figures

Figure 1. Paired-End Analysis of Transcriptional start sites (PEAT)

(a) Schematic outline of the PEAT strategy. The RNA fragment is shown as an arrowed line (red); the two Mme I sites introduced at the oligo-capping and reverse transcription (RT) steps are shown in green and purple, respectively. (b) Mapping efficiency of the reads that have built-in linker sequences, combined from two technical replicates. (c) The distribution of uniquely mapped 5′ and 3′ reads relative to known TSSs and other genomic regions. (d) Comparison between PEAT and microarray expression data. The plot shows the 10,101 genes that had at least 1 mapped read pair and were included in the microarray data. For the array data, expression level is the mean of simple background subtraction values across 3 replicates from mixed stage 0-11 D. melanogaster embryos. To estimate expression level from the paired-end sequencing data, we used the counts of 3′ tags that map to a transcribed region. The correlation coefficient is the Pearson correlation.
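The comparison in panel (d) can be illustrated with a minimal sketch: count 3′ tags per transcribed region, then compute a Pearson correlation against array values. The function names, the inclusive `(start, end)` gene intervals, and the single-chromosome coordinates are assumptions made for illustration; they are not taken from the paper, which does not describe its implementation here.

```python
import math

def count_tags_per_gene(tag_positions, genes):
    """Count 3' tags falling inside each transcribed region.

    genes: dict mapping a gene name to an inclusive (start, end) interval.
    Single chromosome and strand-agnostic: a simplifying assumption.
    """
    counts = {name: 0 for name in genes}
    for pos in tag_positions:
        for name, (start, end) in genes.items():
            if start <= pos <= end:
                counts[name] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In use, the per-gene tag counts and the per-gene array means would be paired by gene name and passed to `pearson` as two lists in the same gene order.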

Figure 2. TSS clusters and initiation patterns identified in the Drosophila embryo

(a) The approach for identifying TSS clusters. A representative example (Chr. 2: 14516000-14516600) is shown. In essence, a smoothed density estimate of 5′ TSS tags was computed (blue line). Cluster boundaries were then placed where the density exceeded a baseline score estimated from a genomic background (red line). TSS clusters were further condensed to the shortest interval containing 95% of the reads (dark shaded area). (b) The genomic locations of all clusters that contain ≥ 100 reads. Clusters overlapping an annotated TSS in FlyBase were classified as FlyBase TSS. The remaining clusters were classified by the mode of each cluster and its location relative to annotated transcripts. (c) Size distribution of all clusters with ≥ 100 reads. Cluster sizes are similar to previous reports for mammals, with the majority of clusters shorter than 120 nt in length. (d) Definition of initiation patterns.
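The three steps in panel (a) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: a moving average stands in for the unspecified smoothed density estimate, the baseline is passed in as a plain number rather than estimated from a genomic background, and all function names are hypothetical.

```python
def smooth(counts, half_window=2):
    """Moving-average smoothing of per-position tag counts.

    A stand-in for the density estimate; the actual kernel is not given here.
    """
    n = len(counts)
    out = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        out.append(sum(counts[lo:hi]) / (hi - lo))
    return out

def runs_above(density, baseline):
    """Return inclusive (start, end) index runs where density > baseline."""
    runs = []
    start = None
    for i, value in enumerate(density):
        if value > baseline:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(density) - 1))
    return runs

def shortest_interval(positions, percent=95):
    """Shortest inclusive interval holding at least `percent`% of the reads.

    positions: one coordinate per read. Integer arithmetic avoids
    floating-point rounding when computing the required read count.
    """
    if not positions:
        raise ValueError("no reads in cluster")
    pts = sorted(positions)
    n = len(pts)
    need = (n * percent + 99) // 100  # ceiling of n * percent / 100
    best = None
    for i in range(n - need + 1):
        j = i + need - 1
        if best is None or pts[j] - pts[i] < best[1] - best[0]:
            best = (pts[i], pts[j])
    return best
```

Sliding a fixed-count window over the sorted read positions is enough here because the shortest interval covering a given number of reads always starts and ends on a read.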

Figure 3. Promoter motifs associated with distinct promoter types

(a) The three initiation patterns, NP, BP and WP, are each represented by a candidate locus. The graphs show the relative percentage of 5′ reads that are mapped within a 100 nt window. (b) Sequence landscape in the promoter region of each pattern. The mode location of each cluster is set as reference point ‘+1’. Sequence logos of a 100 nt window are shown. (c) The core promoter motifs overrepresented for each initiation pattern. Significant motifs were identified in 200 nt core promoter sequences and binned into 5 nt intervals; only the 100 nt region surrounding the TSS is shown, as no motifs were found to be enriched outside of this window. All bins in which normalized motif occurrences are enriched 5-fold or more are shown. The percentage of sequences containing at least one high-stringency instance of each motif in its preferred location is listed on the left side of the heat map.
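The binning described in panel (c) can be illustrated with a small sketch. The caption does not say how motif occurrences were normalized, so the example below assumes, purely for illustration, that enrichment is measured against a uniform spread of hits across all bins; the function names and the half-open window `[-50, 50)` are likewise hypothetical.

```python
def bin_positions(positions, window_start=-50, window_end=50, bin_size=5):
    """Count motif hits per bin.

    positions: motif start coordinates relative to the TSS mode.
    Hits outside the half-open window [window_start, window_end) are ignored.
    """
    n_bins = (window_end - window_start) // bin_size
    counts = [0] * n_bins
    for pos in positions:
        if window_start <= pos < window_end:
            counts[(pos - window_start) // bin_size] += 1
    return counts

def fold_over_uniform(counts):
    """Fold enrichment of each bin over a uniform expectation (an assumption)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    expected = total / len(counts)
    return [c / expected for c in counts]

def enriched_bins(counts, min_fold=5.0, window_start=-50, bin_size=5):
    """Return (bin_start, fold) for bins enriched at least min_fold."""
    folds = fold_over_uniform(counts)
    return [
        (window_start + i * bin_size, fold)
        for i, fold in enumerate(folds)
        if fold >= min_fold
    ]
```

A motif with a strong positional preference, such as one concentrated around 30 nt upstream of the TSS, would show up as a single bin well above the 5-fold cutoff while the remaining bins stay near or below 1.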

Figure 4. A distinct sequence motif identified for internally capped transcripts

(a-b) The gene structures of the PROD and RNPS1 loci from FlyBase are shown, indicating exons (thick bar) and introns (thin bar). A thick grey bar represents the UTR region. Grey areas highlight read clusters (≥ 100 reads/cluster). Green arrows denote primer locations for RT-PCR validation. RT-PCR was carried out with a junction primer, which spans the linker and the 5′ gene-specific sequence at the cluster mode, together with a downstream primer (100-200 bp distance). For each locus, cDNAs derived from RNA samples with (+) or without (−) linker ligation were used as template. The DNA ladder (M) is shown in the left lane. Sanger sequencing results show the correct position of the mode of the called TSS cluster for (a) a capped 5′ read cluster in the middle of a coding region; and (b) an example of a capped 5′ read cluster near the end of the coding region. (c) Sequence logo of a 100 nt window around the mode location (identified as ‘+1’) of all clusters containing more than 100 reads and mapping to a coding region.
