Most “Dark Matter” Transcripts Are Associated With Known Genes
Figure 1
Low precision for tiling arrays compared to RNA-Seq data.
(A) Precision-recall curves for detection of exons in human RefSeq gene annotations on tiling arrays. Transcribed genomic regions (transfrags) were selected based on a range of parameters that were applied before or after median smoothing with a bandwidth of 70 bp: max gap, the maximum distance between two positive probes; min run, the minimum size of a transcribed region. The log2 normalized intensity threshold used to select positive probes was varied between −1 and 2 to plot each line. (B) Precision-recall curves for the combined RNA-Seq data from three human brain samples, at different read depths (0.2 to 2.1 Gb). Transcribed regions (seqfrags) were identified on the basis of uniquely mapped reads, and the threshold for the minimal read count per seqfrag was varied between 1 and 100 to plot each line. (C) Comparison of RNA-Seq read counts and tiling array probe intensities for the pooled set of human brain RNA-Seq reads (three samples). The number of RNA-Seq reads overlapping each mapped probe coordinate was determined and used to draw a boxplot of the intensity distributions measured for probes overlapped by varying numbers of RNA-Seq reads, as indicated (gray boxes). The intensity distribution across all probes is shown for comparison (white box). Line graphs indicating the cumulative fraction of RNA-Seq read area (green) and read count (red) covered at each read coverage level are superimposed on the boxplot, with the scale shown on the right. (D) Kernel-density plot of probe intensities for the high- and low-coverage probe groups from (C), as indicated.
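The transfrag selection and precision-recall evaluation described for panel (A) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and several details are assumptions the caption does not specify, namely that the 70 bp bandwidth is a window centered on each probe, that max gap is measured between probe start coordinates, and that precision and recall are counted per base rather than per exon.

```python
from statistics import median


def median_smooth(positions, values, bandwidth=70):
    """Replace each probe value by the median of all probes within
    bandwidth/2 bp of it (assumed: a window centered on the probe)."""
    half = bandwidth / 2
    smoothed = []
    for p in positions:
        window = [v for q, v in zip(positions, values) if abs(q - p) <= half]
        smoothed.append(median(window))
    return smoothed


def call_transfrags(positions, values, threshold, max_gap, min_run):
    """Join positive probes (value >= threshold) that lie at most max_gap bp
    apart, and keep joined runs that span at least min_run bp.
    Positions must be sorted in ascending order."""
    frags = []
    start = end = None
    for p, v in zip(positions, values):
        if v < threshold:
            continue  # negative probe: only matters through the gap it leaves
        if start is not None and p - end <= max_gap:
            end = p  # extend the current run
        else:
            if start is not None and end - start >= min_run:
                frags.append((start, end))
            start = end = p  # open a new run
    if start is not None and end - start >= min_run:
        frags.append((start, end))
    return frags


def precision_recall(frags, exons):
    """Base-level precision and recall of called intervals against annotated
    exons; both are lists of closed integer intervals (assumed metric)."""
    called = set()
    for s, e in frags:
        called.update(range(s, e + 1))
    truth = set()
    for s, e in exons:
        truth.update(range(s, e + 1))
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

Sweeping `threshold` over a range such as −1 to 2 at fixed `max_gap` and `min_run`, and recording one `(precision, recall)` pair per value, would trace a single curve of the kind shown in panel (A). The same evaluation could be applied to seqfrags in panel (B) by replacing the intensity threshold with a minimal read count.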