An analysis of the feasibility of short read sequencing

Nava Whiteford et al. Nucleic Acids Res. 2005.

Abstract

Several methods for ultra-high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20-30 nt, and that reads of 50 nt can provide reconstructed contigs (contiguous fragments of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.

Figures

Figure 1

Percentage of unique reads as a function of read length for (a) λ-phage and (b) E. coli K12. The dashed curves show results for randomly generated sequences of the same size, content-biased to yield the same relative proportions of nucleotides as given in ref. (27).
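
The quantity plotted in Figure 1 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `percent_unique_reads` and `content_biased_random` are hypothetical, and the sketch assumes a single-stranded, linear sequence in which every start position yields one read, and a random control drawn base by base with the same nucleotide proportions as the input.

```python
from collections import Counter
import random


def percent_unique_reads(seq: str, read_len: int) -> float:
    """Percentage of read start positions whose read occurs exactly once in seq.

    Assumption: one strand only, linear sequence, one read per start position.
    """
    n_reads = len(seq) - read_len + 1
    if n_reads <= 0:
        return 0.0
    # Count every sub-sequence of length read_len.
    counts = Counter(seq[i:i + read_len] for i in range(n_reads))
    # A read seen once accounts for exactly one start position.
    unique = sum(1 for c in counts.values() if c == 1)
    return 100.0 * unique / n_reads


def content_biased_random(seq: str, rng: random.Random) -> str:
    """Random sequence of the same length, each base drawn independently
    with the nucleotide proportions observed in seq (hypothetical control)."""
    alphabet = sorted(set(seq))
    weights = [seq.count(base) for base in alphabet]
    return "".join(rng.choices(alphabet, weights=weights, k=len(seq)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy input; a real run would load a genome sequence instead.
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    control = content_biased_random(genome, rng)
    for read_len in (4, 8, 12, 16):
        print(read_len,
              percent_unique_reads(genome, read_len),
              percent_unique_reads(control, read_len))
```

Storing every sub-sequence in a `Counter` is only practical for small inputs; a genome-scale run would need a more compact index.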

Figure 2

(a) Percentage of unique sub-sequences (U) for varying read length (l). The solid line shows uniqueness in the whole human genome; the dashed line shows uniqueness in human chromosome 1. (b) Percentage of human chromosome 1 covered by contigs greater than a threshold length as a function of read length. The horizontal axis starts at 18 nt, due to the limitations of reassembly below this length.

Figure 3

Percentage of the E. coli genome covered by contigs greater than a threshold length as a function of read length.
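
A rough sketch of how a coverage figure of this kind could be computed is given below. It is a deliberately simplified stand-in, not the reassembly procedure used in the paper: it assumes that a contig is a maximal run of consecutive start positions whose reads are unique in the sequence, and that error-free reads are available from every position. The function names are hypothetical.

```python
from collections import Counter


def contigs_from_unique_reads(seq: str, read_len: int) -> list[tuple[int, int]]:
    """Return contigs as half-open (start, end) intervals on seq.

    Simplifying assumption: consecutive unique reads overlap unambiguously,
    so each maximal run of unique start positions forms one contig.
    """
    n_reads = len(seq) - read_len + 1
    if n_reads <= 0:
        return []
    counts = Counter(seq[i:i + read_len] for i in range(n_reads))
    contigs = []
    run_start = None
    for i in range(n_reads):
        if counts[seq[i:i + read_len]] == 1:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The run ended at start position i - 1; its last read ends at i - 1 + read_len.
            contigs.append((run_start, i - 1 + read_len))
            run_start = None
    if run_start is not None:
        contigs.append((run_start, n_reads - 1 + read_len))
    return contigs


def percent_covered(seq: str, read_len: int, min_contig_len: int) -> float:
    """Percentage of seq covered by contigs of at least min_contig_len."""
    if not seq:
        return 0.0
    covered = 0
    last_end = 0
    for start, end in contigs_from_unique_reads(seq, read_len):
        if end - start < min_contig_len:
            continue
        # Neighbouring contigs can overlap slightly; count each base once.
        covered += end - max(start, last_end)
        last_end = end
    return 100.0 * covered / len(seq)
```

Under this model a repeated read breaks a contig at its position, so coverage by long contigs falls as the read length approaches the length of common repeats.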

Figure 4

(a) Percentage of unique sub-sequences (U) for varying read length (l) in the C. elegans genome. (b) Percentage of the C. elegans genome covered by contigs greater than a threshold length as a function of read length. The horizontal axis starts at 18 nt, due to the limitations of reassembly below this length.

Figure 5

Reassembled contigs longer than 200 nt in the 81 090 nt of the BRCA1 gene. Reassembly was simulated from 25, 50 and 100 nt reads covering the whole of chromosome 17. Reassembled contigs are shown in alternating black and grey. Contigs may be next to each other, or overlapping slightly, without an unambiguous overlap existing between the contigs.

References

    1. Shendure J., Mitra R.D., Church G.M. Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 2004;5:335–344.
    2. Brenner S., Johnson M., Bridgham J., Golda G., Lloyd D., Johnson D., Luo S., McCurdy S., Foy M., Ewan M., et al. Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000;18:630–634.
    3. Kling J. Ultrafast DNA sequencing. Nat. Biotechnol. 2003;21:1425–1427.
    4. Miller R.D., Duan S., Lovins E.G., Kloss E.F., Kwok P.-Y. Efficient high-throughput resequencing of genomic DNA. Genome Res. 2003;13:717–720.
    5. Ronaghi M., Uhlen M., Nyren P. DNA sequencing: a sequencing method based on real-time pyrophosphate. Science. 1998;281:363–365.
