Can we recover a sequence, just knowing all its subsequences of given length?
A Guénoche. Comput Appl Biosci. 1992 Dec.
Abstract
The problem tackled here concerns the feasibility of DNA sequencing using hybridization methods. We establish algorithms for, and computational limits on, reconstructing a sequence from all its subsequences of the same length; in other words, building a string that contains all the words of a given set, and only those. In general, several such strings are possible. We draw on graph theory and propose an algorithm that enumerates all the solution strings. We then carried out simulations using real DNA sequences. They provided some necessary conditions and gave upper bounds on the length of the sequence to recover relative to the length of the oligonucleotides. To avoid limiting ourselves to problems that admit a unique solution, we introduce another algorithm that produces a signature for each solution string. Each signature can be tested to determine which one belongs to the correct sequence.
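The enumeration idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm, whose details the abstract does not give. It uses the common de Bruijn graph formulation, in which each word of length k is an edge from its (k-1)-prefix to its (k-1)-suffix, and it assumes that every word occurs exactly once in the target sequence. The function names `spectrum` and `reconstructions` are hypothetical, chosen for illustration only.

```python
from collections import defaultdict


def spectrum(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(words):
    """Enumerate every string that uses each word exactly once.

    Assumption (not from the paper): each word occurs exactly once in
    the target, so a solution is a path that traverses every edge of
    the de Bruijn graph exactly once.
    """
    words = sorted(set(words))
    if not words:
        return []

    # Edge list: (k-1)-prefix -> words that start with that prefix.
    edges = defaultdict(list)
    for w in words:
        edges[w[:-1]].append(w)

    used = set()
    results = set()

    def extend(node, text):
        # All words consumed: text is one complete solution.
        if len(used) == len(words):
            results.add(text)
            return
        for w in edges.get(node, []):
            if w not in used:
                used.add(w)
                extend(w[1:], text + w[-1])
                used.remove(w)  # backtrack and try the next word

    # Try every prefix as a starting node; dead ends yield nothing.
    for start in sorted({w[:-1] for w in words}):
        extend(start, start)
    return sorted(results)


if __name__ == "__main__":
    for s in reconstructions(spectrum("ATGCGTGGCA", 3)):
        print(s)
```

For the 3-letter words of `ATGCGTGGCA`, the sketch finds two candidate strings, `ATGCGTGGCA` and `ATGGCGTGCA`, which shows why a set of words does not always determine a unique sequence. The backtracking search is exponential in the worst case, so it is suitable only for small examples.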
Similar articles
- Fast exact algorithms for the closest string and substring problems with application to the planted (L, d)-motif model.
Chen ZZ, Wang L. IEEE/ACM Trans Comput Biol Bioinform. 2011 Sep-Oct;8(5):1400-10. doi: 10.1109/TCBB.2011.21. PMID: 21282867
- Multistage sequencing by hybridization.
Kruglyak S. J Comput Biol. 1998 Spring;5(1):165-71. doi: 10.1089/cmb.1998.5.165. PMID: 9541879
- Sequential and parallel algorithms for DNA sequencing.
Blazewicz J, Kaczmarek J, Kasprzak M, Markiewicz WT, Weglarz J. Comput Appl Biosci. 1997 Apr;13(2):151-8. doi: 10.1093/bioinformatics/13.2.151. PMID: 9146962
- DNA sequencing by hybridization--a megasequencing method and a diagnostic tool?
Mirzabekov AD. Trends Biotechnol. 1994 Jan;12(1):27-32. doi: 10.1016/0167-7799(94)90008-6. PMID: 7764555. Review.
- Key-string algorithm--novel approach to computational analysis of repetitive sequences in human centromeric DNA.
Rosandić M, Paar V, Gluncić M, Basar I, Pavin N. Croat Med J. 2003 Aug;44(4):386-406. PMID: 12950141. Review.
Cited by
- Improving RNA Assembly via Safety and Completeness in Flow Decompositions.
Khan S, Kortelainen M, Cáceres M, Williams L, Tomescu AI. J Comput Biol. 2022 Dec;29(12):1270-1287. doi: 10.1089/cmb.2022.0261. Epub 2022 Oct 25. PMID: 36288562. Free PMC article.
- Assembly complexity of prokaryotic genomes using short reads.
Kingsford C, Schatz MC, Pop M. BMC Bioinformatics. 2010 Jan 12;11:21. doi: 10.1186/1471-2105-11-21. PMID: 20064276. Free PMC article.