Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation
Xingyuan Li et al. Nucleic Acids Res. 2005.
Abstract
The specificity of an oligonucleotide for microarray hybridization can be predicted from its sequence identity to non-targets, its continuous stretch matching non-targets, and/or its binding free energy to non-targets. Most currently available programs use only one or two of these criteria and may therefore select 'false' specific oligonucleotides or miss 'true' optimal probes in a considerable proportion of cases. We have developed a software tool, CommOligo, that uses new algorithms and all three criteria to select optimal oligonucleotide probes. A series of filters, including sequence identity, free energy, continuous stretch, GC content, self-annealing, distance to the 3'-untranslated region (3'-UTR) and melting temperature (Tm), is used to check each possible oligonucleotide. Sequence identity is calculated from gapped global alignments. A traversal algorithm is used to generate alignments for the free energy calculation. The optimal Tm interval is determined from the probe candidates that have passed all other filters. Final probes are picked using a combination of user-configurable piece-wise linear functions and an iterative process. The thresholds for the identity, stretch and free energy filters are determined automatically from experimental data by an accessory software tool, CommOligo_PE (CommOligo Parameter Estimator). The program was used to design probes for both whole-genome and highly homologous sequence data. CommOligo and CommOligo_PE are freely available to academic users upon request.
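To make two of the filters named in the abstract concrete, the following is a minimal Python sketch of a GC-content check and a continuous-stretch check. It is not CommOligo's code: the function names, the GC range and the stretch limit are hypothetical, and the stretch is computed here as the longest common substring of two same-strand sequences, which is an assumption about how the stretch is measured.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def longest_common_stretch(probe: str, non_target: str) -> int:
    """Length of the longest contiguous run of bases shared by both sequences."""
    best = 0
    prev = [0] * (len(non_target) + 1)
    for p in probe:
        curr = [0] * (len(non_target) + 1)
        for j, n in enumerate(non_target, start=1):
            if p == n:
                # Extend the run that ended at the previous base of each sequence.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def passes_simple_filters(probe, non_targets, gc_range, max_stretch):
    """Hypothetical filter cascade: GC content first, then stretch to every non-target."""
    low, high = gc_range
    if not low <= gc_content(probe) <= high:
        return False
    return all(longest_common_stretch(probe, nt) <= max_stretch
               for nt in non_targets)
```

A candidate is rejected as soon as one filter fails, so the cheaper GC check runs before the quadratic-time stretch comparison.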
Figures
Figure 1
Flowchart for CommOligo.
Figure 2
Directed graph of the dynamic programming matrix for alignment of sequence ACCAA and ACGGA (A), and the traversal of the dynamic programming matrix for alignment of sequence ACCAA and ACGGA with two mismatches (B).
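The alignment of ACCAA and ACGGA in Figure 2 can be reproduced with a standard Needleman-Wunsch dynamic programming matrix. The sketch below is an illustration only: the scoring values (match +1, mismatch -1, gap -1) and the definition of identity as matches divided by alignment columns are assumptions, not values taken from the paper, and the function name is hypothetical.

```python
def global_identity(a: str, b: str, match=1, mismatch=-1, gap=-1) -> float:
    """Identity (matches / alignment columns) of one optimal gapped global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, counting matches and columns.
    i, j = len(a), len(b)
    matches = columns = 0
    while i > 0 or j > 0:
        columns += 1
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                matches += a[i - 1] == b[j - 1]
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # base of a aligned against a gap
        else:
            j -= 1  # base of b aligned against a gap
    return matches / columns if columns else 0.0
```

Under these assumed scores, the optimal alignment of ACCAA and ACGGA is ungapped, with three matches and two mismatches, so `global_identity("ACCAA", "ACGGA")` evaluates to 3/5.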
Figure 3
The relationships between sequence identity (A), stretch length (B) or binding free energy (C) and the number of designed probes. Vertical lines x = 0.87, x = 17 and x = −29 indicate the fitted thresholds for probe design criteria.
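The three fitted thresholds in Figure 3 can be read as a combined specificity test. The sketch below shows one way such a test might look; the class and function names are hypothetical, the direction and inclusiveness of each comparison are assumptions, and the free energy unit (taken here to be kcal/mol) is not stated in the caption.

```python
from dataclasses import dataclass

# Thresholds from the Figure 3 caption; how each is compared is an assumption.
MAX_IDENTITY = 0.87      # sequence identity to a non-target
MAX_STRETCH = 17         # continuous stretch length, in bases
MIN_FREE_ENERGY = -29.0  # binding free energy, assumed kcal/mol


@dataclass(frozen=True)
class NonTargetHit:
    """Hypothetical record of one probe compared against one non-target."""
    identity: float
    stretch: int
    free_energy: float


def is_specific(hit: NonTargetHit) -> bool:
    """True only if the hit stays within all three thresholds."""
    return (hit.identity <= MAX_IDENTITY
            and hit.stretch <= MAX_STRETCH
            and hit.free_energy >= MIN_FREE_ENERGY)


def probe_is_specific(hits) -> bool:
    """A probe is kept only if every non-target comparison passes."""
    return all(is_specific(hit) for hit in hits)
```

Requiring all three conditions reflects the abstract's point that a probe passing one or two criteria may still cross-hybridize.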