Automatic extraction of motifs represented in the hidden Markov model from a number of DNA sequences

Motivation: Automatic extraction of motifs that occur frequently in a set of unaligned DNA sequences is useful for predicting the binding sites of unknown transcription factors. Several programs for this purpose have been released. However, in our opinion, they are not practical enough to be applied to a large number of upstream sequences.

Results: We propose a new program called YEBIS (Yet another Environment for the analysis of BIopolymer Sequences), which can extract a set of motifs, without any a priori knowledge, from a number of functionally related DNA sequences. Because the motifs are represented with the hidden Markov model, they take a more general form than in conventional methods such as the weight matrix method. When applied to several sets of benchmark data, YEBIS performed comparably to the existing methods but was much faster. Moreover, it could extract all known motifs from long terminal repeat (LTR) sequences in a single run. Finally, it was successfully applied to approximately 400 human promoter sequences, and some of the extracted motifs turned out to be known cis-elements. Therefore, YEBIS could be a practical tool for exploring the upstream sequences of genomic ORFs, some of which are regulated in a similar fashion.
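The claim that a hidden Markov model is a more general motif representation than a weight matrix can be illustrated with a small sketch. The code below is not the YEBIS model, whose architecture is not described here; it assumes a hypothetical left-to-right HMM whose match states reuse the weight matrix columns and whose optional insert states emit a uniform background. The names `pwm_likelihood`, `hmm_likelihood`, `gap_open` and `gap_extend` are illustrative only. With `gap_open = 0` the HMM reduces to the weight matrix; with a non-zero value it can also score motif instances that contain inserted bases.

```python
import math

ALPHABET = "ACGT"  # assumption: sequences contain only these four bases


def pwm_likelihood(pwm, seq):
    """Probability of seq under a weight matrix (independent columns).

    pwm is a list of dicts, one per motif position, mapping base -> probability.
    A weight matrix can only score sequences of exactly its own length.
    """
    if len(seq) != len(pwm):
        return 0.0
    p = 1.0
    for column, base in zip(pwm, seq):
        p *= column[base]
    return p


def hmm_likelihood(pwm, gap_open, gap_extend, seq):
    """Forward algorithm for a hypothetical left-to-right motif HMM.

    Match state i emits from pwm[i]. Between match states i and i+1 sits an
    insert state that emits uniform background; it is entered with probability
    gap_open and re-entered with probability gap_extend. The path must start
    in the first match state and end in the last one.
    """
    k = len(pwm)
    background = 1.0 / len(ALPHABET)
    if not seq:
        return 0.0

    # match[i] / insert[i]: probability of having emitted the prefix read so
    # far and currently being in match state i / insert state i.
    match = [0.0] * k
    insert = [0.0] * (k - 1)
    match[0] = pwm[0][seq[0]]

    for base in seq[1:]:
        new_match = [0.0] * k
        new_insert = [0.0] * (k - 1)
        for i in range(k - 1):
            # Open a gap after match i, or extend an existing one.
            new_insert[i] = (match[i] * gap_open
                             + insert[i] * gap_extend) * background
            # Advance to the next match state, from match i or insert i.
            new_match[i + 1] = (match[i] * (1.0 - gap_open)
                                + insert[i] * (1.0 - gap_extend)) * pwm[i + 1][base]
        match, insert = new_match, new_insert

    return match[k - 1]


if __name__ == "__main__":
    # Toy three-column motif favouring "TAT"; the numbers are made up.
    toy = [
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    ]
    print(pwm_likelihood(toy, "TAT"), hmm_likelihood(toy, 0.0, 0.0, "TAT"))
    print(pwm_likelihood(toy, "TACT"), hmm_likelihood(toy, 0.1, 0.5, "TACT"))
```

In this sketch the weight matrix assigns zero probability to the four-base instance `TACT`, while the HMM with gaps enabled still assigns it a positive probability, which is the sense in which the HMM form is more general.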

Availability: YEBIS will be distributed to academic users free of charge. All requests should be sent to the address below.

Contact: E-MAIL: yada@tokyo.jst.go.jp