Approximate matching of regular expressions

Abstract

Given a sequence _A_ and regular expression _R_, the _approximate regular expression matching_ problem is to find a sequence matching _R_ whose optimal alignment with _A_ is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time _O_(_MN_), where _M_ and _N_ are the lengths of _A_ and _R_. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires only _O_(_N_) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for substrings of _A_ that strongly align with a sequence in _R_, as required for typical data base searches. We also show how to deliver an optimal alignment between _A_ and _R_ in only _O_(_N_ + log _M_) space using _O_(_MN_ log _M_) time. Finally, an _O_(_MN_(_M_ + _N_) + _N_² log _N_) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.
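To make the problem statement concrete, the following is a minimal sketch of approximate matching against a small automaton. It is not the algorithm of the paper: it assumes unit edit costs that are minimized (rather than a general score that is maximized), it takes a hand-built nondeterministic automaton instead of parsing _R_, and it settles each column by repeated relaxation, which is simpler but slower than the _O_(_MN_) bound. The names `approx_regex_distance`, `TRANSITIONS`, `START` and `ACCEPT` are hypothetical and chosen only for illustration. Like the method described above, it keeps one value per automaton state, so it needs space proportional to the size of the automaton to report just the score.

```python
import math

# Hand-built automaton for the expression a(b|c)*d (an assumed example).
# Each transition is (source, symbol, target); symbol None is an epsilon move.
TRANSITIONS = [
    (0, "a", 1),
    (1, None, 2),
    (2, "b", 2),
    (2, "c", 2),
    (2, "d", 3),
]
START, ACCEPT, N_STATES = 0, 3, 4


def approx_regex_distance(a, transitions, start, accept, n_states):
    """Smallest unit-cost edit distance between `a` and any accepted string."""

    def close(col):
        # Relax moves that consume no character of `a` until nothing changes:
        # epsilon moves are free, a labeled move costs 1 (an inserted symbol).
        changed = True
        while changed:
            changed = False
            for src, sym, dst in transitions:
                cand = col[src] + (0 if sym is None else 1)
                if cand < col[dst]:
                    col[dst] = cand
                    changed = True
        return col

    # col[s] = best cost of aligning the prefix of `a` read so far
    # with some string that drives the automaton from `start` to state s.
    col = [math.inf] * n_states
    col[start] = 0
    col = close(col)

    for ch in a:
        # Option 1: leave ch unmatched (delete it from `a`), staying in place.
        nxt = [d + 1 for d in col]
        # Option 2: pair ch with a labeled move (match costs 0, mismatch 1).
        for src, sym, dst in transitions:
            if sym is not None:
                cand = col[src] + (0 if sym == ch else 1)
                if cand < nxt[dst]:
                    nxt[dst] = cand
        col = close(nxt)

    return col[accept]


if __name__ == "__main__":
    for text in ["abcbd", "abxd", "bd", ""]:
        print(repr(text), approx_regex_distance(text, TRANSITIONS, START, ACCEPT, N_STATES))
```

Only the current column is stored, which is why the sketch can report a score but cannot by itself recover the alignment; delivering the alignment in small space is the harder problem the abstract addresses separately.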

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Arizona, 85721, Tucson, AZ, U.S.A.
    Eugene W. Myers
  2. Department of Computer Science, The Pennsylvania State University, 16802, University Park, PA, U.S.A.
    Webb Miller

About this article

Cite this article

Myers, E.W., Miller, W. Approximate matching of regular expressions. Bltn Mathcal Biology 51, 5–37 (1989). https://doi.org/10.1007/BF02458834
