Large deviation properties for patterns

2012, CiteSeer X (The Pennsylvania State University)

Deciding whether a given pattern is over-represented or under-represented with respect to a given background model is a key question in computational biology. Such a decision is usually made by computing p-values that reflect the "exceptionality" of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, a simple background model, a small number of sequences), an exact p-value can be computed at tractable cost. Realistic cases are in general too complicated for an exact p-value to be obtained, so approximations are used instead (Gaussian, Poisson, large deviation). These approximations hold under specific conditions: Gaussian approximations are valid in the central domain, while Poisson and large deviation approximations are valid for rare events. In the present paper, we prove a large deviation approximation for the double-strand counting problem, that is, counting occurrences of a given pattern in a set of sequences that arise from both strands of the genome. Here the dependencies between a sequence and its complement play a fundamental role. General combinatorial properties of the pattern make it possible to handle this dependency. A large deviation result is also provided for a set of small sequences.
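To make the contrast between these approximations concrete, the following Python sketch compares them on a deliberately simplified toy model. It is not the method of the paper: it assumes the pattern count is Binomial(n, p), which ignores overlapping occurrences, Markov dependence in the background model, and the double-strand dependency that the paper addresses. The function names, the continuity correction, and the parameter values are illustrative assumptions. The large deviation term shown is the classical Chernoff upper bound exp(-n I(k/n)), where I is the Kullback-Leibler rate function of a Bernoulli variable.

```python
import math


def exact_binomial_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)
    )


def gaussian_tail(n, p, k):
    """Normal approximation of P(X >= k), with a continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def poisson_tail(n, p, k, max_terms=10_000):
    """Poisson(n * p) approximation of P(X >= k), summing the upper tail."""
    lam = n * p
    # First term P(Y = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + max_terms):
        total += term
        term *= lam / (i + 1)
        if term < 1e-18 * total:
            break
    return total


def large_deviation_bound(n, p, k):
    """Chernoff bound exp(-n * I(k/n)) on P(X >= k), informative for k/n > p."""
    a = k / n
    if a <= p:
        return 1.0  # the bound is trivial at or below the mean
    if a >= 1.0:
        return p**n
    rate = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    return math.exp(-n * rate)


if __name__ == "__main__":
    # Hypothetical setting: 1000 positions, occurrence probability 0.01,
    # so the expected count is 10 and an observed count of 30 is a rare event.
    n, p, k = 1000, 0.01, 30
    print("exact          ", exact_binomial_tail(n, p, k))
    print("Gaussian       ", gaussian_tail(n, p, k))
    print("Poisson        ", poisson_tail(n, p, k))
    print("large deviation", large_deviation_bound(n, p, k))
```

In this toy setting the count of 30 lies far in the tail, so the Gaussian value falls well below the exact tail, while the Chernoff bound stays above it, as it must for any n. The bound only captures the exponential decay rate; sharper large deviation approximations of the kind discussed in the abstract also account for the prefactor.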
