Ogden's lemma (original) (raw)

From Wikipedia, the free encyclopedia

In the theory of formal languages, Ogden's lemma (named after William F. Ogden)[1] is a generalization of the pumping lemma for context-free languages.

Despite Ogden's lemma being a strengthening of the pumping lemma, it is insufficient to fully characterize the class of context-free languages.[2] This is in contrast to the Myhill-Nerode theorem, which unlike the pumping lemma for regular languages is a necessary and sufficient condition for regularity.

We will use underlines to indicate "marked" positions.

Ogden's lemma is often stated in the following form, which can be obtained by "forgetting about" the grammar, and concentrating on the language itself: If a language L is context-free, then there exists some number p ≥ 1 {\displaystyle p\geq 1} {\displaystyle p\geq 1} (where p may or may not be a pumping length) such that for any string s of length at least p in L and every way of "marking" p or more of the positions in s, s can be written as

s = u v w x y {\displaystyle s=uvwxy} {\displaystyle s=uvwxy}

with strings u, v, w, x, and y, such that

  1. vx has at least one marked position,
  2. vwx has at most p marked positions, and
  3. u v n w x n y ∈ L {\displaystyle uv^{n}wx^{n}y\in L} {\displaystyle uv^{n}wx^{n}y\in L} for all n ≥ 0 {\displaystyle n\geq 0} {\displaystyle n\geq 0}.

In the special case where every position is marked, Ogden's lemma is equivalent to the pumping lemma for context-free languages. Ogden's lemma can be used to show that certain languages are not context-free in cases where the pumping lemma is not sufficient. An example is the language { a i b j c k d l : i = 0 or j = k = l } {\displaystyle \{a^{i}b^{j}c^{k}d^{l}:i=0{\text{ or }}j=k=l\}} {\displaystyle \{a^{i}b^{j}c^{k}d^{l}:i=0{\text{ or }}j=k=l\}}.

Example applications

[edit]

Non-context-freeness

[edit]

The special case of Ogden's lemma is often sufficient to prove some languages are not context-free. For example, { a m b n c m d n | m , n ≥ 1 } {\displaystyle \{a^{m}b^{n}c^{m}d^{n}|m,n\geq 1\}} {\displaystyle \{a^{m}b^{n}c^{m}d^{n}|m,n\geq 1\}} is a standard example of non-context-free language,[3]

Proof

Suppose the language is generated by a context-free grammar, then let p {\displaystyle p} {\displaystyle p} be the length required in Ogden's lemma, then consider the word a p b p c p _ d p {\displaystyle a^{p}{\underline {b^{p}c^{p}}}d^{p}} {\displaystyle a^{p}{\underline {b^{p}c^{p}}}d^{p}} in the language. Then the three conditions implied by Ogden's lemma cannot all be satisfied.

Similarly, one can prove the "copy twice" language L = { w 2 | w ∈ { a , b } ∗ } {\displaystyle L=\{w^{2}|w\in \{a,b\}^{*}\}} {\displaystyle L=\{w^{2}|w\in \{a,b\}^{*}\}} is not context-free, by using Ogden's lemma on a 2 p b 2 p _ a 2 p b 2 p {\displaystyle a^{2p}{\underline {b^{2p}}}a^{2p}b^{2p}} {\displaystyle a^{2p}{\underline {b^{2p}}}a^{2p}b^{2p}}.

And the given example last section { a i b j c k d l : i = 0 or j = k = l } {\displaystyle \{a^{i}b^{j}c^{k}d^{l}:i=0{\text{ or }}j=k=l\}} {\displaystyle \{a^{i}b^{j}c^{k}d^{l}:i=0{\text{ or }}j=k=l\}} is not context-free by using Ogden's lemma on a b 2 p c 2 p _ d 2 p {\displaystyle ab^{2p}{\underline {c^{2p}}}d^{2p}} {\displaystyle ab^{2p}{\underline {c^{2p}}}d^{2p}}.

Ogden's lemma can be used to prove the inherent ambiguity of some languages, which is implied by the title of Ogden's paper.

Example: Let L 0 = { a n b m c m | m , n ≥ 1 } , L 1 = { a m b m c n | m , n ≥ 1 } {\displaystyle L_{0}=\{a^{n}b^{m}c^{m}|m,n\geq 1\},L_{1}=\{a^{m}b^{m}c^{n}|m,n\geq 1\}} {\displaystyle L_{0}=\{a^{n}b^{m}c^{m}|m,n\geq 1\},L_{1}=\{a^{m}b^{m}c^{n}|m,n\geq 1\}}. The language L = L 0 ∪ L 1 {\displaystyle L=L_{0}\cup L_{1}} {\displaystyle L=L_{0}\cup L_{1}} is inherently ambiguous. (Example from page 3 of Ogden's paper.)

Proof

Let p {\displaystyle p} {\displaystyle p} be the pumping length needed for Ogden's lemma, and apply it to the sentence a p ! + p b p c p _ {\displaystyle a^{p!+p}{\underline {b^{p}c^{p}}}} {\displaystyle a^{p!+p}{\underline {b^{p}c^{p}}}}.

By routine checking of the conditions of Ogden's lemma, we find that the derivation is

S ⇒ ∗ u A v ⇒ ∗ u x A y v ⇒ ∗ u x z y v {\displaystyle S\Rightarrow ^{*}uAv\Rightarrow ^{*}uxAyv\Rightarrow ^{*}uxzyv} {\displaystyle S\Rightarrow ^{*}uAv\Rightarrow ^{*}uxAyv\Rightarrow ^{*}uxzyv}where u = a p ! + p b p − s − k , x = b k , z = b s c s ′ , y = c k , v = c p − s ′ − k {\displaystyle u=a^{p!+p}b^{p-s-k},x=b^{k},z=b^{s}c^{s'},y=c^{k},v=c^{p-s'-k}} {\displaystyle u=a^{p!+p}b^{p-s-k},x=b^{k},z=b^{s}c^{s'},y=c^{k},v=c^{p-s'-k}}, satisfying s + s ′ ≥ 1 {\displaystyle s+s'\geq 1} {\displaystyle s+s'\geq 1} and k ≥ 1 {\displaystyle k\geq 1} {\displaystyle k\geq 1} and p ≥ s + s ′ + 2 k {\displaystyle p\geq s+s'+2k} {\displaystyle p\geq s+s'+2k}.

Thus, we obtain a derivation of a p ! + p b p ! + p c p ! + p {\displaystyle a^{p!+p}b^{p!+p}c^{p!+p}} {\displaystyle a^{p!+p}b^{p!+p}c^{p!+p}} by interpolating the derivation with p ! / k {\displaystyle p!/k} {\displaystyle p!/k} copies of A ⇒ ∗ x A y {\displaystyle A\Rightarrow ^{*}xAy} {\displaystyle A\Rightarrow ^{*}xAy}. According to this derivation, an entire sub-sentence x p ! / k + 1 z y p ! / k + 1 = b p ! + k + s c p ! + k + s ′ {\displaystyle x^{p!/k+1}zy^{p!/k+1}=b^{p!+k+s}c^{p!+k+s'}} {\displaystyle x^{p!/k+1}zy^{p!/k+1}=b^{p!+k+s}c^{p!+k+s'}} is the descendent of one node A {\displaystyle A} {\displaystyle A} in the derivation tree.

Symmetrically, we can obtain another derivation of a p ! + p b p ! + p c p ! + p {\displaystyle a^{p!+p}b^{p!+p}c^{p!+p}} {\displaystyle a^{p!+p}b^{p!+p}c^{p!+p}}, according to which there is an entire sub-sentence a p ! + k ″ + s ″ b p ! + k ″ + s ‴ {\displaystyle a^{p!+k''+s''}b^{p!+k''+s'''}} {\displaystyle a^{p!+k''+s''}b^{p!+k''+s'''}} being the descendent of one node in the derivation tree.

Since ( p ! + k + s ) + ( p ! + k ″ + s ‴ ) > 2 p ! + 1 > p ! + p {\displaystyle (p!+k+s)+(p!+k''+s''')>2p!+1>p!+p} {\displaystyle (p!+k+s)+(p!+k''+s''')>2p!+1>p!+p}, the two sub-sentences have nonempty intersection, and since neither contains the other, the two derivation trees are different.

Similarly, L ∗ {\displaystyle L^{*}} {\displaystyle L^{*}} is inherently ambiguous, and for any CFG of the language, letting p {\displaystyle p} {\displaystyle p} be the constant for Ogden's lemma, we find that ( a p ! + p b p ! + p c p ! + p ) n {\displaystyle (a^{p!+p}b^{p!+p}c^{p!+p})^{n}} {\displaystyle (a^{p!+p}b^{p!+p}c^{p!+p})^{n}} has at least 2 n {\displaystyle 2^{n}} {\displaystyle 2^{n}} different parses. Thus L ∗ {\displaystyle L^{*}} {\displaystyle L^{*}} has an unbounded degree of inherent ambiguity.

The proof can be extended to show that deciding whether a CFG is inherently ambiguous is undecidable, by reduction to the Post correspondence problem. It can also show that deciding whether a CFG has an unbounded degree of inherent ambiguity is undecidable. (page 4 of Ogden's paper)

Construction

Given any Post correspondence problem over binary strings, we reduce it to a decision problem over a CFG.

Given any two lists of binary strings ξ 1 , . . . , ξ n {\displaystyle \xi _{1},...,\xi _{n}} {\displaystyle \xi _{1},...,\xi _{n}} and η 1 , . . . , η n {\displaystyle \eta _{1},...,\eta _{n}} {\displaystyle \eta _{1},...,\eta _{n}}, rewrite the binary alphabet to { d , e } {\displaystyle \{d,e\}} {\displaystyle \{d,e\}}.

Let L ( ξ 1 , ⋯ , ξ n ) {\displaystyle L\left(\xi _{1},\cdots ,\xi _{n}\right)} {\displaystyle L\left(\xi _{1},\cdots ,\xi _{n}\right)} be the language over alphabet { d , e , f } {\displaystyle \{d,e,f\}} {\displaystyle \{d,e,f\}}, generated by the CFG with rules S → ξ i S d i e | f {\displaystyle S\to \xi _{i}Sd^{i}e|f} {\displaystyle S\to \xi _{i}Sd^{i}e|f} for every i = 1 , . . . , n {\displaystyle i=1,...,n} {\displaystyle i=1,...,n}. Similarly define L ( η 1 , ⋯ , η n ) {\displaystyle L\left(\eta _{1},\cdots ,\eta _{n}\right)} {\displaystyle L\left(\eta _{1},\cdots ,\eta _{n}\right)}.

Now, by the same argument as above, the language ( L 0 ⋅ L ( ξ 1 , ⋯ , ξ n ) ) ∪ ( L 1 ⋅ L ( η 1 , ⋯ , η n ) ) {\displaystyle (L_{0}\cdot L\left(\xi _{1},\cdots ,\xi _{n}\right))\cup (L_{1}\cdot L\left(\eta _{1},\cdots ,\eta _{n}\right))} {\displaystyle (L_{0}\cdot L\left(\xi _{1},\cdots ,\xi _{n}\right))\cup (L_{1}\cdot L\left(\eta _{1},\cdots ,\eta _{n}\right))} is inherently ambiguous iff the Post correspondence problem has a solution.

And the language ( ( L 0 ⋅ L ( ξ 1 , ⋯ , ξ n ) ) ∪ ( L 1 ⋅ L ( η 1 , ⋯ , η n ) ) ) ∗ {\displaystyle ((L_{0}\cdot L\left(\xi _{1},\cdots ,\xi _{n}\right))\cup (L_{1}\cdot L\left(\eta _{1},\cdots ,\eta _{n}\right)))^{*}} {\displaystyle ((L_{0}\cdot L\left(\xi _{1},\cdots ,\xi _{n}\right))\cup (L_{1}\cdot L\left(\eta _{1},\cdots ,\eta _{n}\right)))^{*}} has an unbounded degree of inherent ambiguity iff the Post correspondence problem has a solution.

Generalized condition

[edit]

Bader and Moura have generalized the lemma[4] to allow marking some positions that are not to be included in vx. Their dependence of the parameters was later improved by Dömösi and Kudlek.[5] If we denote the number of such excluded positions by e, then the number d of marked positions of which we want to include some in vx must satisfy d ≥ p ( e + 1 ) {\displaystyle d\geq p(e+1)} {\displaystyle d\geq p(e+1)}, where p is some constant that depends only on the language. The statement becomes that every s can be written as

s = u v w x y {\displaystyle s=uvwxy} {\displaystyle s=uvwxy}

with strings u, v, w, x, and y, such that

  1. vx has at least one marked position and no excluded position,
  2. vwx has at most p ( e + 1 ) {\displaystyle p^{(e+1)}} {\displaystyle p^{(e+1)}} marked positions, and
  3. u v n w x n y ∈ L {\displaystyle uv^{n}wx^{n}y\in L} {\displaystyle uv^{n}wx^{n}y\in L} for all n ≥ 0 {\displaystyle n\geq 0} {\displaystyle n\geq 0}.

Moreover, either each of u,v,w has a marked position, or each of w , x , y {\displaystyle w,x,y} {\displaystyle w,x,y} has a marked position.

  1. ^ Ogden, William (September 1968). "A helpful result for proving inherent ambiguity". Mathematical Systems Theory. 2 (3): 191–194. doi:10.1007/bf01694004. ISSN 0025-5661. S2CID 13197551.
  2. ^ Kracht, Marcus (2004). Too Many Languages Satisfy Ogden's Lemma (PDF). Proceedings of the 27th Pennsylvania Linguistics Colloquium. Philadelphia. pp. 115–121. Retrieved 16 May 2024.
  3. ^ Hopcroft, John E. (1979). Introduction to automata theory, languages, and computation. Jeffrey D. Ullman. Reading, Mass.: Addison-Wesley. p. 128. ISBN 0-201-02988-X. OCLC 4549363.
  4. ^ Bader, Christopher; Moura, Arnaldo (April 1982). "A Generalization of Ogden's Lemma". Applied Mathematics and Computation. 29 (2): 404–407. doi:10.1145/322307.322315. S2CID 33988796.
  5. ^ Dömösi, Pál; Kudlek, Manfred (1999), "Strong iteration lemmata for regular, linear, context-free, and linear indexed languages", Fundamentals of Computation Theory, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 226–233, doi:10.1007/3-540-48321-7_18, ISBN 978-3-540-66412-3, retrieved 2023-02-26