language (original) (raw)

Let Σ be an alphabet. We then define the following using the powers of an alphabet and infiniteMathworldPlanetmath union, where n∈ℤ.

Σ+ = ⋃n=1∞Σn
Σ* = ⋃n=0∞Σn=Σ+∪{λ}

where λ is the element called empty string. A string is an element of Σ*, meaning that it is a grouping of symbols from Σ one after another (via concatenationMathworldPlanetmath). For example, a⁢b⁢b⁢c is a string, and c⁢b⁢b⁢a is a different string. A string is also commonly called a word.Σ+, like Σ*, contains all finite strings except thatΣ+ does not contain the empty string λ. Given a string s∈Σ*, a string t is a substring of s ifs=u⁢t⁢v for some strings u,v∈Σ*. For example, l⁢p,a⁢l,h⁢a,a⁢l⁢p⁢h⁢a, and λ (the empty string) are all substrings of the string a⁢l⁢p⁢h⁢a.

Definition. A language over an alphabet Σ is a subset of Σ*, meaning that it is a set (http://planetmath.org/Set) of strings made from the symbols in the alphabet Σ.

Take for example an alphabet Σ={♣,℘,63,a,A}. The following are all languages over Σ:

A language L is said to be proper if the empty string does not belong to it: λ∉L. A proper language is also said to be λ-free. Otherwise, it is improper. In the examples above, all but the first and the last examples are λ-free. L is a finite language if L is a finite setMathworldPlanetmath, and atomic if it is a singleton subset of Σ, such as the fourth example above. A language can be arbitrarily formed, or constructed via some set of rules called a formal grammar.

Given a language L over Σ, the alphabet of L is defined as the maximal subset Σ⁢(L) of Σ such that every symbol in Σ⁢(L) occurs in some word in L. Equivalently, define the alphabet of a word w to be the set Σ⁢(w) of all symbols that occur in w, then Σ⁢(L) is the union of all Σ⁢(w), where w ranges over L.