language (original) (raw)
Let Σ be an alphabet. We then define the following using the powers of an alphabet and infinite union, where n∈ℤ.
Σ+ | = | ⋃n=1∞Σn |
---|---|---|
Σ* | = | ⋃n=0∞Σn=Σ+∪{λ} |
where λ is the element called empty string. A string is an element of Σ*, meaning that it is a grouping of symbols from Σ one after another (via concatenation). For example, abbc is a string, and cbba is a different string. A string is also commonly called a word.Σ+, like Σ*, contains all finite strings except thatΣ+ does not contain the empty string λ. Given a string s∈Σ*, a string t is a substring of s ifs=utv for some strings u,v∈Σ*. For example, lp,al,ha,alpha, and λ (the empty string) are all substrings of the string alpha.
Definition. A language over an alphabet Σ is a subset of Σ*, meaning that it is a set (http://planetmath.org/Set) of strings made from the symbols in the alphabet Σ.
Take for example an alphabet Σ={♣,℘,63,a,A}. The following are all languages over Σ:
- •
{aaa,λ,A℘63,63♣,AaAaA}, - •
{℘a,℘aa,℘aaa,℘aaaa,⋯}, - •
The empty set∅. In the context of languages, ∅ is called the empty language.
- •
{63} - •
{a2n∣n≥0}
A language L is said to be proper if the empty string does not belong to it: λ∉L. A proper language is also said to be λ-free. Otherwise, it is improper. In the examples above, all but the first and the last examples are λ-free. L is a finite language if L is a finite set, and atomic if it is a singleton subset of Σ, such as the fourth example above. A language can be arbitrarily formed, or constructed via some set of rules called a formal grammar.
Given a language L over Σ, the alphabet of L is defined as the maximal subset Σ(L) of Σ such that every symbol in Σ(L) occurs in some word in L. Equivalently, define the alphabet of a word w to be the set Σ(w) of all symbols that occur in w, then Σ(L) is the union of all Σ(w), where w ranges over L.