formal grammar (original) (raw)

Introduction

A grammarMathworldPlanetmath, loosely speaking, is a set of rules that can be applied to words to generate sentencesMathworldPlanetmath in a languagePlanetmathPlanetmath. For example, with the grammar of the English language, one can form syntactically correct sentences such as “The elephant drove his bicycle to the moon,” regardless whether the sentence is meaningful or not.

The mathematical abstraction of a grammar is known as a formal grammar. Instead of generating sentences from words, a formal grammar generates words from symbols and other words. The following basic ingredients are necessary in a formal grammar:

To see how these rewriting rules work, let us look at an example. Let {a,b} be the alphabet as well as the set of initial words. With the rewriting rules given by: from a word x we can form the word a⁢x, as well as the word x⁢a, we would be able to generate words like

However, words such as

can not be produced.

Note that by adding an extra symbol σ to the above alphabet, and two additional rewriting rules given by “from σ form a” and “from σ form b”, it is not hard to see that any word that can be generated by the first grammar can be generated by this new grammar.

Definition

Formalizing what we have discussed above, we say that a formal grammar G is a quadruple (Σ,N,P,σ), where

    1. (Σ,P) is a rewriting system;
    1. N is a subset of Σ whose elements are called non-terminals, and T:=Σ-N the set of terminals;
    1. an element σ∈N called the starting symbol.

Instead of writing G=(Σ,N,P,σ), the quadruple (T,N,P,σ) is another way of representing G (as long as the conditions Σ=T∪N and T∩N=∅ are satisfied).

A formal grammar is variously known as a phrase-structure grammar, an unrestricted grammar, or simply a grammar. A formal grammar is sometimes also called a rewriting system in the literature, although the two notions are distinct on PlanetMath.

Given a formal grammar G, a word W over Σ such that σ⇒*W is called a sentential form of G. A sentential form over T is called a word generated by G. The set of all words generated by G is called the formal language generated by G, and is denoted by L⁢(G). In other words,

where ⇒* is the derivability relation in the rewriting system (Σ,P). A formal language is also known as a phrase-structure language.

A language L over an alphabet A is said to be generable by a formal grammar if there is a formal grammar G such that L=L⁢(G)∩A*.

Example. Continuing from the example in the previous sectionPlanetmathPlanetmath, we can put T={a,b} and N={σ}. For the set P of productions, we have four

    1. σ→σ⁢a
    1. σ→a⁢σ
    1. σ→a
    1. σ→b

Then G=(Σ,N,P,σ) is a formal grammar. It is not hard to see that σ⇒*b⁢a⁢a, as σ→σ⁢a→σ⁢a⁢a→b⁢a⁢a. In fact, L⁢(G) consists of all words such that a occurs at least once and b occurs at most once.

References

Title formal grammar
Canonical name FormalGrammar
Date of creation 2013-03-22 16:27:10
Last modified on 2013-03-22 16:27:10
Owner CWoo (3771)
Last modified by CWoo (3771)
Numerical id 31
Author CWoo (3771)
Entry type Definition
Classification msc 91F20
Classification msc 68Q42
Classification msc 68Q45
Classification msc 03D05
Synonym phrase-structure grammar
Synonym unrestricted grammar
Synonym grammar
Synonym phrase-structure language
Synonym terminal symbol
Synonym non-terminal symbol
Synonym initial-symbol
Synonym initial symbol
Synonym start symbol
Related topic SemiThueSystem
Related topic Language
Related topic PostSystem
Defines formal language
Defines terminal
Defines non-terminal
Defines starting symbol
Defines production
Defines generable by a formal grammar
Defines sentential form