Regular Expressions, Regular Grammar and Regular Languages

Last Updated : 23 Jul, 2025

To work with formal languages and string patterns, it is essential to understand regular expressions, regular grammar, and regular languages. These concepts form the foundation of automata theory, compiler design, and text processing.

**Regular Expressions**

Regular expressions are symbolic notations used to define search patterns in strings. They describe **regular languages** and are commonly used in tasks such as validation, searching, and parsing.

A **regular expression** over an input alphabet Σ is built from the following rules, and every expression built this way represents a **regular language**:

**ε (epsilon)** is a regular expression representing the language **{ε}**, which contains only the empty string.
**Any symbol 'a'** from the input alphabet Σ is a regular expression, representing the language **{a}**.
**Union (a + b)** is a regular expression if **a** and **b** are regular expressions. It represents the union of their languages, which is **{a, b}** when a and b are single symbols.
**Concatenation (ab)** is a regular expression if **a** and **b** are regular expressions. It represents every string formed by a string from the first language followed by a string from the second.
**Kleene star (a*)** is a regular expression if **a** is a regular expression. It means **zero or more occurrences** of 'a', so a* represents {ε, a, aa, aaa, ...}.

Description	Regular Expression	Regular Language
Set of vowels	(a|e|i|o|u)	{a, e, i, o, u}
'a' followed by 0 or more 'b'	ab*	{a, ab, abb, abbb, abbbb, ...}
Any number of vowels followed by any number of consonants	[aeiou]*[bcdfghjklmnpqrstvwxyz]*	{ε, a, aou, aiou, b, abcd, ...} (ε represents the empty string)
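
The patterns in the table can be tried out with Python's `re` module. The sketch below is only an illustration: the names `VOWELS`, `A_THEN_BS`, `VOWELS_THEN_CONSONANTS`, and `in_language` are made up for this example, and the third pattern is written with explicit `*` operators to express "any number of". `fullmatch` is used because membership in a language concerns the whole string, not a substring.

```python
import re

# Patterns from the table, written in Python's `re` syntax.
VOWELS = re.compile(r"a|e|i|o|u")
A_THEN_BS = re.compile(r"ab*")
VOWELS_THEN_CONSONANTS = re.compile(r"[aeiou]*[bcdfghjklmnpqrstvwxyz]*")


def in_language(pattern, s):
    """Return True if the whole string s belongs to the pattern's language."""
    return pattern.fullmatch(s) is not None


print(in_language(VOWELS, "e"))                     # True
print(in_language(A_THEN_BS, "abbb"))               # True
print(in_language(A_THEN_BS, "ba"))                 # False
print(in_language(VOWELS_THEN_CONSONANTS, ""))      # True (the empty string)
print(in_language(VOWELS_THEN_CONSONANTS, "abcd"))  # True
print(in_language(VOWELS_THEN_CONSONANTS, "ba"))    # False (consonant before vowel)
```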

**Regular Grammar**

A regular grammar is a formal grammar that generates regular languages. It consists of:

**Terminals:** Symbols that form strings (e.g., a, b).
**Non-terminals:** Variables used to derive strings (e.g., S, A).
**Production Rules:** Rules for transforming non-terminals into terminals or other non-terminals.
**Start Symbol:** The non-terminal from which derivations begin.

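**Types of Regular Grammar**

**Right-linear Grammar:** All production rules are of the form A -> aB or A -> a, so a non-terminal, if present, is the rightmost symbol. Most definitions also allow A -> ε.
Example:

S -> aS | bS | ε
**Left-linear Grammar:** All production rules are of the form A -> Ba or A -> a, so a non-terminal, if present, is the leftmost symbol.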

**Regular Languages**

Regular languages are the class of languages that can be represented using finite automata, regular expressions, or regular grammars; all three formalisms describe exactly the same class. Because a DFA reads each input symbol once, membership in a regular language can be decided in time linear in the length of the string.

**Properties of Regular Languages**

**1. Closure Properties**

Regular languages are closed under union, intersection, concatenation, Kleene closure, and complement: applying any of these operations to regular languages yields another regular language.

**Union:** If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular. For example, if L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}, then L3 = L1 ∪ L2 = {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} is also regular.
**Intersection:** If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular. For example, if L1 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} and L2 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} ∪ {bⁿaᵐ | n ≥ 0 and m ≥ 0}, then L3 = L1 ∩ L2 = {aᵐbⁿ | n ≥ 0 and m ≥ 0} is also regular.
**Concatenation:** If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular. For example, if L1 = {aⁿ | n ≥ 0} and L2 = {bⁿ | n ≥ 0}, then L3 = L1.L2 = {aᵐbⁿ | m ≥ 0 and n ≥ 0} is also regular.
**Kleene Closure:** If L1 is a regular language, its Kleene closure L1* will also be regular. For example, if L1 = (a ∪ b), then L1* = (a ∪ b)*.
**Complement:** If L(G) is a regular language, its complement L’(G) will also be regular. The complement is found by removing the strings of L(G) from the set of all possible strings over the alphabet. For example, over the alphabet {a}, if L(G) = {aⁿ | n > 3}, then L’(G) = {aⁿ | n ≤ 3}.
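
Two of these closure properties have short constructive proofs on DFAs: complement swaps accepting and non-accepting states, and intersection runs two DFAs in lockstep (the product construction). The sketch below illustrates both under some assumptions: the `DFA` class and its field names are invented for this example, transition functions are total, and both automata in an intersection share one alphabet.

```python
from itertools import product


class DFA:
    """A minimal DFA; `delta` maps (state, symbol) -> state and must be total."""

    def __init__(self, states, alphabet, delta, start, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)

    def accepts(self, s):
        state = self.start
        for ch in s:
            state = self.delta[(state, ch)]
        return state in self.accepting


def complement(d):
    """Same automaton with accepting and non-accepting states swapped."""
    return DFA(d.states, d.alphabet, d.delta, d.start, d.states - d.accepting)


def intersection(d1, d2):
    """Product construction: a pair state accepts when both components accept."""
    states = set(product(d1.states, d2.states))
    delta = {
        ((p, q), c): (d1.delta[(p, c)], d2.delta[(q, c)])
        for (p, q) in states
        for c in d1.alphabet
    }
    accepting = {(p, q) for (p, q) in states if p in d1.accepting and q in d2.accepting}
    return DFA(states, d1.alphabet, delta, (d1.start, d2.start), accepting)


# {a^n | n > 3} over the alphabet {a}: count a's, saturating at 4.
more_than_3 = DFA(range(5), "a", {(i, "a"): min(i + 1, 4) for i in range(5)}, 0, {4})
at_most_3 = complement(more_than_3)
print(more_than_3.accepts("aaaa"), at_most_3.accepts("aaaa"))  # True False
print(more_than_3.accepts("aaa"), at_most_3.accepts("aaa"))    # False True

# Over {a, b}: an even number of a's, intersected with "ends with b".
even_a = DFA({0, 1}, "ab", {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ends_b = DFA({"n", "y"}, "ab",
             {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
             "n", {"y"})
both = intersection(even_a, ends_b)
print(both.accepts("aab"), both.accepts("ab"), both.accepts("aa"))  # True False False
```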


**2. Finite Automata**

Every regular language can be recognized by a finite automaton (DFA or NFA).

**3. Decision Problems**

Problems like membership testing (is a string w in L?), emptiness (is L = ∅?), and equivalence (is L1 = L2?) are decidable for regular languages.
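
Emptiness, for instance, reduces to graph reachability: the language of a DFA is empty exactly when no accepting state can be reached from the start state. The function below is a minimal sketch of that idea; its name `is_empty` and the dictionary representation of the transition function are assumptions made for this example.

```python
from collections import deque


def is_empty(alphabet, delta, start, accepting):
    """Return True if no accepting state is reachable from `start`.

    `delta` maps (state, symbol) -> state and is assumed to be total
    on every reachable state.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in accepting:
            return False  # some string leads here, so the language is non-empty
        for symbol in alphabet:
            nxt = delta[(state, symbol)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# States 0 and 1 alternate on 'a'; state 2 is unreachable from state 0.
delta = {(0, "a"): 1, (1, "a"): 0, (2, "a"): 2}
print(is_empty({"a"}, delta, 0, {2}))  # True: the only accepting state is unreachable
print(is_empty({"a"}, delta, 0, {1}))  # False: the string "a" is accepted
```

Equivalence can be reduced to the same test: L1 = L2 exactly when the symmetric difference of the two languages, which is regular by the closure properties, is empty.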

**Note:** Two regular expressions are equivalent if the languages they generate are the same. For example, (a+b*)* and (a+b)* generate the same language: every string generated by (a+b*)* is also generated by (a+b)*, and vice versa.
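
A quick sanity check of such a claim is to compare the two expressions on every string up to some length. This is not a proof of equivalence (a proof would compare the corresponding automata), but it finds counterexamples quickly when the expressions differ. In the sketch below, the function name `first_disagreement` is made up for this example, and the formal union operator `+` is written as `|`, as Python's `re` syntax requires.

```python
import re
from itertools import product


def first_disagreement(pattern1, pattern2, alphabet, max_len):
    """Return the first string of length <= max_len on which the two patterns
    disagree (shortest first), or None if they agree on all such strings."""
    r1, r2 = re.compile(pattern1), re.compile(pattern2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (r1.fullmatch(s) is None) != (r2.fullmatch(s) is None):
                return s
    return None


# (a+b*)* versus (a+b)*: no disagreement on any string of length <= 6.
print(first_disagreement(r"(a|b*)*", r"(a|b)*", "ab", 6))  # None

# (a+b)* versus a*b*: these differ, and the shortest witness is "ba".
print(first_disagreement(r"(a|b)*", r"a*b*", "ab", 6))     # ba
```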

**Comparison of Regular Expressions, Grammar, and Languages**

Aspect	Regular Expressions	Regular Grammar	Regular Languages
**Definition**	Pattern representation of strings	Rule-based generation of strings	Language class described by regex and grammar
**Representation**	Symbols and operators	Terminals, non-terminals, production rules	Finite automata, regex, or grammar
**Use Case**	Text processing, validation	Syntax generation for compilers	Language recognition

How to solve problems on regular expressions and regular languages?

**Question 1**

**Which one of the following languages over the alphabet {0,1} is described by the regular expression below?**
(0+1)*0(0+1)*0(0+1)*
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0s.
(C) The set of all strings containing at least two 0s.
(D) The set of all strings that begin and end with either 0 or 1.

**Solution:**

**Option A:** This suggests that every string in the language must contain the substring 00. However, the string 10101 is part of the language but does not contain 00. So, Option A is incorrect.
**Option B:** This states that strings in the language have at most two 0s. But the string 00000 is part of the language, which violates this condition. So, Option B is incorrect.
**Option C:** This states that every string in the language contains at least two 0s. The regular expression contains two mandatory 0s, and the (0+1)* parts around them allow any other symbols, so it generates exactly the strings with at least two 0s. Hence, Option C is correct.
**Option D:** This claims the language includes all strings that begin and end with either 0 or 1, which is every non-empty string. However, the expression cannot generate strings such as 1 or 01, which contain fewer than two 0s. So, Option D is incorrect.

**Correct Answer: (C)**
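
The reasoning can be spot-checked by brute force. The sketch below reads the expression as (0+1)*0(0+1)*0(0+1)*, rewrites the formal `+` as `|` for Python's `re` module, and compares it with the condition in Option C on every binary string up to length 10. The helper names `binary_strings` and `matches` are made up for this example, and a bounded check like this supports the answer without proving it.

```python
import re
from itertools import product

# The expression from Question 1 in Python's `re` syntax.
Q1 = re.compile(r"(0|1)*0(0|1)*0(0|1)*")


def binary_strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def matches(s):
    return Q1.fullmatch(s) is not None


# Option C: the expression matches exactly the strings with at least two 0s.
option_c_holds = all(matches(s) == (s.count("0") >= 2) for s in binary_strings(10))
print(option_c_holds)                     # True

# Witnesses against the other options.
print(matches("10101"), "00" in "10101")  # True False -> rules out Option A
print(matches("00000"))                   # True       -> rules out Option B
print(matches("1"))                       # False      -> rules out Option D
```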

**Question 2**

**Which of the following languages is generated by the given grammar?**
S -> aS | bS | ε
(A) {aⁿbᵐ | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has an equal number of as and bs}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿbⁿ | n ≥ 0}
(D) {a,b}*

**Solution:**

**Option A:** This describes strings with n a's followed by m b's. However, the grammar can produce strings like ba (S -> bS -> baS -> ba), which violates the pattern. So, Option A is incorrect.
**Option B:** This states that strings have an equal number of a's and b's. But the string b (S -> bS -> b) does not satisfy this condition. So, Option B is incorrect.
**Option C:** This describes strings with only a's, only b's, or n a's followed by n b's. However, the string ba does not fit this pattern. So, Option C is incorrect.
**Option D:** This includes all strings with any number of a's and b's in any order. Each step of a derivation appends either an a or a b, and S -> ε can end the derivation at any point, so the grammar generates all such strings, including ba. Hence, Option D is correct.

**Correct Answer: (D)**

**Question 3**

**The regular expression 0*(10*)* denotes the same set as:**
(A) (1*0)*1*
(B) 0 + (0 + 10)*
(C) (0 + 1)*10(0 + 1)*
(D) None of these

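**Solution:**
Two regular expressions are equivalent if the languages they generate are the same. The given expression 0*(10*)* generates every string over {0,1}: 0* produces any leading 0s, and each repetition of 10* adds a 1 followed by any number of 0s.

**Option A:** This also generates every string over {0,1}: each repetition of 1*0 produces a block that ends in 0, and the trailing 1* supplies any final 1s. Both expressions therefore denote (0 + 1)*, so Option A is correct.
**Option B:** The string 1 is generated by 0*(10*)*, but it cannot be generated by this option, in which every 1 must be followed by a 0. So, Option B is incorrect.
**Option C:** This ensures that 10 is a substring, but 0*(10*)* also generates strings without 10 as a substring, such as ε and 0. So, Option C is incorrect.

**Correct Answer: (A)**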
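
The equivalence can also be derived algebraically, reading option (A) as (1*0)*1*. The derivation below relies on the standard denesting identity for regular expressions, which is not stated in the text above; r and s stand for arbitrary regular expressions.

```latex
% Denesting identity, valid for any regular expressions r and s:
(r + s)^* = r^* (s\, r^*)^* = (r^* s)^* r^*

% Take r = 0 and s = 1 in the first form:
(0 + 1)^* = 0^* (1\, 0^*)^*

% Take r = 1 and s = 0 in the second form:
(1 + 0)^* = (1^* 0)^* 1^*

% Union is commutative, so (0 + 1)^* = (1 + 0)^* and therefore:
0^* (1\, 0^*)^* = (0 + 1)^* = (1^* 0)^* 1^*
```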

**Question 4**

**The regular expression for the language over the alphabet {a, b} in which no two a's are adjacent is:**
(A) (b + ab)* + (b + ab)*a
(B) a(b + ba)* + (b + ba)*
(C) Both (A) and (B)
(D) None of the above

**Solution:**
The language can be expressed as:
**L = {ε, a, b, bb, ab, aba, ba, bab, baba, abab, ...}**

**Option A:** Uses b and ab as the building blocks, so every a is immediately followed by a b. (b + ab)* covers ε and the strings ending with b, while (b + ab)*a covers the strings ending with a.
**Option B:** Uses b and ba as the building blocks, so every a is immediately preceded by a b. (b + ba)* covers ε and the strings starting with b, while a(b + ba)* covers the strings starting with a.
Both expressions correctly describe the given language.

**Correct Answer: (C)**
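
Both options can be compared against the defining property, "the string does not contain aa", on all short strings. The sketch below reads the options as (b + ab)* + (b + ab)*a and a(b + ba)* + (b + ba)*, with the formal `+` rewritten as `|` for Python's `re` module. The names `OPTION_A`, `OPTION_B`, and `mismatches` are made up for this example, and the check is bounded by `max_len` rather than exhaustive.

```python
import re
from itertools import product

OPTION_A = re.compile(r"(b|ab)*|(b|ab)*a")
OPTION_B = re.compile(r"a(b|ba)*|(b|ba)*")


def mismatches(pattern, max_len=10):
    """List the strings over {a, b} of length <= max_len on which the pattern
    disagrees with the property 'no two a's are adjacent'."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            s = "".join(chars)
            in_language = "aa" not in s
            if (pattern.fullmatch(s) is not None) != in_language:
                bad.append(s)
    return bad


print(mismatches(OPTION_A))  # []
print(mismatches(OPTION_B))  # []
```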