Regular Expressions, Regular Grammar and Regular Languages

Last Updated : 23 Jul, 2025

To work with formal languages and string patterns, it is essential to understand regular expressions, regular grammar, and regular languages. These concepts form the foundation of automata theory, compiler design, and text processing.

**Regular Expressions**

Regular expressions are symbolic notations used to define search patterns in strings. They describe **regular languages** and are commonly used in tasks such as validation, searching, and parsing.

Regular expressions over an alphabet Σ are built inductively by the rules below. Each one denotes a **regular language**, written L(R) for an expression R:

  1. **∅** is a regular expression denoting the empty language, and **ε (epsilon)** is a regular expression denoting **{ε}** (the language containing only the empty string).
  2. **Any symbol a** from the input alphabet Σ is a regular expression denoting **{a}**.
  3. **Union (R + S)** is a regular expression if **R** and **S** are regular expressions. It denotes L(R) ∪ L(S); for example, a + b denotes {a, b}.
  4. **Concatenation (RS)** is a regular expression if **R** and **S** are regular expressions. It denotes {xy | x ∈ L(R), y ∈ L(S)}.
  5. **Kleene star** (`R*`) is a regular expression if **R** is a regular expression. It denotes all concatenations of zero or more strings from L(R), so it always contains ε.
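The inductive rules above translate directly into a small recursive data type. The sketch below is an illustration only: the node names (`Empty`, `Eps`, `Sym`, `Alt`, `Cat`, `Star`) and the helper `lang` are hypothetical, and because most of these languages are infinite, `lang` returns only the strings up to a chosen length `n`.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


# Hypothetical AST node types, one per rule of the inductive definition.
@dataclass(frozen=True)
class Empty:   # ∅ denotes the empty language
    pass


@dataclass(frozen=True)
class Eps:     # ε denotes {ε}
    pass


@dataclass(frozen=True)
class Sym:     # a single symbol a ∈ Σ denotes {a}
    a: str


@dataclass(frozen=True)
class Alt:     # R + S denotes L(R) ∪ L(S)
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Cat:     # RS denotes {xy | x ∈ L(R), y ∈ L(S)}
    r: "Regex"
    s: "Regex"


@dataclass(frozen=True)
class Star:    # R* denotes zero or more concatenated strings from L(R)
    r: "Regex"


Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]


def lang(e: Regex, n: int) -> FrozenSet[str]:
    """Return the strings of L(e) whose length is at most n."""
    if isinstance(e, Empty):
        return frozenset()
    if isinstance(e, Eps):
        return frozenset({""})
    if isinstance(e, Sym):
        return frozenset({e.a}) if len(e.a) <= n else frozenset()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return frozenset(x + y for x in lang(e.r, n) for y in lang(e.s, n)
                         if len(x) + len(y) <= n)
    if isinstance(e, Star):
        pieces = lang(e.r, n) - {""}  # ε adds nothing new inside a star
        result, frontier = {""}, {""}
        while frontier:  # each round appends one more nonempty piece
            frontier = {x + y for x in frontier for y in pieces
                        if len(x) + len(y) <= n} - result
            result |= frontier
        return frozenset(result)
    raise TypeError(f"not a regular expression: {e!r}")
```

For example, `lang(Cat(Sym("a"), Star(Sym("b"))), 4)` yields a, ab, abb, and abbb: the part of L(`ab*`) with length at most 4.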
| Description | Regular Expression | Regular Language |
|---|---|---|
| Set of vowels | `a + e + i + o + u` | {a, e, i, o, u} |
| 'a' followed by 0 or more 'b' | `ab*` | {a, ab, abb, abbb, abbbb, ...} |
| Any number of vowels followed by any number of consonants | `[aeiou]*[bcdfghjklmnpqrstvwxyz]*` | {ε, a, aou, aiou, b, abcd, ...} |

Here ε represents the empty string.
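Practical regex engines express the same ideas in a slightly different syntax. Assuming Python's standard `re` module, the table rows can be checked as below. Note that `re` writes union as `|` rather than `+`, and that `fullmatch` (not `search` or `match`) corresponds to language membership, because the whole string must match. The helper name `in_language` is hypothetical.

```python
import re

# Table patterns in Python `re` syntax: "|" is union, [aeiou] abbreviates (a|e|i|o|u).
VOWEL = re.compile(r"a|e|i|o|u")
A_THEN_BS = re.compile(r"ab*")
VOWELS_THEN_CONSONANTS = re.compile(r"[aeiou]*[bcdfghjklmnpqrstvwxyz]*")


def in_language(pattern: re.Pattern, s: str) -> bool:
    # fullmatch succeeds only if the entire string matches, which is exactly
    # "s belongs to the language of the pattern".
    return pattern.fullmatch(s) is not None


for s in ["a", "abbb", "ba", ""]:
    print(repr(s), in_language(A_THEN_BS, s))
```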

**Regular Grammar**

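A regular grammar is a formal grammar that generates exactly the regular languages. Like any formal grammar, it is a 4-tuple G = (V, Σ, P, S) consisting of:

  1. **V**, a finite set of non-terminal symbols (variables).
  2. **Σ**, a finite set of terminal symbols (the alphabet).
  3. **P**, a finite set of production rules, restricted to one of the forms below.
  4. **S ∈ V**, the start symbol.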

**Types of Regular Grammar**

  1. **Right-linear Grammar**: every production has the form A -> aB, A -> a, or A -> ε, where A and B are non-terminals and a is a terminal.
    Example:

    S -> aS | bS | ε

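  2. **Left-linear Grammar**: every production has the form A -> Ba, A -> a, or A -> ε. Both types generate exactly the regular languages, but a grammar that mixes right-linear and left-linear rules need not generate a regular language.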
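A right-linear grammar can be read directly as an NFA: each non-terminal becomes a state, a rule A -> aB becomes a transition from A to B on a, a rule A -> a leads to an extra accepting state, and a rule A -> ε makes A accepting. The sketch below assumes a hypothetical tuple encoding for productions and illustrative names (`accepts`, `ACCEPT`); it handles only the forms A -> aB, A -> a, and A -> ε.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: each non-terminal maps to its right-hand sides,
# written as (terminal, next_nonterminal):
#   A -> aB  is ("a", "B");   A -> a  is ("a", None);   A -> ε  is ("", None)
Grammar = Dict[str, List[Tuple[str, Optional[str]]]]

ACCEPT = "<accept>"  # extra NFA state entered by A -> a (assumed not to clash with a non-terminal)


def accepts(grammar: Grammar, start: str, w: str) -> bool:
    """Decide whether w is in L(G) by running G as an NFA over its non-terminals."""
    current = {start}
    for ch in w:
        nxt = set()
        for state in current:
            for terminal, target in grammar.get(state, []):
                if terminal == ch:
                    nxt.add(ACCEPT if target is None else target)
        current = nxt  # ACCEPT has no outgoing moves, so it drops out on further input
    # Accept in the extra state, or in any non-terminal that has a rule A -> ε.
    return any(state == ACCEPT or ("", None) in grammar.get(state, [])
               for state in current)


# The right-linear example S -> aS | bS | ε
G = {"S": [("a", "S"), ("b", "S"), ("", None)]}
```

With this encoding, `accepts(G, "S", "abba")` is true and `accepts(G, "S", "abc")` is false, since c is not a terminal of the grammar.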


**Regular Languages**

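Regular languages are exactly the languages that can be recognized by finite automata, or equivalently described by regular expressions or generated by regular grammars. Because a DFA reads each input symbol once and remembers nothing beyond its current state, membership in a regular language can be tested in time linear in the length of the string.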

**Properties of Regular Languages**

**1. Closure Properties**

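Regular languages are closed under union, concatenation, and Kleene star, as well as intersection, complement, difference, and reversal: applying any of these operations to regular languages yields another regular language.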
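Closure under union and intersection is usually proved with the product construction: run DFAs for both languages in lockstep and choose which pairs of states accept. Complement swaps the accepting and rejecting states of a complete DFA. This sketch assumes a hypothetical `DFA` class with a total transition dictionary, plus two made-up example machines (an even number of a's; ends with b).

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Hashable, Tuple


@dataclass
class DFA:
    states: FrozenSet[Hashable]
    alphabet: FrozenSet[str]
    delta: Dict[Tuple[Hashable, str], Hashable]  # assumed total: one move per (state, symbol)
    start: Hashable
    accepting: FrozenSet[Hashable]

    def accepts(self, w: str) -> bool:
        q = self.start
        for c in w:
            q = self.delta[(q, c)]
        return q in self.accepting


def product_dfa(m1: DFA, m2: DFA, keep: Callable[[bool, bool], bool]) -> DFA:
    """Run m1 and m2 in lockstep on pair states; `keep` decides which pairs accept."""
    if m1.alphabet != m2.alphabet:
        raise ValueError("this sketch assumes both DFAs share one alphabet")
    states = frozenset((p, q) for p in m1.states for q in m2.states)
    delta = {((p, q), c): (m1.delta[(p, c)], m2.delta[(q, c)])
             for (p, q) in states for c in m1.alphabet}
    accepting = frozenset(s for s in states
                          if keep(s[0] in m1.accepting, s[1] in m2.accepting))
    return DFA(states, m1.alphabet, delta, (m1.start, m2.start), accepting)


def complement(m: DFA) -> DFA:
    # Swapping accepting and rejecting states is valid only because delta is total.
    return DFA(m.states, m.alphabet, m.delta, m.start, m.states - m.accepting)


AB = frozenset("ab")
# Hypothetical example machines over {a, b}.
even_a = DFA(frozenset({0, 1}), AB,
             {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, frozenset({0}))
ends_b = DFA(frozenset({"n", "y"}), AB,
             {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
             "n", frozenset({"y"}))

both = product_dfa(even_a, ends_b, lambda x, y: x and y)   # intersection
either = product_dfa(even_a, ends_b, lambda x, y: x or y)  # union
```

Passing a different `keep` function (for example `lambda x, y: x and not y`) gives difference with the same construction.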


**2. Finite Automata**

Every regular language can be recognized by a finite automaton (DFA or NFA), and every language recognized by a finite automaton is regular.
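As a concrete instance, here is a hand-built DFA for `ab*` over {a, b}, assuming a hypothetical dictionary encoding of its transitions. It reads each symbol exactly once and stores only the current state.

```python
# Hypothetical hand-built DFA for ab* over {a, b}:
#   "start": nothing read yet
#   "ok":    read an a followed only by b's (accepting)
#   "dead":  rejecting sink for everything else
DELTA = {
    ("start", "a"): "ok",   ("start", "b"): "dead",
    ("ok", "a"): "dead",    ("ok", "b"): "ok",
    ("dead", "a"): "dead",  ("dead", "b"): "dead",
}
ACCEPTING = {"ok"}


def run_dfa(w: str) -> bool:
    state = "start"
    for ch in w:  # one table lookup per symbol: O(len(w)) time, O(1) extra memory
        state = DELTA.get((state, ch), "dead")  # symbols outside {a, b} fall into the sink
    return state in ACCEPTING
```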

**3. Decision Problems**

Membership (is w in L?), emptiness (is L = ∅?), finiteness, and equivalence (is L1 = L2?) are all decidable for regular languages, for example by algorithms that operate on a DFA for the language.
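Membership is decided by simply running a DFA on the input. Emptiness and equivalence can be sketched with breadth-first search over reachable states, assuming complete DFAs encoded as hypothetical `(delta, start, accepting)` tuples over a shared alphabet. A language is empty when no accepting state is reachable, and two DFAs are equivalent when no reachable pair in their product disagrees on acceptance.

```python
from collections import deque


def is_empty(delta, start, accepting, alphabet):
    """L(M) is empty iff no accepting state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return False
        for c in alphabet:
            r = delta[(q, c)]
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True


def equivalent(m1, m2, alphabet):
    """L(M1) == L(M2) iff no reachable product state accepts in exactly one machine."""
    (d1, s1, f1), (d2, s2, f2) = m1, m2
    seen, queue = {(s1, s2)}, deque([(s1, s2)])
    while queue:
        p, q = queue.popleft()
        if (p in f1) != (q in f2):
            return False  # some input reaching (p, q) is accepted by only one DFA
        for c in alphabet:
            nxt = (d1[(p, c)], d2[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


# Hypothetical example DFAs over {a, b}.
ALL_1 = ({(0, "a"): 0, (0, "b"): 0}, 0, {0})                               # (a+b)*, one state
ALL_2 = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0, 1})  # (a+b)*, two states
EVEN_A = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})    # even number of a's
```

Here `equivalent(ALL_1, ALL_2, "ab")` holds even though the machines have different numbers of states, while `ALL_1` and `EVEN_A` are distinguished by the input a.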

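**Note:** Two regular expressions are equivalent if they denote the same language. For example, `(a+b*)*` and `(a+b)*` denote the same language: every string generated by `(a+b*)*` is also generated by `(a+b)*`, and vice versa. One direction holds because every string of `a+b*` is a string over {a, b}, so any concatenation of them is in `(a+b)*`; the other holds because both a and b are strings of `a+b*` (b comes from `b*`), so every string over {a, b} is a concatenation of them.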

**Comparison of Regular Expressions, Grammar, and Languages**

| Aspect | Regular Expressions | Regular Grammar | Regular Languages |
|---|---|---|---|
| **Definition** | Pattern representation of strings | Rule-based generation of strings | Language class described by regex and grammar |
| **Representation** | Symbols and operators | Terminals, non-terminals, production rules | Finite automata, regex, or grammar |
| **Use Case** | Text processing, validation | Syntax generation for compilers | Language recognition |

**How to Solve Problems on Regular Expressions and Regular Languages**

**Question 1**

**Which one of the following languages over the alphabet {0,1} is described by the following regular expression?**
`(0+1)*0(0+1)*0(0+1)*`
(A) The set of all strings containing the substring 00.
(B) The set of all strings containing at most two 0s.
(C) The set of all strings containing at least two 0s.
(D) The set of all strings that begin and end with either 0 or 1.

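**Solution:**

The expression forces two explicit 0s, and each `(0+1)*` before, between, and after them matches any binary string, including ε. So every string with at least two 0s matches (align the explicit 0s with any two of its 0s), and no string with fewer than two 0s can match. Option (A) is too narrow: 010 has two 0s but no substring 00. Option (B) fails on 000, which matches the expression but has three 0s. Option (D) fails on 1, which begins and ends with 1 but contains no 0s.

**Correct Answer: (C)**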
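Answer (C) can be spot-checked by brute force with Python's `re` module, where `(0+1)` is written `(0|1)`. Checking every string up to a fixed length is evidence, not a proof; the helper name `all_strings` is hypothetical.

```python
import re
from itertools import product

Q1 = re.compile(r"(0|1)*0(0|1)*0(0|1)*")  # (0+1)*0(0+1)*0(0+1)* in Python syntax


def all_strings(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


# Option (C): up to length 10, a string matches exactly when it has at least two 0s.
assert all((Q1.fullmatch(w) is not None) == (w.count("0") >= 2)
           for w in all_strings("01", 10))

# Option (A) is too narrow: "010" matches but does not contain "00".
print(Q1.fullmatch("010") is not None, "00" in "010")  # prints: True False
```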

**Question 2**

**Which of the following languages is generated by the given grammar?**
**S -> aS | bS | ε**
(A) {aⁿbᵐ | n,m ≥ 0}
(B) {w ∈ {a,b}* | w has an equal number of a's and b's}
(C) {aⁿ | n ≥ 0} ∪ {bⁿ | n ≥ 0} ∪ {aⁿbⁿ | n ≥ 0}
(D) {a,b}*

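**Solution:**

Each derivation step either appends a or b (S -> aS, S -> bS) or stops (S -> ε), so the symbols a and b can be produced in any order and any number. For example, ba is derived as S ⇒ bS ⇒ baS ⇒ ba, which rules out (A); a is derivable, which rules out (B); and aba is derivable but belongs to none of the sets in (C). The grammar therefore generates every string over {a, b}.

**Correct Answer: (D)**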
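To watch the grammar at work, the sketch below enumerates every terminal string derivable from S up to a length bound by expanding the leftmost non-terminal breadth-first, then compares the result with all strings over {a, b}. The rule encoding (upper-case letters as non-terminals) and the function name `generate` are assumptions for illustration; the search terminates here because every step of this grammar either adds a terminal or removes S.

```python
from itertools import product

# Grammar from Question 2; keys are non-terminals, values are right-hand sides.
RULES = {"S": ["aS", "bS", ""]}


def generate(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    results, frontier = set(), [start]
    while frontier:
        nxt = []
        for form in frontier:
            # Position of the leftmost non-terminal, or None for a finished sentence.
            i = next((k for k, ch in enumerate(form) if ch in rules), None)
            if i is None:
                results.add(form)
                continue
            for rhs in rules[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                # Terminals are never erased, so forms with too many can be pruned.
                if sum(ch not in rules for ch in new) <= max_len:
                    nxt.append(new)
        frontier = nxt
    return results


everything = {"".join(p) for n in range(5) for p in product("ab", repeat=n)}
print(generate(RULES, "S", 4) == everything)  # prints: True
```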

**Question 3**

**The regular expression `0*(10*)*` denotes the same set as:**
(A) `(1*0)*1*`
(B) `0 + (0 + 10)*`
(C) `(0 + 1)*10(0 + 1)*`
(D) None of these

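**Solution:**
Two regular expressions are equivalent if they denote the same language. `0*(10*)*` denotes every binary string: cut any string just before each 1, which leaves a (possibly empty) block of 0s matching `0*`, followed by blocks that each consist of a 1 and then 0s, matching `(10*)*`. Symmetrically, `(1*0)*1*` also denotes every binary string: cut just after each 0, which leaves blocks of 1s ending in a 0, followed by trailing 1s. Option (B) cannot generate the string 1, and option (C) cannot generate ε, so neither is equivalent.

**Correct Answer: (A)**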
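A quick way to rule options out is to search for the shortest string on which two expressions disagree. The brute-force sketch below assumes Python's `re` syntax (`|` for union) and a hypothetical helper `first_difference`. Finding no difference up to length 8 is consistent with equivalence but does not prove it.

```python
import re
from itertools import product

TARGET = r"0*(10*)*"
OPTIONS = {"A": r"(1*0)*1*", "B": r"0|(0|10)*", "C": r"(0|1)*10(0|1)*"}


def first_difference(p, q, alphabet="01", max_len=8):
    """Shortest string (up to max_len) accepted by exactly one of p and q, else None."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            if (re.fullmatch(p, w) is None) != (re.fullmatch(q, w) is None):
                return w
    return None


for name, pattern in OPTIONS.items():
    print(name, repr(first_difference(TARGET, pattern)))
# prints:
# A None
# B '1'
# C ''
```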

**Question 4**

**The regular expression for the language over the alphabet {a, b} in which no two a's occur together is:**
(A) `(b + ab)* + (b + ab)*a`
(B) `a(b + ba)* + (b + ba)*`
(C) Both (A) and (B)
(D) None of the above

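**Solution:**
The language can be expressed as:
**L = {ε, a, b, bb, ab, aba, ba, bab, baba, abab, ...}**

In (A), `(b + ab)*` generates exactly the strings without aa in which every a is followed by b, and the extra term `(b + ab)*a` adds those that end in a single a. In (B), `(b + ba)*` generates exactly the strings without aa in which every a is preceded by b, and `a(b + ba)*` adds those that start with a single a. Both expressions therefore describe every string with no two consecutive a's.

**Correct Answer: (C)**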
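Answer (C) can be spot-checked by comparing both options against the defining property (no substring aa) for every string up to length 10, again with Python's `re` module. This is a bounded check, not a proof; the dictionary `NO_AA` and the helper `words` are hypothetical names.

```python
import re
from itertools import product

# Options (A) and (B) in Python `re` syntax ("|" is union).
NO_AA = {
    "A": re.compile(r"(b|ab)*|(b|ab)*a"),
    "B": re.compile(r"a(b|ba)*|(b|ba)*"),
}


def words(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            yield "".join(chars)


for name, rx in NO_AA.items():
    ok = all((rx.fullmatch(w) is not None) == ("aa" not in w) for w in words(10))
    print(name, ok)
# prints:
# A True
# B True
```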