Phases of a Compiler

Last Updated : 11 Apr, 2026

The compilation process is divided into several phases. Each phase performs a specific task to transform the source code step by step into machine code.

There are 6 major phases in the compilation process.

Figure: Phases of a Compiler

1. Lexical Analysis

Lexical analysis is the first phase of a compiler. It converts the source code into tokens, where a token is the smallest meaningful unit of the program: a keyword, identifier, constant, operator, or punctuation symbol.

The lexical analyzer scans the code character by character and groups the characters into tokens according to the rules of the language. This simplifies the input for the later phases of compilation.

**Example:** int x = 10;

The lexical analyzer would break this line into the following tokens:

- int : keyword
- x : identifier
- = : operator
- 10 : constant
- ; : punctuation
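The grouping of characters into tokens can be sketched with a small regular-expression scanner. This is only an illustrative sketch in Python: the token categories, the `TOKEN_SPEC` table, and the `tokenize` function are assumptions made for this example and cover just a tiny C-like subset.

```python
import re

# Hypothetical token categories for a tiny C-like subset (illustration only).
TOKEN_SPEC = [
    ("KEYWORD",     r"\b(?:int|float|return)\b"),
    ("IDENTIFIER",  r"[A-Za-z_]\w*"),
    ("CONSTANT",    r"\d+(?:\.\d+)?"),
    ("OPERATOR",    r"[=+\-*/]"),
    ("PUNCTUATION", r"[;(){}]"),
    ("SKIP",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (category, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for category, lexeme in tokenize("int x = 10;"):
    print(f"{lexeme} : {category}")
```

Keywords are listed before identifiers in the table so that `int` is classified as a keyword, while a longer name such as `integer` still falls through to the identifier rule.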

2. Syntax Analysis or Parsing

Syntax analysis is the phase of the compiler that checks whether the source code follows the grammatical rules of the programming language. It verifies that the sequence of tokens produced by the lexical analyzer is arranged correctly according to the language’s syntax.

This phase checks the proper use of operators, keywords, parentheses, and statements. If the code is not structured according to the language rules, the syntax analyzer generates appropriate error messages.

To represent the structure of the source code, syntax analysis uses parse trees or syntax trees.

Figure: Parse Tree
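As a rough illustration of how a parser turns tokens into a tree, here is a minimal recursive-descent sketch in Python for arithmetic expressions. The grammar, the `parse_expression` function, and the nested-tuple tree encoding are all assumptions chosen for this example; real compilers use richer grammars and node types.

```python
import re

def parse_expression(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree.

    Grammar (an illustrative assumption, not a full language):
        expr   -> term (('+' | '-') term)*
        term   -> factor (('*' | '/') factor)*
        factor -> NAME | NUMBER | '(' expr ')'
    """
    # Simplified tokenization: unrecognized characters are ignored for brevity.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        token = advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token                      # a name or a number is a leaf

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expression("b + c * d"))
```

Because `term` is called from `expr`, multiplication binds tighter than addition, so `c * d` becomes a subtree of the `+` node. An ill-formed input such as `b + ` raises a `SyntaxError`, which corresponds to the error messages mentioned above.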

3. Semantic Analysis

Semantic analysis is the phase of the compiler that ensures the source code is logically meaningful. While syntax analysis checks the structure of the program, semantic analysis checks its meaning.

This phase identifies semantic errors such as type mismatches, undeclared variables, and invalid operations. It verifies that the operations performed in the program are logically correct and follow the rules of the programming language in terms of data types and variable usage.

Some key checks performed during semantic analysis include type checking, detection of undeclared variables, and detection of invalid operations.

Example:

int a = 5;
float b = 3.5;
a = a + b;

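**Type Checking:** Here a is an int and b is a float, so a + b has type float. Assigning that result back to a requires converting it to int, which discards the fractional part. Depending on the language, the semantic analyzer either inserts this implicit conversion (as C does) or reports a type mismatch.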
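A type check for the statement a = a + b from the example could look roughly like the following Python sketch. The `symbols` dictionary, the tuple encoding of expressions, and the functions `type_of` and `check_assignment` are hypothetical names introduced for illustration; the promotion rule is a simplified version of the usual int-to-float arithmetic conversion.

```python
# Hypothetical symbol table for the example: variable name -> declared type.
symbols = {"a": "int", "b": "float"}

def type_of(expr):
    """Return the type of an expression given as a name or an (op, left, right) tuple."""
    if isinstance(expr, str):
        if expr not in symbols:
            raise NameError(f"undeclared variable {expr!r}")
        return symbols[expr]
    _, left, right = expr
    # Simplified promotion rule: if either operand is float, the result is float.
    return "float" if "float" in (type_of(left), type_of(right)) else "int"

def check_assignment(target, expr):
    """Return a list of diagnostics for the statement `target = expr`."""
    diagnostics = []
    target_type = type_of(target)
    value_type = type_of(expr)
    if target_type == "int" and value_type == "float":
        diagnostics.append(
            f"warning: assigning float to int {target!r} loses the fractional part"
        )
    return diagnostics

print(check_assignment("a", ("+", "a", "b")))
```

The same `type_of` lookup also catches undeclared variables, since any name missing from the table raises an error before the type comparison is reached.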

4. Intermediate Code Generation

Intermediate Code Generation is the phase of the compiler in which the source code is converted into an intermediate representation (IR). This intermediate code lies between the high-level source code and the final machine code.

The IR is not tied to any specific machine. This makes the compiler easier to retarget and the program easier to optimize before the final machine code is generated. It acts as a bridge between the front end (analysis phases) and the back end (code generation phase) of the compiler.

**Importance of Intermediate Code:**

**Platform Independence:** Because intermediate code is machine-independent, the same intermediate representation can be reused for every target platform; only code generation has to be repeated for each one.

**Simplifying Optimization:** Intermediate code provides a structured and simplified representation of the program, making it easier to apply optimization techniques such as constant folding, dead code elimination, and common subexpression elimination.

**Easier Translation:** Intermediate code is closer to machine code but not tied to a specific machine. This makes the final code generation process smoother and more efficient. This step is handled in the back end of the compiler.

Example: a = b + c * d; is translated into the following three-address code:

t1 = c * d
t2 = b + t1
a = t2
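The translation into three-address code can be sketched as a post-order walk over an expression tree, creating one temporary per operator. In this Python sketch the function name `to_three_address` and the tuple encoding of the tree are assumptions made for illustration.

```python
def to_three_address(target, tree):
    """Emit three-address code for `target = tree`.

    `tree` is a variable name or an (op, left, right) tuple; this tuple
    encoding is an assumption made for the sketch.
    """
    code = []
    counter = 0

    def emit(node):
        nonlocal counter
        if isinstance(node, str):
            return node                   # a plain operand needs no temporary
        op, left, right = node
        left_name = emit(left)            # operands are translated first
        right_name = emit(right)
        counter += 1
        temp = f"t{counter}"
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = emit(tree)
    code.append(f"{target} = {result}")
    return code

for line in to_three_address("a", ("+", "b", ("*", "c", "d"))):
    print(line)
```

For the tree of b + c * d this prints the same three lines as the example: the multiplication is emitted first because it is the deeper subtree.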

5. Code Optimization

Code Optimization is the process of improving the intermediate or target code to make the program run faster, use less memory, or be more efficient, without altering its functionality.

**Common Techniques:** constant folding, dead code elimination, and loop-invariant code motion.

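**Example:** In the loop below, x = y + z computes the same value on every iteration, so it can be moved out of the loop. The compiler must also make sure the move is safe when the loop might execute zero times.

**Code Before Optimization:**

for (int j = 0; j < n; j++) {
    x = y + z;
    a[j] = 6 * j;
}

**Code After Optimization:**

x = y + z;
for (int j = 0; j < n; j++) {
    a[j] = 6 * j;
}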
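The loop example shows code motion. A different and simpler optimization, constant folding, is sketched here in Python to show what a meaning-preserving rewrite looks like in code. The function `fold_constants` and the tuple encoding of expressions are assumptions for this illustration, and only integer `+`, `-`, and `*` are handled.

```python
import operator

# Operators this sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(tree):
    """Replace sub-expressions whose operands are all integer constants with their value."""
    if not isinstance(tree, tuple):
        return tree                          # a variable name or an int literal
    op, left, right = tree
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int) and op in OPS:
        return OPS[op](left, right)          # computed once, at compile time
    return (op, left, right)

print(fold_constants(("+", "y", ("*", 6, 7))))
```

The sub-expression 6 * 7 is replaced by 42, while y stays symbolic because its value is not known until run time, so the program computes the same result with less work.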

6. Code Generation

Code Generation is the final phase of a compiler, where the intermediate representation of the source program (e.g., three-address code or abstract syntax tree) is translated into machine code or assembly code.

The source code written in a higher-level language is transformed into lower-level object code. At a minimum, this object code should preserve the exact meaning of the source program and make efficient use of CPU time and memory.

**Example:**

**Three-Address Code:**

t1 = c * d
t2 = b + t1
a = t2

**Assembly Code:**

LOAD R1, c      ; Load the value of 'c' into register R1
LOAD R2, d      ; Load the value of 'd' into register R2
MUL R1, R2      ; R1 = c * d, store result in R1
LOAD R3, b      ; Load the value of 'b' into register R3
ADD R3, R1      ; R3 = b + (c * d), store result in R3
STORE a, R3     ; Store the final result in variable 'a'
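A deliberately naive code generator can be sketched in Python as a statement-by-statement translation of three-address code. The `generate` function and the `SUB` and `DIV` mnemonics are assumptions for this sketch; the instruction style imitates the LOAD/MUL/ADD/STORE example above for a hypothetical machine, not a real instruction set.

```python
import re

# Hypothetical mnemonics for an imaginary two-operand register machine.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(tac_lines):
    """Translate three-address code into assembly-like text, one statement at a time."""
    asm = []
    for line in tac_lines:
        binary = re.fullmatch(r"(\w+) = (\w+) ([-+*/]) (\w+)", line)
        copy = re.fullmatch(r"(\w+) = (\w+)", line)
        if binary:
            dest, left, op, right = binary.groups()
            asm += [f"LOAD R1, {left}", f"LOAD R2, {right}",
                    f"{MNEMONIC[op]} R1, R2", f"STORE {dest}, R1"]
        elif copy:
            dest, src = copy.groups()
            asm += [f"LOAD R1, {src}", f"STORE {dest}, R1"]
        else:
            raise ValueError(f"unsupported statement: {line!r}")
    return asm

for instruction in generate(["t1 = c * d", "t2 = b + t1", "a = t2"]):
    print(instruction)
```

This sketch stores every temporary back to memory and reloads it, so it emits ten instructions where the hand-written version above needs six. Keeping temporaries in registers, as that version does, is the job of register allocation.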

**Symbol Table:** A data structure created and maintained by the compiler that stores the name of every identifier along with its type. It lets the compiler look up identifiers quickly during all phases.
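One common way to build such a table is a stack of hash maps, one per scope. The Python sketch below is an assumption-laden illustration: the `SymbolTable` class, its method names, and the notion of nested scopes are introduced here and are not taken from the text above.

```python
class SymbolTable:
    """A stack of scopes, each mapping identifier names to their types."""

    def __init__(self):
        self.scopes = [{}]                    # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"redeclaration of {name!r}")
        scope[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outward; each dict lookup is O(1) on average.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared identifier {name!r}")

table = SymbolTable()
table.declare("x", "int")
table.enter_scope()
table.declare("x", "float")                   # shadows the outer x
print(table.lookup("x"))
table.exit_scope()
print(table.lookup("x"))
```

Using a dictionary per scope is what makes identifier lookup fast, and the failed lookup path doubles as the check for undeclared identifiers.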

Error Handling in the Phases of a Compiler

Error Handling refers to the mechanism used by the compiler in each phase to detect, report, and recover from errors without stopping the entire compilation process.

Each phase of the compiler handles specific types of errors:

Lexical Analysis

Detects errors in the character stream and ensures valid token formation.

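**Example Errors:** an illegal character such as @ in the source, a malformed number such as 12.3.4, or an unterminated string literal.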
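Detecting, reporting, and recovering from a lexical error can be sketched as follows in Python. The `scan_with_recovery` function and its skip-one-character recovery strategy are assumptions for this illustration; production lexers often use more careful recovery rules.

```python
import re

# Whitespace, names, integers, and a few single-character symbols are accepted.
VALID = re.compile(r"\s+|[A-Za-z_]\w*|\d+|[=+\-*/;(){}]")

def scan_with_recovery(source):
    """Return (lexemes, errors); illegal characters are reported and skipped."""
    lexemes, errors = [], []
    pos = 0
    while pos < len(source):
        match = VALID.match(source, pos)
        if match is None:
            errors.append(f"position {pos}: illegal character {source[pos]!r}")
            pos += 1                      # recover by skipping the offending character
            continue
        if not match.group().isspace():
            lexemes.append(match.group())
        pos = match.end()
    return lexemes, errors

lexemes, errors = scan_with_recovery("int x = 10 @ 2;")
print(lexemes)
print(errors)
```

The scan continues after the bad character, so one run can report several lexical errors instead of stopping at the first.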

Syntax Analysis

Checks for grammatical or structural errors based on language rules.

**Example Errors:** a missing semicolon, unbalanced parentheses or braces, or an operator with a missing operand.

Semantic Analysis

Ensures the program has correct meaning and follows language semantics.

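**Example Errors:** a type mismatch, use of an undeclared variable, or an operation that is invalid for its operand types.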

Intermediate Code Generation

Ensures correctness of the intermediate representation.

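**Example Errors:** a temporary that is used before it is defined, or an intermediate instruction whose operand types do not match.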

Code Optimization

Ensures optimization does not change the program’s meaning.

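**Example Issues:** an unsafe transformation, such as moving a computation out of a loop when its operands actually change inside the loop.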

Code Generation

Handles errors related to machine-level translation.

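**Example Errors:** running out of available registers, or a constant that is too large for the target instruction format.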

Two Parts of the Compilation Process

The compiler is broadly divided into two main parts:

| Feature | Front-End (Analysis Phase) | Back-End (Synthesis Phase) |
| --- | --- | --- |
| Main Work | Analyzes source code | Generates target code |
| Input | Source Program | Intermediate Code |
| Output | Intermediate Representation | Machine Code |
| Phases Included | Lexical, Syntax, Semantic, Intermediate Code Generation | Code Optimization, Code Generation |
| Error Handling | Detects most errors | Handles machine-level errors |
| Purpose | Understand and verify program | Produce optimized executable |

Figure: Parts of the Compiler