Lex programming language
Lex is a program that generates lexical analyzers ("scanners" or "lexers"). It is commonly used with the yacc parser generator. Lex is the standard lexical analyzer generator on Unix systems and is included in the POSIX standard. A popular free version of lex is flex, the fast lexical analyzer generator. Lex reads an input file specifying the lexical analyzer and outputs C source code implementing the lexer.
Structure of a lex file
The structure of a lex file is intentionally similar to that of a yacc file; files are divided up into three parts: a definition section, a rules section, and a C code section. Sections are separated by lines that contain only two percent signs: %%
The definition section is the place to define macros using regular expressions, and also to import header files written in C.
The rules section is the most important section; it associates patterns with C statements. When lex sees text in its input that matches a given pattern, it executes the associated C code. Patterns are simply regular expressions, possibly containing the macros defined in the definition section.
The C code section contains C statements and functions that are copied verbatim to the generated source file. These typically contain code called by the rules in the rules section. In large programs it is more convenient to place this code in a separate file and link it in at compile time.
Example flex file
The following is an example input file for the flex version of lex. It recognizes strings of digits (integers) in the input. Given the input "abc123z.!&*2ghj6", the program will print:
Saw an integer: 123
Saw an integer: 2
Saw an integer: 6
/*
 * Example lexical analyzer for flex.
 * Picks out strings of digits (integers) from the input.
 */

/*** Definition section ***/

%{
/*
 * Some C code to include the C standard I/O library.
 * Everything inside the %{ %} brackets is inserted
 * verbatim into the generated file.
 */
#include <stdio.h>
%}

/* Macros; regular expressions */
DIGIT    [0-9]
INTEGER  {DIGIT}+

/* This tells flex to read only one input file */
%option noyywrap

%%
    /*
     * Rules section
     *
     * Comments in this section must be indented
     * so lex won't mistake them for regular expressions.
     */

{INTEGER}   {
                /*
                 * This rule prints integers from the input.
                 * yytext is a string containing the matched text.
                 */
                printf("Saw an integer: %s\n", yytext);
            }

.           {   /* Ignore all other characters. */   }

%%
/*** C Code section ***/

/*
 * The main program.
 * Call the lexer. Quit when done.
 */
int main(void)
{
    /*
     * yyin is where lex reads from. It is a global declared by the
     * generated scanner, so assign to it rather than redeclaring it.
     */
    yyin = stdin;

    /* Call the lexer. */
    yylex();

    return 0;
}