10. Full Grammar specification

This is the full Python grammar, derived directly from the grammar used to generate the CPython parser (see Grammar/python.gram). The version here omits details related to code generation and error recovery.

The notation used here is the same as in the preceding docs, and is described in the notation section, except for a few extra complications:

&e: a positive lookahead (that is, e is required to match but not consumed)
!e: a negative lookahead (that is, e is required not to match)
~ (“cut”): commit to the current alternative and fail the rule even if this fails to parse

PEG grammar for Python

========================= START OF THE GRAMMAR =========================

General grammatical elements and rules:

* Strings with double quotes (") denote SOFT KEYWORDS

* Strings with single quotes (') denote KEYWORDS

* Upper case names (NAME) denote tokens in the Grammar/Tokens file

* Rule names starting with "invalid_" are used for specialized syntax errors

- These rules are NOT used in the first pass of the parser.

- Only if the first pass fails to parse, a second pass including the invalid rules will be executed.

- If the parser fails in the second phase with a generic syntax error, the location of the generic failure of the first pass will be used (this avoids reporting incorrect locations due to the invalid rules).

- The order of the alternatives involving invalid rules matters (like any rule in PEG).
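You can observe the two-pass behavior from Python code through the built-in `compile()`. The sketch below assumes CPython 3.10 or later. In those versions a Python 2 style `print` statement gets a specialized message, which we take to come from one of the invalid_ rules. The exact wording differs between versions, and `syntax_error_message` is a hypothetical helper name used only here.

```python
# Assumes CPython 3.10 or later; exact error wording varies between versions.


def syntax_error_message(source: str) -> str:
    """Compile `source`; return the SyntaxError message, or "" if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return ""


# Python 2 style print: the first pass fails, and the second pass (with the
# invalid_ rules enabled) can report a targeted message instead of a bare
# "invalid syntax".
legacy = syntax_error_message('print "hello"\n')
print(legacy)

# Valid code is accepted by the first pass and never reaches the second one.
print(repr(syntax_error_message("print('hello')\n")))  # ''
```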

Grammar Syntax (see PEP 617 for more information):

rule_name: expression

Optionally, a type can be included right after the rule name, which specifies the return type of the C or Python function corresponding to the rule:

rule_name[return_type]: expression

If the return type is omitted, then a void * is returned in C and an Any in Python.

e1 e2

Match e1, then match e2.

e1 | e2

Match e1 or e2.

The first alternative can also appear on the line after the rule name for formatting purposes. In that case, a | must be used before the first alternative, like so:

rule_name[return_type]:

| first_alt

| second_alt

( e )

Match e (also allows using other operators in the group, like '(e)*')

[ e ] or e?

Optionally match e.

e*

Match zero or more occurrences of e.

e+

Match one or more occurrences of e.

s.e+

Match one or more occurrences of e, separated by s. The generated parse tree does not include the separator. This is otherwise identical to (e (s e)*).

&e

Succeed if e can be parsed, without consuming any input.

!e

Fail if e can be parsed, without consuming any input.

~

Commit to the current alternative, even if it fails to parse.

&&e

Eager parse e. The parser will not backtrack and will immediately fail with SyntaxError if e cannot be parsed.
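The following is a minimal sketch of PEG combinators in Python that works over a list of string tokens. It illustrates the notation only and does not reflect how CPython's generated parser works. It has no memoization and no left-recursion support, and it omits the cut (`~`) and eager-parse (`&&e`) operators. All function names (`tok`, `seq`, `choice`, and so on) are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A parser takes (tokens, pos) and returns the new position, or None on failure.
Parser = Callable[[Sequence[str], int], Optional[int]]


def tok(value: str) -> Parser:
    """'value': match one token equal to `value`."""
    def parse(tokens: Sequence[str], pos: int) -> Optional[int]:
        return pos + 1 if pos < len(tokens) and tokens[pos] == value else None
    return parse


def name(tokens: Sequence[str], pos: int) -> Optional[int]:
    """NAME: any identifier-like token (a simplification of the real tokenizer)."""
    return pos + 1 if pos < len(tokens) and tokens[pos].isidentifier() else None


def seq(*parts: Parser) -> Parser:
    """e1 e2: match every part in order."""
    def parse(tokens, pos):
        for part in parts:
            pos = part(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*alts: Parser) -> Parser:
    """e1 | e2: ordered choice; the first alternative that succeeds wins."""
    def parse(tokens, pos):
        for alt in alts:
            result = alt(tokens, pos)
            if result is not None:
                return result
        return None
    return parse


def opt(e: Parser) -> Parser:
    """[e] or e?: try e; succeed without consuming input if it fails."""
    def parse(tokens, pos):
        result = e(tokens, pos)
        return pos if result is None else result
    return parse


def star(e: Parser) -> Parser:
    """e*: greedy repetition; PEG never backtracks into a repetition."""
    def parse(tokens, pos):
        while True:
            result = e(tokens, pos)
            if result is None or result == pos:  # stop on failure or empty match
                return pos
            pos = result
    return parse


def plus(e: Parser) -> Parser:
    """e+: one or more occurrences."""
    return seq(e, star(e))


def gather(sep: Parser, e: Parser) -> Parser:
    """s.e+: identical to (e (s e)*)."""
    return seq(e, star(seq(sep, e)))


def lookahead(e: Parser) -> Parser:
    """&e: succeed if e matches, without consuming input."""
    def parse(tokens, pos):
        return pos if e(tokens, pos) is not None else None
    return parse


def not_lookahead(e: Parser) -> Parser:
    """!e: succeed only if e does not match, without consuming input."""
    def parse(tokens, pos):
        return None if e(tokens, pos) is not None else pos
    return parse


# Toy versions of two rules from the grammar, with "\n" standing in for NEWLINE.
NEWLINE = tok("\n")
global_stmt = seq(tok("global"), gather(tok(","), name))
simple_stmts = seq(gather(tok(";"), name), opt(tok(";")), NEWLINE)

print(global_stmt(["global", "a", ",", "b"], 0))                  # 4
print(simple_stmts(["a", ";", "b", ";", "\n"], 0))                 # 5
print(choice(tok("a"), seq(tok("a"), tok("b")))(["a", "b"], 0))   # 1
```

The last line shows the key property of ordered choice. Once `tok("a")` succeeds, the longer second alternative is never tried.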

STARTING RULES

==============

file: [statements] ENDMARKER

interactive: statement_newline

eval: expressions NEWLINE* ENDMARKER

func_type: '(' [type_expressions] ')' '->' expression NEWLINE* ENDMARKER
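As far as we can tell, each starting rule corresponds to one of the modes accepted by `ast.parse()` and `compile()`:

- `file` corresponds to `"exec"`.
- `interactive` corresponds to `"single"`.
- `eval` corresponds to `"eval"`.
- `func_type` corresponds to `"func_type"`, which `ast.parse()` accepts since Python 3.8.

This mapping is our assumption; this document does not state it. The sketch below parses one input in each mode.

```python
import ast

# file -> mode="exec": a whole module.
module = ast.parse("x = 1\ny = 2\n", mode="exec")
print(type(module).__name__, len(module.body))  # Module 2

# interactive -> mode="single": one statement, as typed at the REPL.
single = ast.parse("x = 1\n", mode="single")
print(type(single).__name__)  # Interactive

# eval -> mode="eval": expressions only.
expr = ast.parse("1, 2", mode="eval")
print(type(expr).__name__, type(expr.body).__name__)  # Expression Tuple

# func_type -> mode="func_type": a signature type comment such as "(int, str) -> bool".
sig = ast.parse("(int, str) -> bool", mode="func_type")
print(type(sig).__name__, len(sig.argtypes), sig.returns.id)  # FunctionType 2 bool

# The eval rule has no statement alternatives, so an assignment is rejected.
try:
    ast.parse("x = 1", mode="eval")
    eval_accepts_statement = True
except SyntaxError:
    eval_accepts_statement = False
print(eval_accepts_statement)  # False
```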

GENERAL STATEMENTS

==================

statements: statement+

statement: | compound_stmt | simple_stmts

single_compound_stmt: | compound_stmt

statement_newline: | single_compound_stmt NEWLINE | simple_stmts | NEWLINE | ENDMARKER

simple_stmts:
    | simple_stmt !';' NEWLINE  # Not needed, there for speedup
    | ';'.simple_stmt+ [';'] NEWLINE
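The second `simple_stmts` alternative is what allows several statements on one line. Here is a short sketch using the standard `ast` module, assuming Python 3.8 or later:

```python
import ast

# ';'.simple_stmt+ [';'] NEWLINE: several simple statements on one line,
# with an optional trailing semicolon.
tree = ast.parse("a = 1; b = 2; a += b;\n")
kinds = [type(node).__name__ for node in tree.body]
print(kinds)  # ['Assign', 'Assign', 'AugAssign']

# block: NEWLINE INDENT statements DEDENT | simple_stmts
# so a compound statement's body may sit on the header line.
if_body = ast.parse("if a: b = 1; c = 2\n").body[0].body
print(len(if_body))  # 2
```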

NOTE: assignment MUST precede expression, else parsing a simple assignment will throw a SyntaxError.

SIMPLE STATEMENTS

=================

NOTE: annotated_rhs may start with 'yield'; yield_expr must start with 'yield'

annotated_rhs: yield_expr | star_expressions

augassign: | '+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' | '<<=' | '>>=' | '**=' | '//='

return_stmt: | 'return' [star_expressions]

raise_stmt: | 'raise' expression ['from' expression ] | 'raise'

pass_stmt: | 'pass'

break_stmt: | 'break'

continue_stmt: | 'continue'

global_stmt: 'global' ','.NAME+

nonlocal_stmt: 'nonlocal' ','.NAME+

del_stmt: | 'del' del_targets &(';' | NEWLINE)

yield_stmt: yield_expr

assert_stmt: 'assert' expression [',' expression ]

import_stmt: | import_name | import_from

Import statements

-----------------

import_name: 'import' dotted_as_names

note below: the ('.' | '...') is necessary because '...' is tokenized as ELLIPSIS

dotted_as_names: | ','.dotted_as_name+

dotted_as_name: | dotted_name ['as' NAME ]

dotted_name: | dotted_name '.' NAME | NAME
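In the AST, the left-recursive `dotted_name` ends up as one dotted string per imported module, and each `dotted_as_name` becomes an `alias` node. Here is a small sketch with the standard `ast` module:

```python
import ast

# import_name: 'import' dotted_as_names
node = ast.parse("import os.path as osp, json\n").body[0]
pairs = [(alias.name, alias.asname) for alias in node.names]
print(pairs)  # [('os.path', 'osp'), ('json', None)]
```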

COMPOUND STATEMENTS

===================

Common elements

---------------

block: | NEWLINE INDENT statements DEDENT | simple_stmts

decorators: ('@' named_expression NEWLINE )+

Class definitions

-----------------

class_def: | decorators class_def_raw | class_def_raw

class_def_raw: | 'class' NAME [type_params] ['(' [arguments] ')' ] ':' block

Function definitions

--------------------

function_def: | decorators function_def_raw | function_def_raw

function_def_raw: | 'def' NAME [type_params] '(' [params] ')' ['->' expression ] ':' [func_type_comment] block | 'async' 'def' NAME [type_params] '(' [params] ')' ['->' expression ] ':' [func_type_comment] block

Function parameters

-------------------

params: | parameters

Some duplication here because we can't write (',' | &')'), which is because we don't support empty alternatives (yet).

slash_no_default: | param_no_default+ '/' ',' | param_no_default+ '/' &')'

slash_with_default: | param_no_default* param_with_default+ '/' ',' | param_no_default* param_with_default+ '/' &')'

star_etc: | '*' param_no_default param_maybe_default* [kwds] | '*' param_no_default_star_annotation param_maybe_default* [kwds] | '*' ',' param_maybe_default+ [kwds] | kwds

kwds: | '**' param_no_default

One parameter. This includes a following comma and type comment.

There are three styles:

- No default

- With default

- Maybe with default

There are two alternative forms of each, to deal with type comments:

- Ends in a comma followed by an optional type comment

- No comma, optional type comment, must be followed by close paren

The latter form is for a final parameter without trailing comma.

param_no_default: | param ',' TYPE_COMMENT? | param TYPE_COMMENT? &')'

param_no_default_star_annotation: | param_star_annotation ',' TYPE_COMMENT? | param_star_annotation TYPE_COMMENT? &')'

param_with_default: | param default ',' TYPE_COMMENT? | param default TYPE_COMMENT? &')'

param_maybe_default: | param default? ',' TYPE_COMMENT? | param default? TYPE_COMMENT? &')'

param: NAME annotation?

param_star_annotation: NAME star_annotation

annotation: ':' expression

star_annotation: ':' star_expression

default: '=' expression | invalid_default
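The parameter rules split a signature into positional-only, ordinary, keyword-only, and `**` parameters. The `ast` module exposes these as separate fields. The sketch assumes Python 3.8 or later (for `/`), and `parses` is a hypothetical helper.

```python
import ast

tree = ast.parse("def f(a, b=1, /, c=2, *, d, e=3, **rest): pass\n")
args = tree.body[0].args
print([p.arg for p in args.posonlyargs])  # ['a', 'b']
print([p.arg for p in args.args])         # ['c']
print([p.arg for p in args.kwonlyargs])   # ['d', 'e']
print(args.kwarg.arg)                     # rest


def parses(source: str) -> bool:
    """Return True if `source` is accepted by the parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# No alternative lets a param_no_default follow a param_with_default
# (outside the keyword-only section after '*').
print(parses("def g(a=1, b): pass\n"))  # False
print(parses("def g(a, b=1): pass\n"))  # True
```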

If statement

------------

While statement

---------------

while_stmt: | 'while' named_expression ':' block [else_block]

For statement

-------------

for_stmt: | 'for' star_targets 'in' ~ star_expressions ':' [TYPE_COMMENT] block [else_block] | 'async' 'for' star_targets 'in' ~ star_expressions ':' [TYPE_COMMENT] block [else_block]

With statement

--------------

with_stmt: | 'with' '(' ','.with_item+ ','? ')' ':' [TYPE_COMMENT] block | 'with' ','.with_item+ ':' [TYPE_COMMENT] block | 'async' 'with' '(' ','.with_item+ ','? ')' ':' block | 'async' 'with' ','.with_item+ ':' [TYPE_COMMENT] block

with_item: | expression 'as' star_target &(',' | ')' | ':') | expression
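The parenthesized form of `with_stmt` lets items span several lines and end with a trailing comma. The sketch assumes Python 3.10 or later. It only parses the code, so no files are opened.

```python
import ast

# 'with' '(' ','.with_item+ ','? ')' ':' block
src = (
    "with (\n"
    "    open('a') as f,\n"
    "    open('b') as g,\n"
    "):\n"
    "    pass\n"
)
node = ast.parse(src).body[0]
targets = [item.optional_vars.id for item in node.items]
print(targets)  # ['f', 'g']
```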

Try statement

-------------

try_stmt: | 'try' ':' block finally_block | 'try' ':' block except_block+ [else_block] [finally_block] | 'try' ':' block except_star_block+ [else_block] [finally_block]

Except statement

----------------

Match statement

---------------

match_stmt: | "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT

subject_expr: | star_named_expression ',' star_named_expressions? | named_expression

case_block: | "case" patterns guard? ':' block

guard: 'if' named_expression

patterns: | open_sequence_pattern | pattern

pattern: | as_pattern | or_pattern

as_pattern: | or_pattern 'as' pattern_capture_target

or_pattern: | '|'.closed_pattern+

Literal patterns are used for equality and identity constraints

Literal expressions are used to restrict permitted mapping pattern keys

complex_number: | signed_real_number '+' imaginary_number | signed_real_number '-' imaginary_number

signed_number: | NUMBER | '-' NUMBER

signed_real_number: | real_number | '-' real_number

real_number: | NUMBER

imaginary_number: | NUMBER

capture_pattern: | pattern_capture_target

pattern_capture_target: | !"_" NAME !('.' | '(' | '=')

wildcard_pattern: | "_"

value_pattern: | attr !('.' | '(' | '=')

attr: | name_or_attr '.' NAME

name_or_attr: | attr | NAME

group_pattern: | '(' pattern ')'

sequence_pattern: | '[' maybe_sequence_pattern? ']' | '(' open_sequence_pattern? ')'

open_sequence_pattern: | maybe_star_pattern ',' maybe_sequence_pattern?

maybe_sequence_pattern: | ','.maybe_star_pattern+ ','?

maybe_star_pattern: | star_pattern | pattern

star_pattern: | '*' pattern_capture_target | '*' wildcard_pattern

mapping_pattern: | '{' '}' | '{' double_star_pattern ','? '}' | '{' items_pattern ',' double_star_pattern ','? '}' | '{' items_pattern ','? '}'

items_pattern: | ','.key_value_pattern+

key_value_pattern: | (literal_expr | attr) ':' pattern

double_star_pattern: | '**' pattern_capture_target

class_pattern: | name_or_attr '(' ')' | name_or_attr '(' positional_patterns ','? ')' | name_or_attr '(' keyword_patterns ','? ')' | name_or_attr '(' positional_patterns ',' keyword_patterns ','? ')'

positional_patterns: | ','.pattern+

keyword_patterns: | ','.keyword_pattern+

keyword_pattern: | NAME '=' pattern

Type statement

---------------

type_alias: | "type" NAME [type_params] '=' expression

Type parameter declaration

--------------------------

type_params: | '[' type_param_seq ']'

type_param_seq: ','.type_param+ [',']

type_param: | NAME [type_param_bound] [type_param_default] | '*' NAME [type_param_starred_default] | '**' NAME [type_param_default]

type_param_bound: ':' expression

type_param_default: '=' expression

type_param_starred_default: '=' star_expression

EXPRESSIONS

-----------

expressions: | expression (',' expression )+ [','] | expression ',' | expression

expression: | disjunction 'if' disjunction 'else' expression | disjunction | lambdef

yield_expr: | 'yield' 'from' expression | 'yield' [star_expressions]

star_expressions: | star_expression (',' star_expression )+ [','] | star_expression ',' | star_expression

star_expression: | '*' bitwise_or | expression

star_named_expressions: ','.star_named_expression+ [',']

star_named_expression: | '*' bitwise_or | named_expression

assignment_expression: | NAME ':=' ~ expression

named_expression: | assignment_expression | expression !':='

disjunction: | conjunction ('or' conjunction )+ | conjunction

conjunction: | inversion ('and' inversion )+ | inversion

inversion: | 'not' inversion | comparison
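A few consequences of these rules follow:

- The nesting of `disjunction`, `conjunction`, and `inversion` sets the precedence of `or`, `and`, and `not`.
- The `else` branch of a conditional `expression` is itself an `expression`, so conditionals chain to the right.
- `assignment_expression` only accepts a plain `NAME` target and has no unparenthesized form at statement level.

The sketch below illustrates each point with the standard `ast` module. It assumes Python 3.8 or later, and `parses` is a hypothetical helper.

```python
import ast

# 'not' binds tighter than 'and', which binds tighter than 'or'.
tree = ast.parse("a or b and not c", mode="eval").body
print(type(tree.op).__name__)                       # Or
print(type(tree.values[1].op).__name__)             # And
print(type(tree.values[1].values[1].op).__name__)   # Not

# The else branch is an `expression`, so conditionals nest to the right:
# this means "a" if True else ("b" if False else "c").
print("a" if True else "b" if False else "c")  # a


def parses(source: str) -> bool:
    """Return True if `source` is accepted by the parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# assignment_expression: NAME ':=' ~ expression
print(parses("(x := 1)"), parses("x := 1"), parses("(obj.attr := 1)"))  # True False False
```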

Comparison operators

--------------------

comparison: | bitwise_or compare_op_bitwise_or_pair+ | bitwise_or

eq_bitwise_or: '==' bitwise_or

noteq_bitwise_or: | ('!=' ) bitwise_or

lte_bitwise_or: '<=' bitwise_or

lt_bitwise_or: '<' bitwise_or

gte_bitwise_or: '>=' bitwise_or

gt_bitwise_or: '>' bitwise_or

notin_bitwise_or: 'not' 'in' bitwise_or

in_bitwise_or: 'in' bitwise_or

isnot_bitwise_or: 'is' 'not' bitwise_or

is_bitwise_or: 'is' bitwise_or
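All comparison operators share one precedence level. A run of them therefore becomes a single chained comparison rather than nested binary operations. Here is a sketch with the standard `ast` module:

```python
import ast

# comparison: bitwise_or compare_op_bitwise_or_pair+  -> one Compare node.
node = ast.parse("a < b == c not in d is not e", mode="eval").body
ops = [type(op).__name__ for op in node.ops]
print(type(node).__name__, ops)  # Compare ['Lt', 'Eq', 'NotIn', 'IsNot']

# Chaining means 3 > 2 > 1 is (3 > 2) and (2 > 1),
# not (3 > 2) > 1, which compares True > 1.
print(3 > 2 > 1, (3 > 2) > 1)  # True False
```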

Bitwise operators

-----------------

bitwise_or: | bitwise_or '|' bitwise_xor | bitwise_xor

bitwise_xor: | bitwise_xor '^' bitwise_and | bitwise_and

bitwise_and: | bitwise_and '&' shift_expr | shift_expr

shift_expr: | shift_expr '<<' sum | shift_expr '>>' sum | sum

Arithmetic operators

--------------------

sum: | sum '+' term | sum '-' term | term

factor: | '+' factor | '-' factor | '~' factor | power

power: | await_primary '**' factor | await_primary
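The left-recursive rules make the binary operators left-associative. `power`, by contrast, takes a `factor` on its right. This makes `**` right-associative and lets it bind tighter than a unary minus on its left. The expressions below illustrate this ordering, and the comments give the values Python computes:

```python
print(10 - 4 - 3)      # 3: left-associative, (10 - 4) - 3
print(2 ** 3 ** 2)     # 512: the right operand of '**' is a factor, so 2 ** (3 ** 2)
print(-2 ** 2)         # -4: factor '-' wraps power, so -(2 ** 2)
print(2 ** -1)         # 0.5: the right operand may itself be a signed factor
print(1 | 2 ^ 3 & 4)   # 3: '&' groups before '^', which groups before '|'
print(1 << 2 + 1)      # 8: sum binds tighter than shift_expr, so 1 << 3
```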

Primary elements

----------------

Primary elements are things like "obj.something.something", "obj[something]", "obj(something)", "obj" ...

await_primary: | 'await' primary | primary

slices: | slice !',' | ','.(slice | starred_expression)+ [',']

slice: | [expression] ':' [expression] [':' [expression] ] | named_expression
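A subscription with several comma-separated slices passes a tuple to `__getitem__`. The class `Probe` below is a hypothetical helper that simply returns the key it receives:

```python
class Probe:
    """Return whatever key the subscription passes to __getitem__."""

    def __getitem__(self, key):
        return key


p = Probe()
print(p[1:2])          # slice(1, 2, None)
print(p[1:2, ::3, 5])  # (slice(1, 2, None), slice(None, None, 3), 5)
print(p[1,])           # (1,): a trailing comma also makes a tuple
```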

group: | '(' (yield_expr | named_expression) ')'

Lambda functions

----------------

lambdef: | 'lambda' [lambda_params] ':' expression

lambda_params: | lambda_parameters

lambda_parameters etc. duplicates parameters but without annotations or type comments, and if there's no comma after a parameter, we expect a colon, not a close parenthesis. (For more, see parameters above.)

lambda_parameters: | lambda_slash_no_default lambda_param_no_default* lambda_param_with_default* [lambda_star_etc] | lambda_slash_with_default lambda_param_with_default* [lambda_star_etc] | lambda_param_no_default+ lambda_param_with_default* [lambda_star_etc] | lambda_param_with_default+ [lambda_star_etc] | lambda_star_etc

lambda_slash_no_default: | lambda_param_no_default+ '/' ',' | lambda_param_no_default+ '/' &':'

lambda_slash_with_default: | lambda_param_no_default* lambda_param_with_default+ '/' ',' | lambda_param_no_default* lambda_param_with_default+ '/' &':'

lambda_star_etc: | '*' lambda_param_no_default lambda_param_maybe_default* [lambda_kwds] | '*' ',' lambda_param_maybe_default+ [lambda_kwds] | lambda_kwds

lambda_kwds: | '**' lambda_param_no_default
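The lambda rules accept the same `/`, `*`, default, and `**` forms as `def`, only without annotations. Here is a brief sketch, assuming Python 3.8 or later:

```python
# Positional-only a, ordinary b with a default, keyword-only c, and **kw.
f = lambda a, /, b=2, *, c=3, **kw: (a, b, c, kw)
print(f(1))                # (1, 2, 3, {})
print(f(1, 5, c=6, z=0))   # (1, 5, 6, {'z': 0})
```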

LITERALS

========

fstring_middle: | fstring_replacement_field | FSTRING_MIDDLE

fstring_replacement_field: | '{' annotated_rhs '='? [fstring_conversion] [fstring_full_format_spec] '}'

fstring_conversion: | "!" NAME

fstring_full_format_spec: | ':' fstring_format_spec*

fstring_format_spec: | FSTRING_MIDDLE | fstring_replacement_field

fstring: | FSTRING_START fstring_middle* FSTRING_END

tstring_format_spec_replacement_field: | '{' annotated_rhs '='? [fstring_conversion] [tstring_full_format_spec] '}'

tstring_format_spec: | TSTRING_MIDDLE | tstring_format_spec_replacement_field

tstring_full_format_spec: | ':' tstring_format_spec*

tstring_replacement_field: | '{' annotated_rhs '='? [fstring_conversion] [tstring_full_format_spec] '}'

tstring_middle: | tstring_replacement_field | TSTRING_MIDDLE

tstring: | TSTRING_START tstring_middle* TSTRING_END

string: STRING

strings: (fstring|string|tstring)+
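This sketch shows three replacement-field features from `fstring_replacement_field`:

- the `=` self-documenting form, which needs Python 3.8 or later;
- a `!r` conversion;
- a format spec that itself contains a nested replacement field.

The t-string rules are not exercised here, because they need a newer interpreter (Python 3.14, PEP 750).

```python
width = 6
value = "hi"

# '=' with no conversion or spec shows the expression text and its repr.
print(f"{value=}")              # value='hi'
# fstring_conversion "!r", then a format spec holding a nested field {width}.
print(f"{value!r:>{width}}")    # prints "  'hi'"
print(f"{3.14159:.2f}")         # 3.14
```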

list: | '[' [star_named_expressions] ']'

tuple: | '(' [star_named_expression ',' [star_named_expressions] ] ')'

set: '{' star_named_expressions '}'

Dicts

-----

dict: | '{' [double_starred_kvpairs] '}'

double_starred_kvpairs: ','.double_starred_kvpair+ [',']

double_starred_kvpair: | '**' bitwise_or | kvpair

kvpair: expression ':' expression
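The display rules have several consequences:

- `set` requires at least one element, so `{}` is always a dict.
- A one-element tuple needs its trailing comma.
- `*` and `**` unpacking is allowed inside displays (Python 3.5 or later, PEP 448).

Here is a short sketch:

```python
# '{}' can only match the dict alternative.
print(type({}).__name__, type({1}).__name__)    # dict set

# Without the comma, (1) is just a parenthesized expression.
print(type((1,)).__name__, type((1)).__name__)  # tuple int

# double_starred_kvpair: '**' bitwise_or | kvpair
base = {"a": 1}
merged = {**base, "b": 2}
print(merged)  # {'a': 1, 'b': 2}

# star_named_expression: '*' bitwise_or inside a list display
print([*range(3), 3])  # [0, 1, 2, 3]
```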

Comprehensions & Generators

---------------------------

for_if_clauses: | for_if_clause+

for_if_clause: | 'async' 'for' star_targets 'in' ~ disjunction ('if' disjunction )* | 'for' star_targets 'in' ~ disjunction ('if' disjunction )*

listcomp: | '[' named_expression for_if_clauses ']'

setcomp: | '{' named_expression for_if_clauses '}'

genexp: | '(' ( assignment_expression | expression !':=') for_if_clauses ')'

dictcomp: | '{' kvpair for_if_clauses '}'
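Multiple `for_if_clause` entries nest from left to right, and each may carry several `if` filters. Because the iterable after `in` is a `disjunction`, an unparenthesized conditional expression is not accepted there. Here is a sketch, in which `parses` is a hypothetical helper:

```python
import ast

pairs = [(x, y) for x in range(3) for y in range(3) if x < y if y != 1]
print(pairs)  # [(0, 2), (1, 2)]

# The same thing written as nested statements, clause by clause.
expected = []
for x in range(3):
    for y in range(3):
        if x < y:
            if y != 1:
                expected.append((x, y))
print(pairs == expected)  # True


def parses(source: str) -> bool:
    """Return True if `source` is accepted by the parser."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The iterable is a disjunction, so a conditional expression needs parentheses.
print(parses("[x for x in a if c else b]"))    # False
print(parses("[x for x in (a if c else b)]"))  # True
```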

FUNCTION CALL ARGUMENTS

=======================

arguments: | args [','] &')'

args: | ','.(starred_expression | ( assignment_expression | expression !':=') !'=')+ [',' kwargs ] | kwargs

kwargs: | ','.kwarg_or_starred+ ',' ','.kwarg_or_double_starred+ | ','.kwarg_or_starred+ | ','.kwarg_or_double_starred+

starred_expression: | '*' expression

kwarg_or_starred: | NAME '=' expression | starred_expression

kwarg_or_double_starred: | NAME '=' expression | '**' expression
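The `args` and `kwargs` rules fix the order of call arguments:

1. Positional arguments and `*` unpackings.
2. Keyword arguments mixed with further `*` unpackings.
3. Keyword arguments mixed with `**` unpackings.

The sketch below checks a few calls with the standard `ast` module. `parses` is a hypothetical helper, and the names `f`, `a`, `b`, and `c` are never evaluated.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` parses as an expression."""
    try:
        ast.parse(source, mode="eval")
    except SyntaxError:
        return False
    return True


# '*' may follow a keyword argument, and a keyword argument may follow '**'.
print(parses("f(1, *a, k=2, *b, **c, m=3)"))  # True
# kwarg_or_double_starred has no '*' alternative, so '*' cannot follow '**'.
print(parses("f(**c, *a)"))                   # False
# Positional arguments must come before keyword arguments.
print(parses("f(k=1, 2)"))                    # False
```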

ASSIGNMENT TARGETS

==================

Generic targets

---------------

NOTE: star_targets may contain *bitwise_or, targets may not.

star_targets: | star_target !',' | star_target (',' star_target )* [',']

star_targets_list_seq: ','.star_target+ [',']

star_targets_tuple_seq: | star_target (',' star_target )+ [','] | star_target ','

star_target: | '*' (!'*' star_target) | target_with_star_atom

target_with_star_atom: | t_primary '.' NAME !t_lookahead | t_primary '[' slices ']' !t_lookahead | star_atom

star_atom: | NAME | '(' target_with_star_atom ')' | '(' [star_targets_tuple_seq] ')' | '[' [star_targets_list_seq] ']'

single_target: | single_subscript_attribute_target | NAME | '(' single_target ')'

single_subscript_attribute_target: | t_primary '.' NAME !t_lookahead | t_primary '[' slices ']' !t_lookahead

t_lookahead: '(' | '[' | '.'
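The following sketch shows these target rules in practice:

- One starred target collects the leftover values into a list.
- `star_atom` allows nested parenthesized and bracketed targets.
- `target_with_star_atom` allows attribute and subscript targets.

`Box` is a hypothetical class used only for the attribute example.

```python
# star_target: '*' (!'*' star_target) collects the remaining items as a list.
first, *middle, last = range(5)
print(first, middle, last)  # 0 [1, 2, 3] 4

# star_atom: parenthesized and bracketed targets can nest.
(a, b), [c, *d] = (1, 2), "xyz"
print(a, b, c, d)  # 1 2 x ['y', 'z']


class Box:
    """Hypothetical empty class, used only as an attribute target."""


# target_with_star_atom: t_primary '.' NAME and t_primary '[' slices ']'.
box, items = Box(), [0, 0]
box.value, items[1] = "set", 9
print(box.value, items)  # set [0, 9]
```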

Targets for del statements

--------------------------

del_targets: ','.del_target+ [',']

del_target: | t_primary '.' NAME !t_lookahead | t_primary '[' slices ']' !t_lookahead | del_t_atom

del_t_atom: | NAME | '(' del_target ')' | '(' [del_targets] ')' | '[' [del_targets] ']'

TYPING ELEMENTS

---------------

type_expressions allow */** but ignore them

func_type_comment:
    | NEWLINE TYPE_COMMENT &(NEWLINE INDENT)  # Must be followed by indented block
    | TYPE_COMMENT
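Type comments only become `TYPE_COMMENT` tokens when requested, for example with `ast.parse(..., type_comments=True)` (Python 3.8 or later). The sketch below shows both `func_type_comment` forms: one on the line after the header and one on the header line itself. It then parses the comment text with the `func_type` start rule.

```python
import ast

src = (
    "def area(w, h):\n"
    "    # type: (float, float) -> float\n"
    "    return w * h\n"
    "\n"
    "def greet(name):  # type: (str) -> str\n"
    "    return 'hi ' + name\n"
)
tree = ast.parse(src, type_comments=True)
comments = [fn.type_comment for fn in tree.body]
print(comments)  # ['(float, float) -> float', '(str) -> str']

# Without type_comments=True these are ordinary comments and are discarded.
print([fn.type_comment for fn in ast.parse(src).body])  # [None, None]

# The comment text itself parses with the func_type starting rule.
sig = ast.parse(comments[0], mode="func_type")
print(len(sig.argtypes), sig.returns.id)  # 2 float
```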

========================= END OF THE GRAMMAR ===========================

========================= START OF INVALID RULES =======================