Extending sqlparse (python-sqlparse 0.5.4.dev0 documentation)

The sqlparse module uses a SQL grammar that was tuned through usage and numerous pull requests to fit a broad range of SQL syntaxes. It cannot cater to every case, however, because some SQL dialects have adopted conflicting meanings for certain keywords. sqlparse therefore exposes a mechanism to configure the fundamental keywords and regular expressions that parse the language, as described below.

If you find an adaptation that works for your specific use case, please consider contributing it back to the community by opening a pull request on GitHub.

Configuring the Lexer

The lexer is a singleton class that breaks the stream of characters into language tokens. It does this with a sequence of regular expressions and keywords that are listed in the module sqlparse.keywords. Rather than applying these grammar definitions as fixed values, the lexer loads them in its method default_initialization(). As an API user, you can replace this configuration with your own. To do so, first clear the previous configuration with .clear(), then apply the list of SQL regular expressions with .set_SQL_REGEX(SQL_REGEX), and apply the keyword dictionaries with .add_keywords(KEYWORDS).

You can do so by re-using the expressions in sqlparse.keywords (see example below), leaving parts out, or by making up your own master list.

To see the expected types of the arguments, inspect their structure in sqlparse.keywords. (For compatibility with Python 3.4, this library does not use type hints.)
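The interplay between the regular-expression list and the keyword dictionaries can be illustrated with a small stand-alone model. The sketch below is not sqlparse's implementation: ToyLexer, its set_rules method, the LOOKUP marker, and the token-type strings are hypothetical names invented for illustration. It assumes that patterns are tried in list order with the first match winning, and that words matched by a generic pattern are then looked up in the keyword dictionaries in the order they were added.

```python
import re

# Hypothetical token-type names, used only in this sketch.
KEYWORD, NAME, PUNCT, SPACE = "Keyword", "Name", "Punctuation", "Whitespace"
LOOKUP = object()  # marker: resolve the matched word via the keyword dictionaries


class ToyLexer:
    """Toy model: ordered regex list first, keyword dictionaries second."""

    def __init__(self):
        self.clear()

    def clear(self):
        # After clearing, the lexer has no rules and recognises nothing.
        self.rules = []
        self.keyword_dicts = []

    def set_rules(self, rules):
        self.rules = [(re.compile(rx, re.IGNORECASE), tt) for rx, tt in rules]

    def add_keywords(self, mapping):
        self.keyword_dicts.append(mapping)

    def classify(self, word):
        # Dictionaries are consulted in the order they were added.
        for mapping in self.keyword_dicts:
            if word.upper() in mapping:
                return mapping[word.upper()]
        return NAME

    def tokenize(self, text):
        pos = 0
        while pos < len(text):
            for pattern, ttype in self.rules:
                match = pattern.match(text, pos)
                if match:
                    value = match.group()
                    if ttype is LOOKUP:
                        ttype = self.classify(value)
                    yield (ttype, value)
                    pos = match.end()
                    break
            else:
                # No rule matched: emit the character as an error token.
                yield ("Error", text[pos])
                pos += 1


BASE_RULES = [
    (r"\s+", SPACE),
    (r"[;,*]", PUNCT),
    (r"\w+", LOOKUP),  # generic word pattern, resolved via the dictionaries
]
CUSTOM_RULE = (r"ZORDER\s+BY\b", KEYWORD)


def run(rules):
    lexer = ToyLexer()
    lexer.set_rules(rules)
    lexer.add_keywords({"SELECT": KEYWORD, "FROM": KEYWORD})
    lexer.add_keywords({"BAR": KEYWORD})
    sql = "select * from foo zorder by bar;"
    return [tok for tok in lexer.tokenize(sql) if tok[0] != SPACE]


# Custom rule placed before the generic word pattern.
early = run(BASE_RULES[:2] + [CUSTOM_RULE] + BASE_RULES[2:])
# Custom rule appended after the generic word pattern.
late = run(BASE_RULES + [CUSTOM_RULE])

print(early)
print(late)
```

In this model, the multi-word pattern only takes effect when it sits ahead of the generic word pattern. When it is appended at the end, "zorder" and "by" are consumed one word at a time and fall back to plain names. This is presumably why the example below slices the custom expression into the middle of SQL_REGEX rather than appending it.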

The following example adds support for the expression ZORDER BY, and adds BAR as a keyword to the lexer:

import re

import sqlparse
from sqlparse import keywords
from sqlparse.lexer import Lexer

# get the lexer singleton object to configure it
lex = Lexer.get_default_instance()

# Clear the default configurations.
# After this call, reg-exps and keyword dictionaries need to be loaded
# to make the lexer functional again.
lex.clear()

my_regex = (r"ZORDER\s+BY\b", sqlparse.tokens.Keyword)

# slice the default SQL_REGEX to inject the custom object
lex.set_SQL_REGEX(
    keywords.SQL_REGEX[:38]
    + [my_regex]
    + keywords.SQL_REGEX[38:]
)

# add the default keyword dictionaries
lex.add_keywords(keywords.KEYWORDS_COMMON)
lex.add_keywords(keywords.KEYWORDS_ORACLE)
lex.add_keywords(keywords.KEYWORDS_PLPGSQL)
lex.add_keywords(keywords.KEYWORDS_HQL)
lex.add_keywords(keywords.KEYWORDS_MSACCESS)
lex.add_keywords(keywords.KEYWORDS)

# add a custom keyword dictionary
lex.add_keywords({'BAR': sqlparse.tokens.Keyword})

# no configuration is passed here. The lexer is used as a singleton.
sqlparse.parse("select * from foo zorder by bar;")