bpo-35808: Retire pgen and use pgen2 to generate the parser by pablogsal · Pull Request #11814 · python/cpython

OK I will try to play with this myself.

On Thu, Feb 21, 2019 at 9:36 AM Pablo Galindo ***@***.***> wrote:

> ***@***.**** commented on this pull request, in `Parser/pgen/pgen.py` <#11814 (comment)>:
>
> ```diff
> +
> +# Use Lib/token.py and Lib/tokenize.py to obtain the tokens. To keep this
> +# compatible with older versions of Python, we need to make sure that we only
> +# import these two files (and not any of the dependencies of these files).
> +
> +CURRENT_FOLDER_LOCATION = os.path.dirname(os.path.realpath(__file__))
> +LIB_LOCATION = os.path.realpath(os.path.join(CURRENT_FOLDER_LOCATION, '..', '..', 'Lib'))
> +TOKEN_LOCATION = os.path.join(LIB_LOCATION, 'token.py')
> +TOKENIZE_LOCATION = os.path.join(LIB_LOCATION, 'tokenize.py')
> +
> +token = importlib.machinery.SourceFileLoader('token',
> +                                             TOKEN_LOCATION).load_module()
> +# Add token to the module cache so tokenize.py uses that exact one instead of
> +# the one in the stdlib of the interpreter executing this file.
> +sys.modules['token'] = token
> +tokenize = importlib.machinery.SourceFileLoader('tokenize',
> ```
>
> > The tokenize module from Python 2.4 can handle this. :) Why do we need to use the latest tokenize to parse the Grammar file?
>
> It is not that tokenize cannot handle the grammar; it is that if the tokenizer uses different numeric values for the tokens, pgen fails when constructing the DFAs in `self.parse()`:
>
> ```
> Traceback (most recent call last):
>   File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/src/Parser/pgen/__main__.py", line 36, in <module>
>     main()
>   File "/src/Parser/pgen/__main__.py", line 29, in main
>     p = ParserGenerator(args.grammar, token_lines, verbose=args.verbose)
>   File "/src/Parser/pgen/pgen.py", line 20, in __init__
>     self.dfas, self.startsymbol = self.parse()
>   File "/src/Parser/pgen/pgen.py", line 173, in parse
>     self.expect(self.tokens['OP'], ":")
>   File "/src/Parser/pgen/pgen.py", line 337, in expect
>     type, value, self.type, self.value)
>   File "/src/Parser/pgen/pgen.py", line 356, in raise_error
>     self.end[1], self.line))
>   File "./Grammar/Grammar", line 13
>     single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE
>                 ^
> SyntaxError: expected 54/:, got 52/:
> ```
>
> This is because OP has the value 52 in Python 3.5 (in this example) but 54 in the tokens that we construct from `Grammar/Tokens` (or in `Lib/token.py`). The value 52 is what the Python 3.5 tokenize yields when `gettoken` calls `next(self.generator)`. Maybe I am missing something here, but that is the problem I found when trying to use the tokenize from the running Python :(

--
--Guido (mobile)
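
For readers following the exchange, here is a minimal sketch of the token-number mismatch Pablo describes. The `grammar_tokens` mapping and the grammar line are illustrative, not the actual pgen.py structures; only the numeric values (52 vs. 54) come from the traceback above.

```python
# Minimal sketch (not the actual pgen.py code) of the mismatch: the numeric
# value of OP depends on the running interpreter's own token.py, while the
# new pgen builds its numbering from Grammar/Tokens, so the two can disagree.
import io
import token
import tokenize

# Hypothetical mapping as pgen would construct it from Grammar/Tokens,
# where OP is assigned 54 (the "expected 54" in the traceback above).
grammar_tokens = {'OP': 54}

# token.OP is 52 on Python 3.5 but 54 on Python 3.8; the exact value
# varies between interpreter versions.
print('running interpreter token.OP =', token.OP)

line = "single_input: NEWLINE | simple_stmt | compound_stmt NEWLINE\n"
for tok in tokenize.generate_tokens(io.StringIO(line).readline):
    if tok.string == ':':
        # On an interpreter whose token.OP is not 54, this check fails
        # exactly like pgen's expect(): "expected 54/:, got 52/:".
        if tok.type != grammar_tokens['OP']:
            raise SyntaxError('expected %d/:, got %d/:'
                              % (grammar_tokens['OP'], tok.type))
```

This is why the diff above loads `Lib/token.py` from the source tree and pre-seeds `sys.modules['token']` before loading `tokenize.py`: the freshly built pgen must tokenize the Grammar file with the same token numbering it generated from `Grammar/Tokens`, regardless of which Python runs it.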