bpo-33338: [lib2to3] Synchronize token.py and tokenize.py with stdlib by ambv · Pull Request #6572 · python/cpython

(This is Step 1 in BPO-33337. See there for larger context.)

lib2to3's token.py and tokenize.py were initially copies of the respective
files from the standard library. They were copied to allow Python 3 to read
Python 2's grammar.
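
For illustration only (not part of this change, and assuming a Python version that still ships lib2to3), the copied tokenizer keeps the same readline-based generator API as the stdlib tokenize module while still accepting Python 2-only tokens:

```python
import io
from lib2to3.pgen2 import token as pgen2_token, tokenize as pgen2_tokenize

# Python 2-only syntax: backtick repr and a long-integer suffix.
source = "x = `spam` + 10L\n"

# Same generator interface as the stdlib tokenize module: a readline
# callable in, (type, string, start, end, line) 5-tuples out.
for tok_type, tok_str, start, end, line in pgen2_tokenize.generate_tokens(
        io.StringIO(source).readline):
    print(pgen2_token.tok_name[tok_type], repr(tok_str))
```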

Since 2006, lib2to3 has grown to be widely used as a Concrete Syntax Tree library,
including for parsing Python 3 code. Additions to support the Python 3 grammar were
made but, sadly, the copies diverged from the main token.py and tokenize.py.
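
As a sketch of that CST usage (using lib2to3's public driver API, assuming a Python version that still ships lib2to3; not part of this change), the pgen2 driver feeds the output of lib2to3/pgen2/tokenize.py into the parser and produces a lossless tree that round-trips to the original source:

```python
from lib2to3 import pygram, pytree
from lib2to3.pgen2 import driver

# Grammar variant in which "print" is an ordinary name rather than a
# keyword, as needed for parsing Python 3 code.
d = driver.Driver(pygram.python_grammar_no_print_statement,
                  convert=pytree.convert)

tree = d.parse_string("def double(x):\n    return 2 * x\n")

# The CST keeps every token together with its whitespace, so it
# reproduces the input exactly.
assert str(tree) == "def double(x):\n    return 2 * x\n"
```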

This change brings them back together, reducing the differences to the bare
minimum that lib2to3 actually requires. Before this change, almost every line in
lib2to3/pgen2/tokenize.py differed from tokenize.py. After this change, the diff
between the two files is only 200 lines long and consists entirely of the relevant
Python 2 compatibility bits.

Merging the implementations brings numerous fixes to the lib2to3 tokenizer:

Improvements to token.py:

https://bugs.python.org/issue33338