[Python-Dev] Small tweak to tokenize.py?
Guido van Rossum guido at python.org
Sat Dec 2 19:06:50 CET 2006
On 12/2/06, Fredrik Lundh <fredrik at pythonware.com> wrote:
Guido van Rossum wrote:
>> it would be a good thing if it could, optionally, be made to report
>> horizontal whitespace as well.
>
> It's remarkably easy to get this out of the existing API

sure, but it would be even easier if I didn't have to write that code
myself (last time I did that, I needed a couple of tries before the
parser handled all cases correctly...).

but maybe this could simply be handled by a helper generator in the
tokenizer module, that simply wraps the standard tokenizer generator
and inserts whitespace tokens where necessary?
A helper sounds like a promising idea. Anyone interested in volunteering a patch?
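[A minimal sketch of what such a helper might look like, assuming Python 3's
tokenize.generate_tokens(); the tokens_with_whitespace name and the 'WS'
marker are made up for illustration and are not part of tokenize or of the
patch under discussion:]

import io
import tokenize

WS = 'WS'  # hypothetical marker for synthetic whitespace entries

def tokens_with_whitespace(readline):
    """Wrap tokenize.generate_tokens() and yield a (WS, text, start, end,
    line) entry for any gap between consecutive tokens on the same line."""
    prev_end = (1, 0)
    for tok in tokenize.generate_tokens(readline):
        tok_type, tok_string, start, end, line = tok
        if start[0] == prev_end[0] and start[1] > prev_end[1]:
            # Same physical line: the gap is horizontal whitespace, sliced
            # straight out of the line text by column position.
            yield (WS, line[prev_end[1]:start[1]], prev_end, start, line)
        # Cross-line gaps (e.g. leading indentation that has no INDENT
        # token) are not handled in this simplified sketch.
        yield tok
        prev_end = end

src = "x = 1  +  2\n"
for tok in tokens_with_whitespace(io.StringIO(src).readline):
    kind = tokenize.tok_name.get(tok[0], tok[0])   # WS entries fall through
    print(kind, repr(tok[1]))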
> keep track of the end position returned by the previous call, and if
> it's different from the start position returned by the next call,
> slice the line text from the column positions, assuming the line
> numbers are the same. If the line numbers differ, something has been
> eating \n tokens; this shouldn't happen any more with my patch.
you'll still have to deal with multiline strings, right?
No, they are returned as a single token whose start and end positions correctly reflect the line/column of the beginning and end of the token. My current code (based on the second patch I gave in this thread and the algorithm described above) doesn't have to special-case anything except the ENDMARKER token (to break out of its loop :-).
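[For illustration, not part of the thread: a quick check with Python 3's
tokenize module showing a triple-quoted string coming back as one STRING
token whose start/end positions span the whole literal, which is why no
special case is needed:]

import io
import tokenize

src = 'x = """first\nsecond"""\ny = 1\n'
for tok in tokenize.generate_tokens(io.StringIO(src).readline):
    # Print the token name with its (row, col) start and end positions.
    print(tokenize.tok_name[tok.type], tok.start, tok.end, repr(tok.string))

[This prints the STRING token with start (1, 4) and end (2, 9), so a
whitespace-filling wrapper only ever has to compare positions.]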
--
--Guido van Rossum (home page: http://www.python.org/~guido/)