[Python-Dev] tokenize string literal problem (original) (raw)

C or L Smith smiles at worksmail.net
Sat Oct 24 08:04:25 CEST 2009

Previous message: [Python-Dev] Summary of Python tracker Issues
Next message: [Python-Dev] tokenize string literal problem
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

BACKGROUND I'm trying to modify the doctest DocTestParser so it will parse docstring code snippets out of a *py file. (Although doctest can parse these with another method out of *pyc, it is missing certain decorated functions and we would also like to insist of import of needed modules rather and that method automatically loads everything from the module containing the code.)

PROBLEM I need to find code snippets which are located in docstrings. Docstrings, being string literals should be able to be parsed out with tokenize. But tokenize is giving the wrong results (or I am doing something wrong) for this (pathological) case:

foo.py: +---- def bar(): """ A quoted triple quote is not a closing of this docstring: >>> print '"""' """ """ # <-- this is the closing quote pass +----

Here is how I tokenize the file:

import re, tokenize DOCSTRING_START_RE = re.compile('\s+[ru]*("""|' + "''')")

o=open('foo.py','r') for ti in tokenize.generate_tokens(o.next): typ = ti[0] text = ti[-1] if typ == tokenize.STRING: if DOCSTRING_START_RE.match(text): print "DOCSTRING:",repr(text) o.close() ###

which outputs:

DOCSTRING: ' """\n A quoted triple quote is not a closing\n of this docstring:\n >>> print '"""'\n' DOCSTRING: ' """\n """ # <-- this is the closing quote\n'

There should be only one string tokenized, I believe. The PythonWin editor parses (and colorizes) this correctly, but tokenize (or I) are making an error.

Thanks for any help, Chris

Previous message: [Python-Dev] Summary of Python tracker Issues
Next message: [Python-Dev] tokenize string literal problem
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list