[Python-Dev] Reading Python source file

M.-A. Lemburg mal at egenix.com
Tue Nov 17 04:59:06 EST 2015


On 17.11.2015 02:53, Serhiy Storchaka wrote:

I'm working on rewriting the Python tokenizer (in particular the part that reads and decodes the Python source file). The code is complicated. Currently it handles these cases:

* Reading from a string in memory.
* Interactive reading from a file.
* Reading from a file:
  - Raw reading, ignoring the encoding, in the parser generator.
  - Raw reading of a UTF-8 encoded file.
  - Reading and recoding to UTF-8.

The file is read line by line. This makes it hard to check the correctness of the first line if the encoding is specified in the second line. It also causes very hard problems with null bytes and with desynchronization of buffered C and Python files.

All these problems can be easily solved by reading the whole Python source file into memory and then parsing it as a string. This would allow us to drop a large, complex, and buggy part of the code.

Are there disadvantages to this solution? As for memory consumption, the source text itself will consume only a small part of the memory consumed by the AST and other structures. As for performance, reading and decoding the whole file can be faster than doing so line by line.
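As a rough illustration of the "read everything, then parse as a string" idea, here is a minimal sketch in Python using the standard library's tokenize.detect_encoding(). It is not the C tokenizer code under discussion; the function name decode_source and the up-front null-byte check are assumptions made for the example.

```python
import io
import tokenize


def decode_source(data: bytes) -> str:
    """Decode a complete source buffer that is already in memory.

    Hypothetical helper for illustration only.
    """
    # With the whole buffer available, null bytes can be rejected up front
    # instead of being discovered in the middle of line-by-line reading.
    if b"\0" in data:
        raise ValueError("source contains null bytes")
    # detect_encoding() looks at a BOM and at most the first two lines,
    # so a coding cookie on line 2 is seen before anything is decoded.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.decode(encoding)


# A file whose encoding is declared on the second line.
raw = b"#!/usr/bin/env python\n# -*- coding: latin-1 -*-\nname = '\xe9'\n"
text = decode_source(raw)
namespace = {}
exec(compile(text, "<memory>", "exec"), namespace)
```

Because the encoding is resolved before any decoding happens, the first line is decoded with the encoding declared on the second line, which is the case that line-by-line reading makes awkward.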

A problem with this approach is that you can no longer fail early and detect indentation errors et al. while parsing the data (which may well come from a pipe).

Another related problem is that you have to wait for the full input data before you can start compiling the code.
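To make the fail-early point concrete, here is a small sketch using the standard library's tokenize module, which pulls input through a readline callable. It only approximates the behaviour of the C tokenizer discussed in this thread; the helper tokenize_incrementally and the sample input are assumptions made for the example.

```python
import tokenize


def tokenize_incrementally(lines):
    """Feed lines to the tokenizer on demand.

    Returns (lines_read, error), where error is None if tokenizing succeeded.
    Hypothetical helper for illustration only.
    """
    it = iter(lines)
    lines_read = 0

    def readline():
        nonlocal lines_read
        try:
            line = next(it)
        except StopIteration:
            return ""  # an empty string signals end of input
        lines_read += 1
        return line

    try:
        for _ in tokenize.generate_tokens(readline):
            pass
    except (SyntaxError, tokenize.TokenError) as exc:
        # IndentationError is a subclass of SyntaxError.
        return lines_read, exc
    return lines_read, None


# The third line dedents to a level that matches no enclosing block.
source_lines = ["if True:\n", "        x = 1\n", "    y = 2\n"] + ["z = 0\n"] * 1000
lines_read, error = tokenize_incrementally(source_lines)
```

Here the error is reported without the remaining input having been requested. A reader that first collects the whole input would report the same error only after the last line has arrived, which matters mostly when the source comes from a slow pipe.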

I don't think these situations are all that common, though, so reading in the full source code before compiling it sounds like a reasonable approach.

We use the same simplification in eGenix PyRun's emulation of the Python command line interface and it has so far not caused any problems.

[1] http://bugs.python.org/issue25643

-- Marc-Andre Lemburg eGenix.com



