[Python-Dev] Simplify lnotab? (AST branch update)

Phillip J. Eby pje at telecommunity.com
Fri Oct 14 03:55:20 CEST 2005


At 02:25 PM 10/14/2005 +1300, Greg Ewing wrote:

Phillip J. Eby wrote:

> +1. I'd be especially interested in lifting the current requirement
> that line ranges and byte ranges both increase monotonically. Even
> better if the lines for a particular piece of code don't have to all
> come from the same file.

How about an array of:

    +----------------+----------------+----------------+
    | bytecode index | file no.       | line no.       |
    +----------------+----------------+----------------+

Entries are sorted by bytecode index, with each entry applying from that bytecode position up to the position of the next entry. The file no. indexes a tuple of file names attached to the code object. All entries are 32-bit integers.
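For concreteness, a lookup against such an array might look roughly like the sketch below. The table contents, the file names, and the `lookup` helper are all made up for illustration; nothing here is an existing CPython API.

```python
from bisect import bisect_right

# Hypothetical table of (bytecode index, file no., line no.) entries,
# sorted by bytecode index. Line numbers need not increase monotonically.
ENTRIES = [(0, 0, 10), (6, 0, 11), (14, 1, 3), (20, 0, 9)]

# Hypothetical tuple of file names attached to the code object.
FILENAMES = ("main.py", "template.txt")


def lookup(entries, filenames, offset):
    """Return (filename, line) for the entry covering a bytecode offset."""
    starts = [entry[0] for entry in entries]
    # Each entry applies from its own index up to the next entry's index,
    # so we want the last entry whose index is <= offset.
    i = bisect_right(starts, offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first entry")
    _, file_no, line_no = entries[i]
    return filenames[file_no], line_no


print(lookup(ENTRIES, FILENAMES, 15))
```

Since the entries are already sorted by bytecode index, a binary search is enough; a real implementation would presumably search the packed array directly instead of building the `starts` list on every call.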

The file number could be 16-bit - I don't see a use case for referring to 65,000 different filenames. ;) But that doesn't save much space.

Anyway, in the common case, this scheme will use 10 more bytes per line of Python code, which translates to a megabyte or so for the standard library. I definitely like the simplicity, but a meg's a meg.

A more compact scheme is possible, by using two tables - a bytecode->line number table, and a line number->file table. In the single-file case, you can omit the second table, and the first table then only uses 6 more bytes per line than we're currently using. Not fantastic, but probably more acceptable.
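The byte counts above can be checked with a quick calculation. This assumes the current encoding costs 2 bytes per line in the common case, which is what the "10 more" and "6 more" figures imply; the variable names are only for illustration.

```python
import struct

# Assumption: the current lnotab costs 2 bytes per line in the common case.
CURRENT_BYTES_PER_LINE = 2

# Three 32-bit fields: bytecode index, file no., line no.
three_field_entry = struct.calcsize("<III")

# Same entry with a 16-bit file number.
short_file_entry = struct.calcsize("<IHI")

# Two 32-bit fields: bytecode index, line no. (single-file case).
two_field_entry = struct.calcsize("<II")

extra_three = three_field_entry - CURRENT_BYTES_PER_LINE
extra_short = short_file_entry - CURRENT_BYTES_PER_LINE
extra_two = two_field_entry - CURRENT_BYTES_PER_LINE

print(extra_three, extra_short, extra_two)
```

The `<` prefix asks `struct` for standard sizes with no padding, so the 16-bit file number really does save only 2 bytes per entry.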

If you have to encode multiple files, you just offset their line numbers by the size of the other files, and put entries in the line->file table to match. When computing the line number, you subtract the matching entry in the line->file table to get the actual line number within that file.
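Here is a rough sketch of that two-table lookup, assuming a 100-line a.py followed by b.py. The table layouts, file names, and helper names are invented for the example, not a proposed format.

```python
from bisect import bisect_left, bisect_right

# Hypothetical bytecode->line table: (bytecode index, combined line no.).
# Lines from b.py are offset by the 100 lines of a.py.
LINE_TABLE = [(0, 5), (8, 103), (16, 6)]

# Hypothetical line->file table: (line offset, file name). A file's offset
# is the total number of lines in the files placed before it.
FILE_TABLE = [(0, "a.py"), (100, "b.py")]


def combined_line(line_table, bytecode_offset):
    """Return the combined line number covering a bytecode offset."""
    starts = [start for start, _ in line_table]
    i = bisect_right(starts, bytecode_offset) - 1
    if i < 0:
        raise ValueError("offset precedes the first entry")
    return line_table[i][1]


def resolve(file_table, line):
    """Map a combined line number to (filename, line within that file)."""
    bases = [base for base, _ in file_table]
    # Find the last file whose offset is strictly below the combined line.
    i = bisect_left(bases, line) - 1
    if i < 0:
        raise ValueError("line numbers start at 1")
    base, filename = file_table[i]
    # Subtract the matching entry to get the real line number.
    return filename, line - base


def locate(bytecode_offset):
    return resolve(FILE_TABLE, combined_line(LINE_TABLE, bytecode_offset))


print(locate(10))
```

In the single-file case `FILE_TABLE` would be omitted entirely and the combined line number is already the real one.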


