Issue 1440601: Add col information to parse & ast nodes

Created on 2006-02-28 21:36 by jpe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
col-offset.diff | jpe, 2006-03-01 21:50 | Another upload try
Messages (4)
msg49610 - Author: John Ehresman (jpe) * Date: 2006-02-28 21:36
This adds fields to the parser to capture the column where each token starts and where each ast node starts (defined as the column of the node's initial token). With this it's reasonably easy to extract the text that ast nodes are based on. The patch is incomplete, will probably change a bit, and lacks tests, but I wanted to get feedback on a few questions.

* The byte offset of the column position is what is being recorded. I wonder now if the unicode character position should be recorded instead. This would slow things down somewhat, but the performance loss may not be significant.
* I changed the signatures of PyNode_AddChild and PyParse_AddToken. Is this permitted, or do new functions need to be created so that the old signatures are preserved?
* Where should I put a function that, given an ast tree and the source text, will add the text that each node is based on? This will be a python function (I'm pretty sure), so it's not easily put in the _ast module.

Note that generated files are omitted from the patch.
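For readers wondering what the helper asked about in the third question might look like, here is a minimal sketch in present-day Python. It is not part of the patch: the function name `add_source_text` and the `source_text` attribute are hypothetical, and it relies on `ast.get_source_segment`, which was added to the standard library much later (Python 3.8) and uses end positions that this patch did not record.

```python
import ast


def add_source_text(tree, source):
    """Attach a hypothetical `source_text` attribute to every node that
    carries position information (i.e. has a col_offset)."""
    for node in ast.walk(tree):
        if hasattr(node, "col_offset"):
            # get_source_segment returns None if position info is incomplete.
            node.source_text = ast.get_source_segment(source, node)
    return tree


source = "total = price * quantity\n"
tree = add_source_text(ast.parse(source), source)
assign = tree.body[0]

print(assign.targets[0].source_text)  # total
print(assign.value.source_text)       # price * quantity
```

With only the start column available, as in the patch, a helper like this would need some other way to find where each node ends, for example by looking at the start of the following node.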
msg49611 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-02-28 22:39
- The byte offset is actually a UTF-8 byte offset. That should be documented in the grammar, and perhaps elaborated in libast.tex.
- Changing the signatures is fine; it is unlikely that anybody calls this API, and if they do, the compiler will tell them.
- Applications of the AST should go into Demo/parser.
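The difference between a UTF-8 byte offset and a character position only shows up when a line contains non-ASCII text before the node. The following sketch, written against the current `ast` module rather than the 2006 patch, shows one way to convert between the two; the sample source line and variable names are made up for illustration.

```python
import ast

# Two statements on one line; the string literal contains two 2-byte characters.
source = 's = "éü"; x = 1\n'
tree = ast.parse(source)
second = tree.body[1]  # the statement `x = 1`

# col_offset counts UTF-8 bytes from the start of the line.
byte_col = second.col_offset

# Convert to a character column by decoding the bytes that precede the node.
line = source.splitlines()[second.lineno - 1]
char_col = len(line.encode("utf-8")[:byte_col].decode("utf-8"))

print(byte_col)  # 12
print(char_col)  # 10
```

Slicing a `str` with `col_offset` directly would be off by two here, which is why the byte-based definition needs to be documented.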
msg49612 - Author: John Ehresman (jpe) * Date: 2006-03-01 20:54
Updated patch that includes some tests and documentation. The slightly tricky part is the col_offset of an Attribute node: it was being set to the start of the attribute name, after the initial name. Now it points to the start of the initial name. I think we need to wait for some use cases to determine whether any more positional information is needed. I suspect some uses may want the positions of each identifier, which are not easily obtainable right now. Also includes a change to asdl.py to return attributes in the order specified in the .asdl file.
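The Attribute behaviour described above can be checked with a short sketch against the current `ast` module (the expression `foo.bar.baz` is just an illustrative input): every Attribute node in a dotted chain reports the column of the initial name, not the column of its own attribute identifier.

```python
import ast

tree = ast.parse("value = foo.bar.baz\n")

outer = tree.body[0].value  # Attribute for foo.bar.baz
inner = outer.value         # Attribute for foo.bar
name = inner.value          # Name for foo

# All three nodes start where the initial name `foo` starts (column 8).
for node in (outer, inner, name):
    print(type(node).__name__, node.col_offset)
```

This also illustrates the limitation mentioned in the message: the start columns of `bar` and `baz` themselves are not available from col_offset alone.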
msg49613 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2006-03-02 00:07
Thanks for the patch. Committed as 42753.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 42955
2006-02-28 21:36:24 jpe create