[Python-Dev] Regular expression bytecode
Franklin? Lee leewangzhong+python at gmail.com
Sun Feb 14 14:41:27 EST 2016
I think it would be nice for manipulating (e.g. optimizing, possibly with JIT-like analysis) and comparing regexes. It can also be useful as a teaching tool, e.g. exercises in optimizing and comparing regexes.
I think the discussion should be on python-ideas, though.

On Feb 14, 2016 2:01 PM, "Jonathan Goble" <jcgoble3 at gmail.com> wrote:
I'm new to Python's mailing lists, so please forgive me if I'm sending this to the wrong list. :)
I filed http://bugs.python.org/issue26336 a few days ago, but now I think this list might be a better place to get discussion going. Basically, I'd like to see the bytecode of a compiled regex object exposed as a public (probably read-only) attribute of the object.

Currently, although the regex is compiled in pure Python by the sre_compile and sre_parse modules, the list of opcodes is then passed into C and copied into an array in a C struct, without being publicly exposed in any way. The only way for a user to get an internal representation of the regex is the re.DEBUG flag, which produces only an intermediate representation rather than the actual bytecode, and which writes only to stdout. That makes it useless for someone who wants to examine it programmatically.

I'm sure others can think of other potential use cases for this, but one in particular would be a debugger that lets a user step through a regex one opcode at a time to see exactly where it is failing. It would also perhaps be nice to have a public constructor for the regex object type, which would enable users to modify the bytecode and directly create a new regex object from it, similar to what is currently possible through the types.FunctionType and types.CodeType constructors.

In addition to exposing the code in a public attribute, a helper module written in Python, similar to the dis module (which is for Python's own bytecode), would be very helpful, allowing the code to be easily disassembled and examined at a higher level.

Is this a good idea, or am I barking up the wrong tree? I think it's a great idea, but I'm open to being told this is a horrible idea. :) I welcome any and all comments both here and on the bug tracker.

Jonathan Goble
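For reference, the limitation described above (re.DEBUG writing only to stdout) can be worked around to a degree by redirecting stdout while the pattern compiles. The following is a minimal sketch, not a proposed API: debug_dump is a hypothetical helper name, and the text it returns is whatever the interpreter happens to print, which is not a stable format and may differ between Python versions.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Return the text that re.DEBUG prints while compiling `pattern`.

    Hypothetical helper for illustration only; the returned text is
    debugging output, not a documented or stable format.
    """
    buffer = io.StringIO()
    # re.DEBUG output is written with print(), so temporarily
    # redirecting sys.stdout captures it as a string.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


dump = debug_dump(r"ab+")
print(dump)
```

This only yields text that would still have to be parsed by hand, which is the gap the proposal is meant to close: a public attribute would give the opcodes directly as data.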