[Python-Dev] Regular expression bytecode

Franklin? Lee leewangzhong+python at gmail.com
Sun Feb 14 14:41:27 EST 2016


I think exposing the bytecode would be nice for manipulating (e.g. optimizing,
possibly with JIT-like analysis) and comparing regexes. It could also be useful
as a teaching tool, e.g. for exercises in optimizing and comparing regexes.
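As a rough sketch of the comparison use case with what CPython already has, the snippet below compares two patterns by their parsed (intermediate) form rather than by their bytecode. It relies on CPython's private, undocumented parser module (`re._parser` on 3.11+, `sre_parse` earlier), which may change between versions; `normalize` and `same_structure` are hypothetical helper names. The comparison is purely syntactic: `a{2}` and `aa` match the same strings but do not compare equal.

```python
# Sketch only: uses CPython's private, undocumented regex parser.
try:
    from re import _parser as sre_parser  # CPython 3.11+
except ImportError:
    import sre_parse as sre_parser  # CPython 3.10 and earlier


def normalize(node):
    """Turn a parse tree into plain lists/tuples/names so it can be compared with ==."""
    if isinstance(node, sre_parser.SubPattern):
        # Compare a SubPattern by its items, not by object identity.
        return [normalize(item) for item in node.data]
    if isinstance(node, (list, tuple)):
        return tuple(normalize(item) for item in node)
    if isinstance(node, int) and hasattr(node, "name"):
        # Opcodes and other named constants: compare them by name.
        return node.name
    return node


def same_structure(pattern1, pattern2, flags=0):
    """Hypothetical helper: True if two patterns parse to the same tree."""
    tree1 = normalize(sre_parser.parse(pattern1, flags))
    tree2 = normalize(sre_parser.parse(pattern2, flags))
    return tree1 == tree2


print(same_structure("(x|y)+", "(x|y)+"))  # True
print(same_structure("a{2}", "aa"))        # False: same matches, different tree
```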

I think the discussion should be on python-ideas, though.
On Feb 14, 2016 2:01 PM, "Jonathan Goble" <jcgoble3 at gmail.com> wrote:

> I'm new to Python's mailing lists, so please forgive me if I'm sending
> this to the wrong list. :)
>
> I filed http://bugs.python.org/issue26336 a few days ago, but now I
> think this list might be a better place to get discussion going.
> Basically, I'd like to see the bytecode of a compiled regex object
> exposed as a public (probably read-only) attribute of the object.
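For reference, the bytecode in question can already be produced, though only through CPython internals: the private `_code()` function in `sre_compile` (`re._compiler` on 3.11+) builds the list of integers that `re.compile` then hands to the C extension. The sketch below assumes CPython 3.8 or later; `regex_bytecode` is a hypothetical name, and the code layout is an implementation detail that differs between releases.

```python
# Sketch only: calls private CPython internals that may change at any time.
try:
    from re import _compiler as sre_compiler, _parser as sre_parser  # 3.11+
except ImportError:
    import sre_compile as sre_compiler, sre_parse as sre_parser  # 3.8 to 3.10


def regex_bytecode(pattern, flags=0):
    """Hypothetical helper: return the opcode list that re.compile passes to C."""
    parsed = sre_parser.parse(pattern, flags)
    return sre_compiler._code(parsed, flags)


code = regex_bytecode("a")
# An INFO block first, then the pattern's own opcodes, ending with SUCCESS.
print(code)
```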
>
> Currently, although the regex is compiled in pure Python by the
> sre_compile and sre_parse modules, the resulting list of opcodes is
> then passed into C and copied into an array in a C struct without
> being publicly exposed in any way. The only way for a user to get an
> internal representation of the regex is the re.DEBUG flag, which
> produces only an intermediate representation rather than the actual
> bytecode, and only writes it to stdout, which makes it useless for
> someone who wants to examine it programmatically.
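The stdout limitation can be worked around, crudely, by redirecting stdout while compiling with `re.DEBUG`. This is a minimal sketch using only standard-library calls (`contextlib.redirect_stdout`, `io.StringIO`); `debug_dump` is a hypothetical name. It still yields text to be scraped rather than structured data, and the exact output format may vary between Python versions.

```python
import contextlib
import io
import re


def debug_dump(pattern, flags=0):
    """Hypothetical helper: return what re.DEBUG prints, as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # CPython does not cache re.DEBUG patterns, so the dump is printed every time.
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


text = debug_dump(r"a+b")
print(text.splitlines())
```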
>
> I'm sure others can think of other potential use cases for this, but
> one in particular would be that someone could write a debugger that
> allows a user to step through a regex one opcode at a time to see
> exactly where it is failing. It would perhaps also be nice to have a
> public constructor for the regex object type, which would enable users
> to modify the bytecode and directly create a new regex object from it,
> similar to what is currently possible through the types.FunctionType
> and types.CodeType constructors.
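For comparison, this is the kind of round trip already possible for Python functions: take a function's code object, modify it, and build a new function from it with `types.FunctionType`. The sketch uses `CodeType.replace()` (Python 3.8+) instead of calling the `types.CodeType` constructor directly, because that constructor's argument list changes between versions; `greet` and `farewell` are illustrative names. A public regex-object constructor would presumably play the same role for regex bytecode, but `re` offers nothing like it.

```python
import types


def greet():
    return "hello"


# Copy greet's code object, swapping the "hello" constant for "goodbye".
old_code = greet.__code__
new_consts = tuple("goodbye" if c == "hello" else c for c in old_code.co_consts)
new_code = old_code.replace(co_consts=new_consts)

# Wrap the modified code object in a brand-new function object.
farewell = types.FunctionType(new_code, globals(), "farewell")

print(greet())     # hello
print(farewell())  # goodbye
```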
>
> In addition to exposing the code in a public attribute, a helper
> module written in Python similar to the dis module (which is for
> Python's own bytecode) would be very useful, allowing the code to be
> easily disassembled and examined at a higher level.
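A small step in that direction is possible today at the level of the parse tree, though not the bytecode: a true bytecode disassembler would need to know how many operands each opcode takes, which is an internal detail of sre_compile. The sketch below flattens the parse tree from CPython's private parser (`re._parser` on 3.11+, `sre_parse` earlier) into `(depth, opcode name, argument)` rows, in the spirit of `dis.get_instructions()` returning data rather than printing it. `regex_dis` is a hypothetical name, and nested subpatterns appear as `"..."` in the argument column.

```python
# Sketch only: walks the parse tree from CPython's private regex parser.
try:
    from re import _parser as sre_parser  # CPython 3.11+
except ImportError:
    import sre_parse as sre_parser  # CPython 3.10 and earlier


def _children(arg):
    """Yield every SubPattern nested anywhere inside an opcode argument."""
    if isinstance(arg, sre_parser.SubPattern):
        yield arg
    elif isinstance(arg, (list, tuple)):
        for item in arg:
            yield from _children(item)


def _plain(arg):
    """Render an argument readably, eliding nested SubPatterns as '...'."""
    if isinstance(arg, sre_parser.SubPattern):
        return "..."
    if isinstance(arg, (list, tuple)):
        return tuple(_plain(item) for item in arg)
    if isinstance(arg, int) and hasattr(arg, "name"):
        return arg.name  # named constants such as opcodes and categories
    return arg


def _walk(subpattern, depth):
    for op, arg in subpattern.data:
        yield depth, op.name, _plain(arg)
        for child in _children(arg):
            yield from _walk(child, depth + 1)


def regex_dis(pattern, flags=0):
    """Hypothetical helper: (depth, opcode name, argument) rows for a pattern."""
    return list(_walk(sre_parser.parse(pattern, flags), 0))


for depth, name, arg in regex_dis(r"(\d+)-\w"):
    print("  " * depth + name, arg)
```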
>
> Is this a good idea, or am I barking up the wrong tree? I think it's a
> great idea, but I'm open to being told this is a horrible idea. :) I
> welcome any and all comments both here and on the bug tracker.
>
> Jonathan Goble