[Python-ideas] Exposing regular expression bytecode

Wes Turner wes.turner at gmail.com
Mon Feb 15 02:26:02 EST 2016


The newer 'regex' module (with an re compatibility mode and Unicode support) may be
the place to find or add more debugging symbols:

* PyPI: https://pypi.python.org/pypi/regex
* Src: https://bitbucket.org/mrabarnett/mrab-regex

On Feb 14, 2016 11:49 PM, "Jonathan Goble" <jcgoble3 at gmail.com> wrote:

> (This was previously sent to python-dev [1], but it was suggested that
> I bring it here first.)
>
> I filed http://bugs.python.org/issue26336 a few days ago, but now I
> think this list might be a better place to get discussion going.
> Basically, I'd like to see the bytecode of a compiled regex object
> exposed as a public (probably read-only) attribute of the object.
>
> Currently, although compiled in pure Python through modules
> sre_compile and sre_parse, the list of opcodes is then passed into C
> and copied into an array in a C struct, without being publicly exposed
> in any way. The only way for a user to get an internal representation
> of the regex is the re.DEBUG flag, which only produces an intermediate
> representation rather than the actual bytecode and only goes to
> stdout, which makes it useless for someone who wants to examine it
> programmatically.
>
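
For what it's worth, here is a rough sketch of what can be done today without
any new attribute. It assumes CPython: debug_dump() captures the re.DEBUG text
by redirecting stdout, and raw_opcodes() leans on the private helpers
parse() and _code() in the sre modules (re._parser / re._compiler on 3.11+),
which are undocumented and may change without notice. The function names
debug_dump and raw_opcodes are made up for this illustration.

```python
import contextlib
import io
import re

try:  # Python 3.11+: the compiler modules live inside the re package
    from re import _compiler as sre_compile, _parser as sre_parse
except ImportError:  # older versions ship top-level sre_* modules
    import sre_compile
    import sre_parse


def debug_dump(pattern, flags=0):
    """Return the text that re.DEBUG would normally print to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, flags | re.DEBUG)
    return buffer.getvalue()


def raw_opcodes(pattern, flags=0):
    """Return the flat list of integers that is handed to the C engine.

    Uses private, undocumented CPython helpers (an assumption of this
    sketch); do not rely on them in production code.
    """
    parsed = sre_parse.parse(pattern, flags)
    return list(sre_compile._code(parsed, flags))


if __name__ == "__main__":
    print(debug_dump("a+b"))
    print(raw_opcodes("a+b"))
```

This only shows that the data exists on the Python side before it is copied
into the C struct; it does not recover the code from an already compiled
pattern object, which is what the proposal asks for.
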
> I'm sure others can think of other potential use cases for this, but
> one in particular would be that someone could write a debugger that
> can allow a user to step through a regex one opcode at a time to see
> exactly where it is failing. It would also perhaps be nice to have a
> public constructor for the regex object type, which would enable users
> to modify the bytecode and directly create a new regex object from it,
> similar to what is currently possible for function bytecode through
> the types.FunctionType and types.CodeType constructors. This would
> make possible things such as optimizers.
>
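
For comparison, a minimal sketch of the function-bytecode analogy mentioned
above. It assumes Python 3.8 or later, where code objects have a replace()
method; that is used here instead of calling the types.CodeType constructor
directly, whose argument list differs between versions. The names greet and
farewell are only for illustration.

```python
import types


def greet():
    return "hello"


old_code = greet.__code__

# Build a modified code object: swap one constant, keep everything else.
new_consts = tuple(
    "goodbye" if const == "hello" else const for const in old_code.co_consts
)
new_code = old_code.replace(co_consts=new_consts)

# Wrap the modified code object in a brand-new function object.
farewell = types.FunctionType(new_code, greet.__globals__, "farewell")

if __name__ == "__main__":
    print(greet())
    print(farewell())
```

A public constructor for the regex object type would allow the same kind of
round trip: read the code, change it, build a new object from it.
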
> In addition to exposing the code in a public attribute, a helper
> module written in Python similar to the dis module (which is for
> Python's own bytecode) would be very helpful, allowing the code to be
> easily disassembled and examined at a higher level.
>
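
As a point of reference, this is roughly what the existing dis module offers
for Python's own bytecode: structured records rather than printed text. A
regex counterpart could plausibly follow the same shape. The helper name
opcode_names is hypothetical, and the exact opcode names vary between Python
versions.

```python
import dis


def add(a, b):
    return a + b


def opcode_names(func):
    """Return the opcode names of a function, in order."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


if __name__ == "__main__":
    for name in opcode_names(add):
        print(name)
```
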
> Is this a good idea, or am I barking up the wrong tree? I think it's a
> great idea, but I'm open to being told this is a horrible idea. :) I
> welcome any and all comments both here and on the bug tracker.
>
> [1] https://mail.python.org/pipermail/python-dev/2016-February/143355.html
>