[pypy-issue] [issue340] rewrite PyPy's tokenizer

Buck Golemon tracker at bugs.pypy.org
Tue Mar 19 21:08:08 CET 2013


Buck Golemon <buck.golemon at gmail.com> added the comment:

Just recording this here for everyone's information.

On Mon, Mar 18, 2013 at 10:42 PM, Benjamin Peterson <benjamin at python.org> wrote:
Hi Buck,
I wanted to say a bit more about what a better PyPy tokenizer would
look like. If you look in pypy/interpreter/pyparser/pytokenizer.py,
you'll see the main routine is generate_tokens(). I think that
routine is fine overall. You'll notice it uses things like "endDFA"
and "whiteSpaceDFA" for matching. Looking under the layers, you'll see
this is basically a bunch of automatically generated icky DFAs. That
would be an excellent place for the regular expressions of rply.
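
For reference, rply's lexer is driven by ordinary regular expressions via
its LexerGenerator. A minimal sketch of the kind of regex-based matching
Benjamin is describing (the token names and patterns below are purely
illustrative, not PyPy's actual token set) would look something like:

    from rply import LexerGenerator

    lg = LexerGenerator()
    # Illustrative token definitions -- not PyPy's real grammar.
    lg.add("NAME", r"[A-Za-z_][A-Za-z0-9_]*")
    lg.add("NUMBER", r"\d+")
    lg.add("PLUS", r"\+")
    lg.ignore(r"[ \t]+")

    lexer = lg.build()
    for token in lexer.lex("x + 42"):
        print(token.name, token.value)

The idea would be for declarative rules like these to take the place of the
generated DFA tables in pytokenizer.py.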

----------
nosy: +buck

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue340>
________________________________________