ka-ping yee tokenizer.py

Aaron "Castironpi" Brady castironpi at gmail.com
Tue Sep 16 21:39:01 EDT 2008


On Sep 16, 2:48 pm, "Karl Kobata" <karl.kob... at syncira.com> wrote:
> Hi Fredrik,
>
> This is exactly what I need.  Thank you.
> I would like to add one additional function.  I am not using the tokenizer to
> parse Python code; it just happens to work very well for my application.
> However, I would like either or both of the following variants:
> 1) add 2 other characters as comment designators
> 2) write a module that can readline, modify the line as required, and
> finally be passed as the argument to the tokenizer.
>
> def modifyLine( fileHandle ):
>     # readline and modify this string if required
>     ...
>
> for token in tokenize.generate_tokens( modifyLine( myFileHandle ) ):
>     print token
>
> Anxiously looking forward to your thoughts.
> karl
>
> -----Original Message-----
> From: python-list-bounces+kkobata=syncira.... at python.org
>
> [mailto:python-list-bounces+kkobata=syncira.... at python.org] On Behalf Of
> Fredrik Lundh
> Sent: Monday, September 15, 2008 2:04 PM
> To: python-l... at python.org
> Subject: Re: ka-ping yee tokenizer.py
>
> Karl Kobata wrote:
>
> > I have enjoyed using ka-ping yee's tokenizer.py.  I would like to
> > replace the readline parameter input with my own and pass a list of
> > strings to the tokenizer.  I understand it must be a callable, iterable
> > object, but it is obvious from the errors I am getting that this is not
> > the only requirement.
>
> not sure I can decipher your detailed requirements, but to use Python's
> standard "tokenize" module (written by ping) on a list, you can simply
> do as follows:
>
>      import tokenize
>
>      program = [ ... program given as list ... ]
>
>      for token in tokenize.generate_tokens(iter(program).next):
>          print token
>
> another approach is to turn the list back into a string, and wrap that
> in a StringIO object:
>
>      import tokenize
>      import StringIO
>
>      program = [ ... program given as list ... ]
>
>      program_buffer = StringIO.StringIO("".join(program))
>
>      for token in tokenize.generate_tokens(program_buffer.readline):
>          print token
>
> </F>

This is an interesting construction:

>>> a= [ 'a', 'b', 'c' ]
>>> def moditer( mod, nextfun ):
...     while 1:
...             yield mod( nextfun( ) )
...
>>> list( moditer( ord, iter( a ).next ) )
[97, 98, 99]
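
(Incidentally, the standard library already has this pattern ready-made:
moditer( mod, iter( a ).next ) is just itertools.imap over the underlying
list.)

>>> import itertools
>>> list( itertools.imap( ord, a ) )
[97, 98, 99]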

Here's my point:

>>> import tokenize
>>> a= [ 'print a', 'print b', 'print c' ]
>>> tokenize.generate_tokens( iter( a ).next )
<generator object at 0x009FF440>
>>> tokenize.generate_tokens( moditer( lambda s: s+ '#', iter( a ).next ).next )

The second call appends a '#' to the end of every line, then tokenizes.
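
Putting that together with Karl's two requests, here is a rough sketch
(untested; the names modified_readline and EXTRA_COMMENT_CHARS are mine,
and ';' and '!' are just stand-ins for whatever two characters you want)
that translates extra comment designators to '#' in each line before
handing the readline callable to tokenize:

     import tokenize
     import StringIO

     EXTRA_COMMENT_CHARS = ';!'   # substitute your own two characters

     def modified_readline( readline ):
         # wrap a readline callable; translate the extra comment
         # characters to '#' so tokenize treats the rest of the
         # line as a comment
         while 1:
             line = readline( )
             for ch in EXTRA_COMMENT_CHARS:
                 line = line.replace( ch, '#' )
             yield line

     program = StringIO.StringIO( 'print a ! a comment\nprint b\n' )

     for token in tokenize.generate_tokens(
             modified_readline( program.readline ).next ):
         print token

One caveat: a blind replace() also rewrites those characters inside
string literals, so a real modifyLine would need to skip quoted text.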


