ka-ping yee tokenizer.py

Karl Kobata karl.kobata at syncira.com
Wed Sep 17 14:02:06 EDT 2008


Aaran,
 
Thanks for your input.  Your examples gave me other alternatives for what I
wanted to do, and they seem to work.
 
Thanks all for your help.
 
 
On Sep 16, 2:48 pm, "Karl Kobata" <karl.kob... at syncira.com> wrote:
> Hi Fredrik,
> 
> This is exactly what I need.  Thank you.
> I would like to do one additional function.  I am not using the tokenizer
> to parse python code.  It happens to work very well for my application.
> However, I would like either or both of the following variants:
> 1) I would like to add 2 other characters as comment designations
> 2) write a module that can readline, modify the line as required, and
> finally, this module can be used as the argument for the tokenizer.
> 
> def modifyLine( fileHandle ):
>     # readline and modify this string if required
>     ...
> 
> for token in tokenize.generate_tokens( modifyLine( myFileHandle ) ):
>     print token
> 
> Anxiously looking forward to your thoughts.
> karl
> 
> -----Original Message-----
> From: python-list-bounces+kkobata=syncira.... at python.org
> [mailto:python-list-bounces+kkobata=syncira.... at python.org] On Behalf Of
> Fredrik Lundh
> Sent: Monday, September 15, 2008 2:04 PM
> To: python-l... at python.org
> Subject: Re: ka-ping yee tokenizer.py
> 
> Karl Kobata wrote:
> 
> > I have enjoyed using ka-ping yee's tokenizer.py.  I would like to
> > replace the readline parameter input with my own and pass a list of
> > strings to the tokenizer.  I understand it must be a callable object and
> > iterable, but it is obvious from the errors I am getting that these are
> > not the only functions required.
> 
> not sure I can decipher your detailed requirements, but to use Python's
> standard "tokenize" module (written by ping) on a list, you can simply
> do as follows:
> 
>      import tokenize
> 
>      program = [ ... program given as list ... ]
> 
>      for token in tokenize.generate_tokens(iter(program).next):
>          print token
> 
> another approach is to turn the list back into a string, and wrap that
> in a StringIO object:
> 
>      import tokenize
>      import StringIO
> 
>      program = [ ... program given as list ... ]
> 
>      program_buffer = StringIO.StringIO("".join(program))
> 
>      for token in tokenize.generate_tokens(program_buffer.readline):
>          print token
> 
> </F>
> 
> -- 
> http://mail.python.org/mailman/listinfo/python-list
> 
> 
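For context, the quoted snippets target Python 2 (the unbound `.next` method
and the old `StringIO` module).  A minimal Python 3 translation of both of
Fredrik's approaches, using an invented two-line sample program, might look
like this:

```python
# Python 3 sketch of the two approaches quoted above; the sample
# program lines are made up for illustration.
import io
import tokenize

program = ["x = 1\n", "y = x + 2\n"]  # program given as a list of lines

# Approach 1: hand generate_tokens the list iterator's __next__
# (Python 2 spelled this ``iter(program).next``); tokenize treats
# the iterator's StopIteration as end of input.
tokens_from_iter = list(tokenize.generate_tokens(iter(program).__next__))

# Approach 2: join the lines and wrap them in an in-memory file
# (io.StringIO replaces Python 2's StringIO module).
buffer = io.StringIO("".join(program))
tokens_from_buffer = list(tokenize.generate_tokens(buffer.readline))

for token in tokens_from_iter:
    print(token)
```

Both calls should produce the same token stream, since `generate_tokens`
only cares about getting one line of source per call to its readline
argument.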
 
This is an interesting construction:
 
>>> a= [ 'a', 'b', 'c' ]
>>> def moditer( mod, nextfun ):
...     while 1:
...             yield mod( nextfun( ) )
...
>>> list( moditer( ord, iter( a ).next ) )
[97, 98, 99]
 
Here's my point:
 
>>> a= [ 'print a', 'print b', 'print c' ]
>>> tokenize.generate_tokens( iter( a ).next )
<generator object at 0x009FF440>
>>> tokenize.generate_tokens( moditer( lambda s: s+ '#', iter( a ).next ).next )
 
It adds a '#' to the end of every line, then tokenizes.
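The same construction carries over to Python 3 with two adjustments, sketched
below: `.next` becomes `__next__`, and (because of PEP 479) a `StopIteration`
escaping a generator body is turned into a `RuntimeError`, so the wrapper has
to catch it and return.  The comment text appended here is arbitrary.

```python
# Python 3 sketch of the ``moditer`` line-modifying wrapper above.
import tokenize


def moditer(mod, nextfun):
    """Yield mod(line) for each line produced by nextfun()."""
    while True:
        try:
            line = nextfun()
        except StopIteration:
            # PEP 479: must not let StopIteration escape a generator.
            return
        yield mod(line)


a = ["print(a)\n", "print(b)\n", "print(c)\n"]

# Append a comment to the end of every line, then tokenize the
# modified stream.
modified = moditer(lambda s: s.rstrip("\n") + "  # mark\n", iter(a).__next__)
for token in tokenize.generate_tokens(modified.__next__):
    print(token)
```

This keeps the tokenizer itself untouched: all the per-line rewriting lives
in the wrapper, which is handy when abusing `tokenize` for non-Python input.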

 
