Best practice for operations on streams of text

Gary Herron gherron at islandtraining.com
Thu May 7 16:23:57 EDT 2009


James wrote:
> Hello all,
> I'm working on some NLP code - what I'm doing is passing a large
> number of tokens through a number of filtering / processing steps.
>
> The filters take a token as input, and may or may not yield a token as
> a result. For example, I might have filters that lowercase the
> input, filter out boring words, and filter out duplicates, all chained
> together.
>
> I originally had code like this:
> for t0 in token_stream:
>   for t1 in lowercase_token(t0):
>     for t2 in remove_boring(t1):
>       for t3 in remove_dupes(t2):
>         yield t3
>
> Apart from being ugly as sin, I only get one token out as
> StopIteration is raised before the whole token stream is consumed.
>
> Any suggestions on an elegant way to chain together a bunch of
> generators, with processing steps in between?
>
> Thanks,
> James

David Beazley has a very interesting talk on using generators for 
building and linking together individual stream filters.  It's very cool 
and surprisingly eye-opening.

See "Generator Tricks for Systems Programmers" at  
http://www.dabeaz.com/generators/
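
As a rough illustration of that style (a minimal sketch, not code taken
from the talk), each filter can be written as a generator that takes the
whole token stream rather than a single token, so the stages nest
without any explicit inner loops. The filter names follow James's post,
but their bodies, the `boring` word list, and the `pipeline` helper are
assumptions made up for this example.

```python
def lowercase_tokens(tokens):
    """Yield every token from the stream, lowercased."""
    for t in tokens:
        yield t.lower()


def remove_boring(tokens, boring=frozenset(["the", "a", "of"])):
    """Drop tokens found in a (hypothetical) set of boring words."""
    for t in tokens:
        if t not in boring:
            yield t


def remove_dupes(tokens):
    """Yield each distinct token once, in order of first appearance."""
    seen = set()  # lives for the whole stream, not for a single token
    for t in tokens:
        if t not in seen:
            seen.add(t)
            yield t


def pipeline(source, *filters):
    """Hypothetical helper: feed each stage's output into the next stage."""
    stream = source
    for f in filters:
        stream = f(stream)
    return stream


token_stream = iter(["The", "cat", "the", "Cat", "sat", "of", "mat"])
result = list(pipeline(token_stream,
                       lowercase_tokens, remove_boring, remove_dupes))
print(result)  # ['cat', 'sat', 'mat']
```

Because every stage is lazy, tokens are pulled through one at a time and
nothing is buffered apart from the `seen` set in `remove_dupes`. The same
chain can also be written by hand as
`remove_dupes(remove_boring(lowercase_tokens(token_stream)))`.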

Gary Herron




