Refactor a buffered class...

Michael Spencer mahs at telcopartners.com
Wed Sep 6 17:35:03 EDT 2006


lh84777 at yahoo.fr wrote:
> actually for the example I have used only one sentry condition, but they
> are more numerous and complex; also I need to work on a huge amount of
> data (each word is a line with many features read from a file)
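An open (text) file is a line-based iterator, so it can be fed to 'chunker'
almost directly (each line keeps its trailing newline, which has to be stripped
before an item can compare equal to the sentry). As for different sentry
conditions, I imagine they can be coded in either model. How much is a 'huge
amount' of data?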
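The point about files can be sketched like this. Each line read from a text file keeps its newline, so a raw line would never equal the sentry "."; some per-line extraction is needed first. This is a minimal sketch, assuming (hypothetically) that the word is the first whitespace-separated field on each line and the remaining fields are its features; the helper name `tokens` and the sample data are invented for illustration.

```python
import io


def tokens(lines):
    """Yield the first whitespace-separated field of each non-blank line.

    Hypothetical helper: assumes the word is the first field and that any
    remaining fields on the line are features the chunking step ignores.
    """
    for line in lines:
        fields = line.split()  # also discards the trailing newline
        if fields:  # skip blank lines
            yield fields[0]


# io.StringIO stands in for a real file object; both iterate line by line.
sample = io.StringIO("this DET\nis VERB\na DET\n. PUNCT\n")
print(list(tokens(sample)))  # ['this', 'is', 'a', '.']
```

Because `tokens` is itself a generator, passing `tokens(open(path))` to `chunker` would stream the file one line at a time instead of reading it all into memory, which is what makes the generator approach reasonable for large inputs.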

> oops
> 
>> to have:
>>
>> this .
>> this . is a .
>> this . is a . test to .
>> is a . test to . check if it .
>> test to . check if it . works .
>> check if it . works . well .
>> works . well . it looks like .
> well . it looks like .
> it looks like .
> 
Here's a small update to the generator that allows optional handling of the head 
and the tail:

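def chunker(s, chunk_size=3, sentry=".", keep_first=False, keep_last=False):
    buffer = []
    sentry_count = 0

    for item in s:
        buffer.append(item)
        if item == sentry:
            sentry_count += 1
            if sentry_count < chunk_size:
                # the "head": fewer than chunk_size sentences seen so far
                if keep_first:
                    yield buffer[:]  # a copy, so later del can't alter it
            else:
                # a full window: emit it, then drop its first sentence
                yield buffer[:]
                del buffer[:buffer.index(sentry) + 1]

    if keep_last:
        # the "tail": drain the buffer one leading sentence at a time
        while buffer:
            yield buffer[:]
            if sentry in buffer:
                del buffer[:buffer.index(sentry) + 1]
            else:
                break  # trailing words with no closing sentry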


>>> s = "this . is a . test to . check if it . works . well . it looks like ."
>>> for p in chunker(s.split(), keep_first=True, keep_last=True): print(" ".join(p))
...
this .
this . is a .
this . is a . test to .
is a . test to . check if it .
test to . check if it . works .
check if it . works . well .
works . well . it looks like .
well . it looks like .
it looks like .
>>>
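On the question of more numerous and complex sentry conditions, one way to code them in the generator model is to pass a predicate instead of a single sentry value. The sketch below is a hypothetical variant, `chunker_by` (not part of the original post), that keeps the same head/tail handling; the end markers ".", "?" and "!" are only an example.

```python
def _first_end(buffer, is_sentry):
    """Index just past the first sentence in buffer, or len(buffer) if none ends."""
    for i, item in enumerate(buffer):
        if is_sentry(item):
            return i + 1
    return len(buffer)


def chunker_by(items, is_sentry, chunk_size=3, keep_first=False, keep_last=False):
    """Hypothetical variant of chunker: a sentence ends wherever is_sentry(item)
    is true, so several (or arbitrarily complex) end conditions can be combined."""
    buffer = []
    sentry_count = 0
    for item in items:
        buffer.append(item)
        if is_sentry(item):
            sentry_count += 1
            if sentry_count < chunk_size:
                if keep_first:
                    yield list(buffer)  # head: partial windows
            else:
                yield list(buffer)  # full window of chunk_size sentences
                del buffer[:_first_end(buffer, is_sentry)]
    if keep_last:
        while buffer:  # tail: shrink the last window sentence by sentence
            yield list(buffer)
            del buffer[:_first_end(buffer, is_sentry)]


words = "hi . how are you ? fine !".split()
ends = {".", "?", "!"}
for chunk in chunker_by(words, lambda w: w in ends, chunk_size=2, keep_last=True):
    print(" ".join(chunk))
# hi . how are you ?
# how are you ? fine !
# fine !
```

Passing a callable keeps the windowing logic separate from the test for a sentence end, so a condition that inspects a token's features, for example, would need no change to the generator itself. With `lambda w: w == "."` it reproduces the session output above.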




More information about the Python-list mailing list