Identifying the start of good data in a list

Gerard flanagan grflanagan at gmail.com
Thu Aug 28 11:47:42 EDT 2008


George Sakkis wrote:
> On Aug 27, 3:00 pm, Gerard flanagan <grflana... at gmail.com> wrote:
> 
>> tkp... at hotmail.com wrote:
>>> I have a list that starts with zeros, has sporadic data, and then has
>>> good data. I define the point at  which the data turns good to be the
>>> first index with a non-zero entry that is followed by at least 4
>>> consecutive non-zero data items (i.e. a week's worth of non-zero
>>> data). For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
>>> 9], I would define the point at which data turns good to be 4 (1
>>> followed by 2, 3, 4, 5).
>>> I have a simple algorithm to identify this changepoint, but it looks
>>> crude: is there a cleaner, more elegant way to do this?
>>>     flag = True
>>>     i=-1
>>>     j=0
>>>     while flag and i < len(retHist)-1:
>>>         i += 1
>>>         if retHist[i] == 0:
>>>             j = 0
>>>         else:
>>>             j += 1
>>>             if j == 5:
>>>                 flag = False
>>>     del retHist[:i-4]
>>> Thanks in advance for your help
>>> Thomas Philips
>> data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>
>> def itergood(indata):
>>      indata = iter(indata)
>>      buf = []
>>      while len(buf) < 4:
>>          buf.append(indata.next())
>>          if buf[-1] == 0:
>>              buf[:] = []
>>      for x in buf:
>>          yield x
>>      for x in indata:
>>          yield x
>>
>> for d in itergood(data):
>>      print d
> 
> This seems the most efficient so far for arbitrary iterables. With a
> few micro-optimizations it becomes:
> 
> from itertools import chain
> 
> def itergood(indata, good_ones=4):
>     indata = iter(indata); get_next = indata.next
>     buf = []; append = buf.append
>     while len(buf) < good_ones:
>         next = get_next()
>         if next: append(next)
>         else: del buf[:]
>     return chain(buf, indata)
> 
> $ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from
> itergood import itergood" "list(itergood(x))"
> 100 loops, best of 3: 3.09 msec per loop
> 
> And with Psyco enabled:
> $ python -m timeit -s "x = 1000*[0, 0, 0, 1, 2, 3] + [1,2,3,4]; from
> itergood import itergood" "list(itergood(x))"
> 1000 loops, best of 3: 466 usec per loop
> 
> George
> --

I always forget the 'del slice' method for clearing a list, thanks.
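
A small sketch of why the in-place clear matters in George's version, 
which caches `buf.append`. The names `original` and `in_place` are only 
for illustration and are not from the thread:

```python
buf = [1, 2, 3]
append = buf.append    # bound method, tied to this particular list object
original = buf         # second name for the same list

del buf[:]             # empties the list in place (like buf[:] = [])
append(4)              # still lands in the list that buf names
in_place = list(buf)   # snapshot of buf after the in-place clear

buf = []               # rebinding: buf now names a brand-new list
append(5)              # the cached method still targets the old list

print(in_place, buf, original)
```

So with a cached `append`, rebinding `buf = []` inside the loop would 
leave `len(buf)` stuck at zero, while `del buf[:]` keeps both in step.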

I think that returning a `chain` means that the function is not itself a 
generator, so its body runs as soon as it is called. If indata runs out 
before good_ones consecutive non-zero items have been collected (always 
the case when it is shorter than good_ones), the StopIteration from 
get_next() is raised, unhandled, before the return statement is reached.
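
To make that concrete, here is a rough sketch, not a definitive fix. It 
is written for Python 3, so it uses the built-in next() where the 
thread's Python 2 code uses indata.next(). `itergood_chain` mirrors the 
quoted function; `itergood_safe` is a hypothetical variant of my own 
naming that returns an empty iterator when no good run is found. (In the 
Python 2 generator version the StopIteration simply ends the iteration; 
under Python 3.7+ a StopIteration escaping a generator body becomes a 
RuntimeError instead, per PEP 479.)

```python
from itertools import chain

def itergood_chain(indata, good_ones=4):
    """Plain function returning a chain, as in the quoted version."""
    indata = iter(indata)
    buf = []
    while len(buf) < good_ones:
        item = next(indata)      # StopIteration escapes if input runs out
        if item:
            buf.append(item)
        else:
            del buf[:]
    return chain(buf, indata)

def itergood_safe(indata, good_ones=4):
    """Hypothetical variant: empty result when there is no good run."""
    indata = iter(indata)
    buf = []
    while len(buf) < good_ones:
        try:
            item = next(indata)
        except StopIteration:
            return iter(())      # input exhausted before a good run
        if item:
            buf.append(item)
        else:
            del buf[:]
    return chain(buf, indata)

try:
    itergood_chain([0, 1, 2])    # too short: raises at call time
    raised = False
except StopIteration:
    raised = True

print(raised, list(itergood_safe([0, 1, 2])))
```

Whether an empty result or an exception is the better behaviour for 
short input depends on the caller; the sketch only shows where the 
exception comes from.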


G.



