Identifying the start of good data in a list

George Sakkis george.sakkis at gmail.com
Wed Aug 27 16:42:36 EDT 2008


On Aug 26, 10:39 pm, tkp... at hotmail.com wrote:
> On Aug 26, 7:23 pm, Emile van Sebille <em... at fenx.com> wrote:
>
> > tkp... at hotmail.com wrote:
> > > I have a list that starts with zeros, has sporadic data, and then has
> > > good data. I define the point at which the data turns good to be the
> > > first index with a non-zero entry that is followed by at least 4
> > > consecutive non-zero data items (i.e. a week's worth of non-zero
> > > data). For example, if my list is [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8,
> > > 9], I would define the point at which data turns good to be 4 (1
> > > followed by 2, 3, 4, 5).
>
> > > I have a simple algorithm to identify this changepoint, but it looks
> > > crude: is there a cleaner, more elegant way to do this?
>
> >  >>> for ii,dummy in enumerate(retHist):
> > ...     if 0 not in retHist[ii:ii+5]:
> > ...         break
>
> >  >>> del retHist[:ii]
>
> > Well, to the extent short and sweet is elegant...
>
> > Emile
>
> This is just what the doctor ordered. Thank you, everyone, for the
> help.
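For reference, the definition in the question can be transcribed almost
literally as a brute-force search. The sketch below (the name
`first_good_index` and its `window` parameter are illustrative, not from
the thread) differs from Emile's loop in one respect: it only considers
windows of full length, so a short non-zero run at the very end of the
list is not mistaken for good data. Here `window` counts the starting
entry itself, so the question's "followed by at least 4" corresponds to
`window=5`. On that reading, `good_ones=5` would be the matching setting
for the functions further down, which count the run the same way.

```python
def first_good_index(seq, window=5):
    """Return the first index i such that seq[i:i + window] is a
    full-length window containing no zeros, or None if there is none.

    window=5 follows the question's wording: a non-zero entry followed
    by at least 4 more non-zero entries.
    """
    # Only start positions that leave room for a complete window.
    for i in range(len(seq) - window + 1):
        if 0 not in seq[i:i + window]:
            return i
    return None


data = [0, 0, 1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(first_good_index(data))                    # 4
print(first_good_index([0, 0, 1, 0, 1, 2, 3]))   # None: last run too short
print(first_good_index([]))                      # None
```

Trimming is then `del seq[:i]` when `i` is not None, and clearing the
list otherwise. The cost is O(len(seq) * window), which is fine for short
series but is what the single-pass versions avoid.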

Note that the version above (as well as most others posted) fails for
boundary cases; check out bearophile's doctest to see some of them.
Below are two more versions that pass all the doctests: the first
works only for lists and modifies them in place, and the second works
for arbitrary iterables and returns an iterator:

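def clean_inplace(seq, good_ones=4):
    # Drop everything before the first run of at least `good_ones`
    # consecutive non-zero items; clear the list if there is no such run.
    start = 0
    n = len(seq)
    while start < n:
        # `end` is the index of the next zero at or after `start` (or n
        # if there is none), so seq[start:end] is a zero-free run.
        try:
            end = seq.index(0, start)
        except ValueError:
            end = n
        if end - start >= good_ones:
            break
        # Run too short: resume the search just past that zero.
        start = end + 1
    del seq[:start]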

def clean_iter(iterable, good_ones=4):
    from itertools import chain, islice, takewhile, dropwhile
    iterator = iter(iterable)
    # plain comparison; float(0).__eq__ returns NotImplemented (which is
    # truthy) for non-numeric items
    is_zero = lambda x: x == 0
    while True:
        # consume all zeros up to the next non-zero
        iterator = dropwhile(is_zero, iterator)
        # take up to `good_ones` non-zeros (a shorter run also consumes
        # the zero that ends it)
        good = list(islice(takewhile(bool, iterator), good_ones))
        if not good:  # iterator exhausted
            return iterator
        if len(good) == good_ones:
            # found `good_ones` consecutive non-zeros;
            # chain them to the remaining items and return them
            return chain(good, iterator)
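To make "fails for boundary cases" concrete, here is Emile's loop
wrapped in a function (the name `emile_trim` is hypothetical) and run on
three small illustrative inputs; they are not bearophile's doctest,
which isn't reproduced in this message. Under the definition in the
question all three should end up empty, which is what `clean_inplace`
and `list(clean_iter(...))` give for them.

```python
def emile_trim(ret_hist):
    # Emile's loop from the quoted message, wrapped in a function so it
    # can be exercised on edge cases; trims ret_hist in place.
    for ii, dummy in enumerate(ret_hist):
        if 0 not in ret_hist[ii:ii + 5]:
            break
    del ret_hist[:ii]


# Near the end of the list the slice is shorter than 5 items, so a
# short trailing run passes the "no zero" test.
a = [0, 0, 1, 0, 1, 2, 3]
emile_trim(a)
print(a)   # [1, 2, 3]

# If no window qualifies, the loop runs to completion with ii at the
# last index, so the final element survives.
b = [0, 0, 0]
emile_trim(b)
print(b)   # [0]

# On an empty list the loop body never runs and ii is never bound.
try:
    emile_trim([])
except UnboundLocalError as exc:   # a subclass of NameError
    print("empty list:", type(exc).__name__)
```

All three problems come from the same place: the loop never checks that
the slice actually holds 5 items, nor what happens when it finishes
without a `break`.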

HTH,
George



More information about the Python-list mailing list