[Tutor] Replacing a value in a list

nzbz xx nzbzxx at gmail.com
Mon Aug 16 13:15:19 EDT 2021


This is my code so far, but the output doesn't have the same number of
elements as the input. For example, given the raw data
[-999, -999, 3, 4, -999], the output generated is [3, 3, 4]. I'm not sure
where the problem lies, since I loop over the whole dataset and expected
that to give me the same number of elements in the list.

def clean_data(dataset):

    data_cleaned_count = 0
    clean_list = []

    # Identifying all valid data in the dataset
    for data in range(len(dataset)):
        if dataset[data] != -999:
            clean_list.append(dataset[data])

    # Replacing missing data
    for md in range(len(dataset)):
        if dataset[md] == -999:

            consecutive_invalid_count = 0
            for i in range(len(dataset)-1):
                if dataset[i] == dataset[i+1]:
                    consecutive_invalid_count +=1

            start = dataset.index(-999)
            end = start + consecutive_invalid_count

            left_idx = start-1     # Finding the adjacent valid data
            right_idx = end + 1

            # Locating the nearest valid data
            if abs(md - left_idx) > abs(md - right_idx) or md >= len(dataset)-1:
                clean_list.insert(md,dataset[left_idx] )
                data_cleaned_count += 1

            if abs(md - left_idx) < abs(md - right_idx) or md == 0:
                clean_list.insert(md, dataset[right_idx])
                data_cleaned_count += 1
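
For comparison, here is a minimal sketch of one way to keep the output the
same length as the input: copy the list and overwrite each -999 by index,
rather than inserting into a separate list of valid values. Note that in the
code above neither `>` nor `<` is true when the two distances are equal, so
in that case nothing is inserted for that position. The name `fill_nearest`
and the rule that ties go to the left neighbour are assumptions made for
illustration, not part of the original specification.

```python
MISSING = -999  # sentinel for missing data, as used in the post


def fill_nearest(dataset, missing=MISSING):
    """Return a new list with each missing value replaced by the nearest valid one.

    Hypothetical helper: when two valid values are equally close,
    the left one is used (an assumption).
    """
    # Indices of all valid entries in the original data
    valid = [i for i, value in enumerate(dataset) if value != missing]
    if not valid:
        raise ValueError("no valid data to fill from")

    result = list(dataset)  # a copy, so the length always matches the input
    for i, value in enumerate(dataset):
        if value == missing:
            # Smallest distance wins; on a tie the smaller (left) index wins
            nearest = min(valid, key=lambda j: (abs(j - i), j))
            result[i] = dataset[nearest]
    return result


print(fill_nearest([-999, -999, 3, 4, -999]))  # [3, 3, 3, 4, 4]
```

Because values are assigned by index into a copy, the number of elements
cannot change, whatever the replacement rule turns out to be.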

On Sat, Aug 14, 2021 at 8:10 PM Alan Gauld via Tutor <tutor at python.org>
wrote:

> On 14/08/2021 05:23, nzbz xx wrote:
> > Assuming that when there are consecutive missing values, they should be
> > replaced with adjacent valid values e.g [1,2,-999,-999,5] should give
> > [1,2,2,5,5]. And given [1,-999,-999,-999,5], the middle missing value
> > would take the average of index 1 & 3. This should get an output of
> > [1, 2 , 2, 3.5, 5]. How should it be coded for it to solve from the
> > outer elements first to the inner elements?
>
> That still leaves the question of what happens when there are 4 or 5
> blank pieces? For 4 you can duplicate the average, but what about 5?
> Is there a point at which you decide the data is too damaged to continue
> and throw an error?
>
> As for coding it I'd write a small helper function to find the
> start/stop indices of the blanks (or the start and length if you prefer)
>
> Something like:
>
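> def findBlanks(seq,blank=-999):
>     start = seq.index(blank)   # raises ValueError if there are no blanks
>     end = start+1
>     # stop at the end of the list so a trailing run of blanks
>     # cannot raise IndexError
>     while end < len(seq) and seq[end] == blank:
>         end+=1
>     return start,end   # end is one past the last blank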
>
>  For the simple case of 3 blanks you can do
>
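> start, end = findBlanks(seq)   # end is one past the last blank
> if start != 0:
>    seq[start] = seq[start-1]   # first blank copies its left neighbour
> if end != len(seq):
>    seq[end-1] = seq[end]       # last blank copies its right neighbour
>
> gapsize = end-start
> if gapsize >= 3:
>    seq[start+1]=(seq[start]+seq[end-1])/2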
>
> Now what you do if end-start>3 is up to you.
> And what you do if the gap is at the start or
> end of seq is also up to you...
>
> It's all in the specification.
> What is supposed to happen? Once you know the
> complete algorithm the code should practically
> write itself.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
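
The helper idea in the quoted reply can be extended to every run of blanks
in the list. The sketch below is one possible reading of "outer elements
first": it fills each run from both ends inward and gives the middle element
of an odd-length run the average of the two neighbouring valid values. The
names `find_blank_runs` and `fill_runs`, the handling of runs at the start
or end of the list (copy the only available neighbour), and the treatment of
gaps longer than 3 are all assumptions, since the thread leaves those cases
open.

```python
def find_blank_runs(seq, blank=-999):
    """Yield (start, end) for each run of blanks; end is one past the last blank."""
    start = None
    for i, value in enumerate(seq):
        if value == blank and start is None:
            start = i                # a new run begins here
        elif value != blank and start is not None:
            yield start, i           # the run ended just before i
            start = None
    if start is not None:
        yield start, len(seq)        # the run reaches the end of the list


def fill_runs(seq, blank=-999):
    """Return a copy of seq with each run of blanks filled from the outside in."""
    result = list(seq)
    for start, end in find_blank_runs(seq, blank):
        left = seq[start - 1] if start > 0 else None
        right = seq[end] if end < len(seq) else None
        if left is None and right is None:
            raise ValueError("no valid data to fill from")
        # Assumption: a run touching either edge copies its one valid neighbour
        if left is None:
            left = right
        if right is None:
            right = left

        lo, hi = start, end - 1
        while lo < hi:               # work inward from both ends of the run
            result[lo] = left
            result[hi] = right
            lo += 1
            hi -= 1
        if lo == hi:                 # odd-length run: the middle gets the average
            result[lo] = (left + right) / 2
    return result


print(fill_runs([1, 2, -999, -999, 5]))     # [1, 2, 2, 5, 5]
print(fill_runs([1, -999, -999, -999, 5]))  # [1, 1, 3.0, 5, 5]
```

With this reading, a single isolated blank also becomes the average of its
two neighbours; if that is not wanted, the `lo == hi` branch is the place
to change the rule.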

