[Numpy-discussion] loadtxt and missing values
Sean Arms
caver_sean at ou.edu
Thu Mar 6 16:12:23 EST 2008
Keith Goodman wrote:
> On Thu, Mar 6, 2008 at 12:36 PM, <caver_sean at ou.edu> wrote:
>
>
>> Proposed solution:
>> -------------------------
>>
>> It's probably not the best way (noob, that's me), but this situation could be fixed by:
>>
>> 1) add a fill keyword to loadtxt such that
>>
>> def loadtxt(...,fill=-999):
>>
>> 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py, numpy 1.0.4)
>>
>> ======================
>> for j in range(0,len(vals)):
>>     if vals[j] != '':
>>         pass
>>     else:
>>         vals[j]=fill
>> ======================
>>
>>
>> Testing:
>> -------------------------
>>
>> Load an 18,000-line ASCII dataset, 22 float variables on each line, skipping the first column (it's a time stamp).
>>
>> Timings using %timeit in ipython:
>>
>> Reading an ascii file with no missing values using the current version of loadtxt:
>> ***10 loops, best of 3: 704 ms per loop
>>
>> Reading an ascii file with no missing values using the proposed changes to loadtxt:
>> ***10 loops, best of 3: 802 ms per loop
>>
>> The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added?
>>
>
> I haven't used loadtxt so I don't have an opinion on changing it. But
> would this be faster instead of a for loop?
>
> vals = [(z, fill)[z == ''] for z in vals]
>
>
Your suggestion appears to be about 2 ms faster (but still ~100 ms
slower than the unaltered loadtxt).
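
For reference, here is a minimal standalone sketch of the two variants compared above, run on a single already-split line rather than inside loadtxt. The function names and the sample line are made up for illustration and are not part of numpy. The empty-field test uses == because an identity check on strings depends on interpreter interning.

```python
def fill_missing_loop(vals, fill=-999):
    """Loop form: replace empty-string fields in place (hypothetical helper)."""
    for j in range(len(vals)):
        if vals[j] == '':
            vals[j] = fill
    return vals


def fill_missing_comprehension(vals, fill=-999):
    """Comprehension form: build a new list with empty fields replaced."""
    return [fill if z == '' else z for z in vals]


# Made-up sample line: a time stamp followed by values, two of them missing.
line = "2008-03-06,1.5,,3.25,"
vals = line.split(",")

loop_result = fill_missing_loop(list(vals))    # copy, so vals stays unchanged
comp_result = fill_missing_comprehension(vals)
```

Both forms should give the same list, so any timing difference comes only from how the replacement is written.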