[Numpy-discussion] loadtxt and missing values

Keith Goodman kwgoodman at gmail.com
Thu Mar 6 16:22:02 EST 2008


On Thu, Mar 6, 2008 at 1:12 PM, Sean Arms <caver_sean at ou.edu> wrote:
>
> Keith Goodman wrote:
>  > On Thu, Mar 6, 2008 at 12:36 PM,  <caver_sean at ou.edu> wrote:
>  >
>  >
>  >>  Proposed solution:
>  >>  -------------------------
>  >>
>  >>  It's probably not the best way (noob, that's me), but this situation could be fixed by:
>  >>
>  >>  1) add a fill keyword to loadtxt such that
>  >>
>  >>  def loadtxt(...,fill=-999):
>  >>
>  >>  2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4)
>  >>
>  >>  ======================
>  >>        for j in range(len(vals)):
>  >>            if vals[j] == '':
>  >>                vals[j] = fill
>  >>  ======================
>  >>
>  >>
>  >>  Testing:
>  >>  -------------------------
>  >>
>  >>  Load an 18,000-line ASCII dataset, 22 float variables on each line, skipping the first column (it's a time stamp).
>  >>
>  >>  Timings using %timeit in ipython:
>  >>
>  >>  Reading an ascii file with no missing values using the current version of loadtxt:
>  >>  ***10 loops, best of 3: 704 ms per loop
>  >>
>  >>  Reading an ascii file with no missing values using the proposed changes to loadtxt:
>  >>  ***10 loops, best of 3: 802 ms per loop
>  >>
>  >>  The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data.  If this is an issue, could a loadtxt2 function be added?
>  >>
>  >
>  > I haven't used loadtxt so I don't have an opinion on changing it. But
>  > would this be faster instead of a for loop?
>  >
>  > vals = [(z, fill)[z == ''] for z in vals]
>
>  Your suggestion appears to be about 2 ms faster (but still ~100 ms
>  slower than the unaltered loadtxt).

I guess that's not enough to stop global warming.
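
For reference, here is a minimal sketch of the fill step discussed above, written as a standalone helper rather than as a patch to loadtxt. The names fill_missing and parse_line are hypothetical and are not part of numpy; the -999 default is taken from the proposal. It compares with == rather than "is", since identity comparison against a string literal is not guaranteed to match every empty string.

```python
def fill_missing(vals, fill=-999):
    """Return a copy of vals with empty-string fields replaced by fill."""
    # Equality test, not identity: '' == '' is always True,
    # while `z is ''` depends on string interning.
    return [fill if z == '' else z for z in vals]


def parse_line(line, delimiter=',', fill=-999):
    """Split one line of text, fill empty fields, and convert to floats.

    Hypothetical stand-in for the inner loop of a text loader.
    """
    vals = line.strip().split(delimiter)
    return [float(v) for v in fill_missing(vals, fill)]


if __name__ == "__main__":
    # The middle field is empty, so it is replaced by the fill value.
    print(parse_line("1.5,,3.0\n"))
```

Note that empty fields only show up when an explicit delimiter is given. Splitting on whitespace with str.split() and no argument collapses consecutive separators, so a missing value leaves no empty string to replace.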


