[Numpy-discussion] np.loadtxt : yet a new implementation...

Mon Dec 1 17:55:43 EST 2008

I agree, genloadtxt is a bit bloated, and it's no surprise that it's
slower than the original one. To be fair, though, comparisons should
be made against matplotlib.mlab.csv2rec, which also autodetects the
dtype. I'm quite in favor of keeping a lite version around.

On Dec 1, 2008, at 4:47 PM, Stéfan van der Walt wrote:
> I haven't investigated the code in too much detail, but wouldn't it be
> possible to implement the current set of functionality in a
> base-class, which is then specialised to add the rest?  That way, one
> could always instantiate TextReader oneself for some added speed.

Well, one of the issues is that we need to keep the function
compatible with urllib.urlretrieve (Ryan, am I right?), which means we
can't go back to the beginning of a file (no call to .seek).
Another issue is automatic dtype detection: you need to keep track of
the converters, then make a second pass over the data. Those
converters are likely the bottleneck, as each value has to be checked
to see whether it should be interpreted as missing, and then handled
accordingly.

I thought about creating a base class, with a specific subclass taking
care of the missing values, but found that it would have duplicated a
lot of code.
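For reference, the base-class layout suggested in the quoted message might look roughly like the following minimal sketch. The class names, the convert_field hook, and the fill-value scheme are all hypothetical and do not reflect how genloadtxt is organized. The sketch leaves dtype autodetection out entirely (converters are passed in), so it does not reproduce the duplication problem mentioned above; it only illustrates the shape of the proposal.

```python
import io


class TextReader:
    """Hypothetical 'lite' reader: fixed converters, no missing-value checks."""

    def __init__(self, converters, delimiter=","):
        self.converters = converters
        self.delimiter = delimiter

    def convert_field(self, i, field):
        # Hook for subclasses: convert field number i of a row.
        return self.converters[i](field)

    def read(self, stream):
        # Single pass over the stream; no .seek needed.
        rows = []
        for line in stream:
            line = line.strip()
            if not line:
                continue
            fields = [f.strip() for f in line.split(self.delimiter)]
            rows.append(tuple(self.convert_field(i, f) for i, f in enumerate(fields)))
        return rows


class MissingValueReader(TextReader):
    """Hypothetical subclass mapping missing markers to per-column fill values."""

    def __init__(self, converters, missing, fill_values, delimiter=","):
        super().__init__(converters, delimiter)
        self.missing = set(missing)
        self.fill_values = fill_values

    def convert_field(self, i, field):
        # The extra membership test on every field is the cost the lite class skips.
        if field in self.missing:
            return self.fill_values[i]
        return super().convert_field(i, field)


if __name__ == "__main__":
    lite = TextReader([int, float])
    print(lite.read(io.StringIO("1,2.5\n3,4\n")))  # [(1, 2.5), (3, 4.0)]

    full = MissingValueReader([int, float], missing=["NA"], fill_values=[-1, 0.0])
    print(full.read(io.StringIO("1,NA\nNA,4\n")))  # [(1, 0.0), (-1, 4.0)]
```

Callers who know their data contains no missing entries would instantiate TextReader directly and avoid the per-field membership test.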

In any case, I think that's secondary: we can always optimize pieces  
of the code afterwards. I'd like more feedback on corner cases and  
usage...