[Numpy-discussion] genfromtxt universal newline support

Jeff Reback jeffreback at gmail.com
Mon Jun 30 17:10:56 EDT 2014


In pandas 0.14.0, generic whitespace IS parsed by the C parser, e.g. by specifying '\s+' as the separator. Not sure when you last played with pandas, but the C parser has been in place since late 2012 (version 0.10.0).

http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#text-parsing-api-changes
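
For illustration, a minimal sketch of the whitespace case, assuming a reasonably recent pandas on Python 3. The sample data and variable names are made up, and engine="c" is passed explicitly only to show that the C parser handles this separator rather than the Python fallback:

```python
import io

import pandas as pd

# Made-up sample: columns separated by irregular runs of spaces and tabs.
text = "a b   c\n1  2.5\t3\n4 5.5   6\n"

# '\s+' is the one regex separator the C engine accepts directly;
# other regex separators fall back to the slower Python engine.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", engine="c")

# A plain ndarray, roughly what genfromtxt would hand back for this input.
arr = df.to_numpy()
```

The resulting frame has one column per whitespace-delimited field, and the conversion at the end gives an ordinary NumPy array if that is what the calling code expects.
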
> On Jun 30, 2014, at 4:58 PM, Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote:
> 
> On 30 Jun 2014, at 04:56 pm, Nathaniel Smith <njs at pobox.com> wrote:
> 
>>> A real need, which had also been discussed at length, is a truly performant text IO
>>> function (i.e. one using a compiled ASCII number parser, and optimally also a more
>>> memory-efficient one), but unfortunately all people interested in implementing this
>>> seem to have drifted away (not excluding myself from this)…
>> 
>> It's possible we could steal some code from Pandas for this. IIRC they
>> have C/Cython text parsing routines. (It's also an interesting
>> question whether they've fixed the unicode/binary issues, might be
>> worth checking before rewriting from scratch...)
> 
> Good point, last time I was playing with Pandas it was not any faster, but now a 10x
> speedup speaks for itself. Their C engine does not support generic whitespace separators,
> but that could probably be addressed in a numpy implementation.
> 
>                    Derek
