[Numpy-discussion] Fast Reading of ASCII files

Wes McKinney wesmckinn at gmail.com
Tue Dec 13 16:57:51 EST 2011


On Mon, Dec 12, 2011 at 12:34 PM, Warren Weckesser
<warren.weckesser at enthought.com> wrote:
>
>
> On Mon, Dec 12, 2011 at 10:22 AM, Chris.Barker <chris.barker at noaa.gov>
> wrote:
>>
>> On 12/11/11 8:40 AM, Ralf Gommers wrote:
>> > On Wed, Dec 7, 2011 at 7:50 PM, Chris.Barker <Chris.Barker at noaa.gov
>> >     * If we have a good, fast ascii (or unicode?) to array reader,
>> > hopefully
>> >     it could be leveraged for use in the more complex cases. So that
>> > rather
>> >     than genfromtxt() being written from scratch, it would be a wrapper
>> >     around the lower-level reader.
>> >
>> > You seem to be contradicting yourself here. The more complex cases are
>> > Wes' 10% and why genfromtxt is so hairy internally. There's always a
>> > trade-off between speed and handling complex corner cases. You want
>> > both.
>>
>> I don't think the version in my mind is contradictory (Not quite).
>>
>> What I'm imagining is that a good, fast ascii to numpy array reader
>> could read a whole table in at once (the common, easy, fast, case), but
>> it could also be used to read snippets of a file in at a time, which
>> could be leveraged to handle many of the more complex cases.
>>
>> I suppose there will always be cases where the user needs to write their
>> own converter from string to dtype, and there is simply no way to
>> leverage what I'm imagining to support that.
>>
>> Hmm, maybe there is -- for instance, if a "record" consisted of mostly
>> standard, easy-to-parse, numbers, but one field was some weird text that
>> needed custom parsing, we could read it as a dtype, with a string for
>> that one weird field, and that could be converted in a post-processing
>> step.
>>
>> Maybe that wouldn't be any faster or easier, but it could be done...
>>
>> Anyway, whether you can leverage it for the full-featured version or
>> not, I do think there is call for a good, fast, 90% case text file parser.
>>
>>
>> Would anyone like to join/form a small working group to work on this?
>>
>> Wes, I'd like to see your Cython version -- maybe a starting point?
>>
>> -Chris
>
>
>
> I'm also working on a faster text file reader, so count me in.  I've been
> experimenting in both C and Cython.   I'll put it on github as soon as I
> can.
>
> Warren
>
>
>>
>>
>>
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> Chris.Barker at noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Cool, Warren, I look forward to seeing it. I'm hopeful we can craft a
performant tool that will meet the needs of many projects (NumPy,
pandas, etc.)...
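
For concreteness, the post-processing idea quoted above (read the easy
numeric fields directly, keep the one awkward field as a string, and
convert it afterwards) might look roughly like the sketch below. It is
only an illustration under assumptions: the sample data, the field names
and the parse_depth helper are hypothetical, and np.loadtxt stands in
for whatever fast reader comes out of this discussion.

```python
import io

import numpy as np

# Hypothetical sample data: two plain numeric columns plus one "weird"
# text field (a depth with a unit suffix) that needs custom parsing.
text = io.StringIO(
    "1.0 2.5 10m\n"
    "2.0 3.5 250cm\n"
    "3.0 4.5 7m\n"
)

# Step 1: generic pass. The numbers are parsed directly; the odd field
# is kept as a fixed-width string for now.
raw_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("depth", "U8")])
raw = np.loadtxt(text, dtype=raw_dtype)


def parse_depth(s):
    """Hypothetical converter: turn '10m' or '250cm' into metres."""
    s = str(s)
    if s.endswith("cm"):  # check 'cm' first, since it also ends in 'm'
        return float(s[:-2]) / 100.0
    if s.endswith("m"):
        return float(s[:-1])
    raise ValueError("unrecognised depth: %r" % s)


# Step 2: post-processing. Copy the numeric fields over unchanged and
# convert the string field with the custom parser.
final_dtype = np.dtype([("x", "f8"), ("y", "f8"), ("depth", "f8")])
table = np.empty(raw.shape, dtype=final_dtype)
table["x"] = raw["x"]
table["y"] = raw["y"]
table["depth"] = [parse_depth(s) for s in raw["depth"]]
```

The custom conversion runs in a Python loop over a single column, so the
rest of the table still goes through the fast path. Whether that is
quicker overall than a per-field converter would need measuring.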


