[Numpy-discussion] Possible roadmap addendum: building better text file readers

Wes McKinney wesmckinn at gmail.com
Thu Feb 23 15:24:28 EST 2012


On Thu, Feb 23, 2012 at 3:19 PM, Warren Weckesser
<warren.weckesser at enthought.com> wrote:
>
> On Thu, Feb 23, 2012 at 2:08 PM, Travis Oliphant <travis at continuum.io>
> wrote:
>>
>> This is actually on my short-list as well --- it just didn't make it to
>> the list.
>>
>> In fact, we have someone starting work on it this week.  It is his first
>> project so it will take him a little time to get up to speed on it, but he
>> will contact Wes and work with him and report progress to this list.
>>
>> Integration with np.loadtxt is a high priority.  I think loadtxt is now
>> the 3rd or 4th "text-reading" interface I've seen in NumPy.  I have no
>> interest in making a new one if we can avoid it.   But, we do need to make
>> it faster with less memory overhead for simple cases like Wes describes.
>>
>> -Travis
>>
>
>
> I have a "proof of concept" CSV reader written in C (with a Cython
> wrapper).  I'll put it on github this weekend.
>
> Warren

Sweet, between this, the Continuum folks, and me and my guys I think we
can come up with something good that suits all our needs. We should set
up some realistic performance test cases that we can monitor via
vbench (wesm/vbench) while we work on the project.
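
As a rough illustration of what one such performance test case might
look like, here is a minimal sketch that uses only the Python standard
library. The names make_csv, baseline_reader, and bench are hypothetical
helpers invented for this example; they are not part of vbench, NumPy,
or pandas. Any callable that takes the CSV text could be passed to
bench() in place of the baseline.

```python
import csv
import io
import random
import timeit


def make_csv(n_rows, n_cols, seed=0):
    """Return CSV text with n_rows rows of n_cols random floats."""
    rng = random.Random(seed)  # fixed seed so every run sees the same data
    buf = io.StringIO()
    writer = csv.writer(buf)
    for _ in range(n_rows):
        writer.writerow([repr(rng.random()) for _ in range(n_cols)])
    return buf.getvalue()


def baseline_reader(text):
    """Pure-Python baseline (hypothetical): parse every field as a float."""
    return [[float(x) for x in row] for row in csv.reader(io.StringIO(text))]


def bench(reader, text, repeat=3):
    """Return the best wall-clock time in seconds of reader(text)."""
    return min(timeit.repeat(lambda: reader(text), number=1, repeat=repeat))


if __name__ == "__main__":
    text = make_csv(10000, 10)
    print("baseline: %.4f s" % bench(baseline_reader, text))
```

Taking the best of several runs reduces noise from other processes, and
generating the data from a fixed seed keeps timings comparable as the
reader implementation changes.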

- W

>
>>
>>
>> On Feb 23, 2012, at 1:53 PM, Pauli Virtanen wrote:
>>
>> > Hi,
>> >
>> > 23.02.2012 20:32, Wes McKinney kirjoitti:
>> > [clip]
>> >> To be clear: I'm going to do this eventually whether or not it
>> >> happens in NumPy because it's an existing problem for heavy
>> >> pandas users. I see no reason why the code can't emit structured
>> >> arrays, too, so we might as well have a common library component
>> >> that I can use in pandas and specialize to the DataFrame internal
>> >> structure.
>> >
>> > If you do this, one useful aim could be to design the code such that it
>> > can be used in loadtxt, at least as a fast path for common cases. I'd
>> > really like to avoid increasing the number of APIs for text file
>> > loading.
>> >
>> > --
>> > Pauli Virtanen
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>


