[Numpy-discussion] Fast Reading of ASCII files
Chris Barker
chris.barker at noaa.gov
Tue Dec 13 16:07:56 EST 2011
On Tue, Dec 13, 2011 at 11:29 AM, Bruce Southey <bsouthey at gmail.com> wrote:
> Reading data is hard and writing code that suits the diversity in the
> Numerical Python community is even harder!
>
>
yup
> Both loadtxt and genfromtxt functions (other functions are perhaps less
> important) perhaps need an upgrade to incorporate the new NA object.
>
yes, if we are satisfied that the new NA object is, in fact, the way of the
future.
> Here I think loadtxt is a better target than genfromtxt because, as I
> understand it, it assumes the user really knows the data. Whereas
> genfromtxt can ask the data for the appropriate format.
>
> So I agree that a new 'superfast custom CSV reader for well-behaved data'
> function would be rather useful, especially as a replacement for loadtxt.
> By that I mean reading data using a user specified format that essentially
> follows the CSV format (
> http://en.wikipedia.org/wiki/Comma-separated_values) - it needs to
> allow for the NA object, skipping lines, and user-defined delimiters.
>
>
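For reference, genfromtxt already covers skipped lines and user-defined
delimiters, and can mark missing fields. A minimal sketch, assuming a small
made-up data set and using NaN as a stand-in for missing values (this is not
the proposed NA object):

```python
import io
import numpy as np

# Made-up sample: one header line, ';' as the delimiter, "NA" marks a gap.
sample = "a;b;c\n1;2;3\n4;NA;6\n"

data = np.genfromtxt(
    io.StringIO(sample),
    delimiter=";",          # user-defined delimiter
    skip_header=1,          # skip the header line
    missing_values="NA",    # strings to treat as missing
    filling_values=np.nan,  # NaN as a placeholder, not the NA object
)
```

Here `data` is a 2x3 float array with NaN where the "NA" field was.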
I think that ideally, there could be one interface to reading tabular data
-- hopefully, it would be easy for the user to specify what they want, and
if they don't, the code tries to figure it out. Also, under the hood, the
"easy" cases are special-cased to high-performing versions.
genfromtxt sure looks close for an API -- it just needs the "high
performance special cases" under the hood. It may be that the way it's
designed makes it very difficult to do that, though -- I haven't looked
closely enough to tell.
At least that's what I'm thinking at the moment.
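A rough sketch of the "one interface, fast special cases" idea, under stated
assumptions: `read_table` is a hypothetical name, not an existing numpy
function, and the pure-Python fast path below only stands in for what would
really be a compiled reader. It tries the strict numeric case first and falls
back to genfromtxt when the data is not well-behaved:

```python
import io
import numpy as np

def read_table(text, delimiter=",", skip_header=0):
    """Hypothetical single entry point: fast path first, genfromtxt otherwise."""
    lines = [ln for ln in text.splitlines()[skip_header:] if ln.strip()]
    try:
        # "Easy" case: every field parses as a float and rows are rectangular.
        rows = [[float(tok) for tok in ln.split(delimiter)] for ln in lines]
        if len({len(r) for r in rows}) != 1:
            raise ValueError("rows are empty or have differing lengths")
        return np.array(rows, dtype=float)
    except ValueError:
        # General case: let genfromtxt deal with missing or messy fields.
        return np.genfromtxt(
            io.StringIO("\n".join(lines)),
            delimiter=delimiter,
            missing_values="NA",
            filling_values=np.nan,
        )

clean = read_table("x,y\n1,2\n3,4\n", skip_header=1)   # takes the fast path
messy = read_table("x,y\n1,NA\n3,4\n", skip_header=1)  # falls back
```

The caller sees one function either way; only the path taken under the hood
differs.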
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov