[Numpy-discussion] seeking advice on a fast string->array conversion
Christopher Barker
Chris.Barker at noaa.gov
Tue Nov 16 13:01:11 EST 2010
On 11/16/10 8:57 AM, Darren Dale wrote:
> In my case, I am making an assumption about the integrity of the file.
That does make things easier, but less universal. I guess this is the
whole trade-off about "reusable code". It sure is a lot easier to write
code that does the one thing you need than something general purpose.
>> Anyone know what the advantage of ato* is over scanf()/fscanf()?
>>
>> Also, why are you doing string parsing rather than parsing the files
>> directly, wouldn't that be a bit faster?
>
> Rank inexperience, I guess. I don't understand what you have in mind.
If your goal is to read numbers from an ascii file, you can use
fromfile() directly, rather than reading the file (or some of it) into a
string, and then using fromstring(). Also, in C, you can use fscanf to
read the file directly (of course, under the hood, it's putting stuff in
strings somewhere along the line, but presumably in an optimized way).
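To make the two routes concrete, here is a minimal sketch. The file name
and its contents are made up for illustration; the only thing it relies on
is that a non-empty sep puts fromstring() and fromfile() into text mode.

```python
import os
import tempfile

import numpy as np

# Write a small whitespace-separated ASCII file to read back
# (hypothetical data, just for this illustration).
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
with open(path, "w") as f:
    f.write("1.0 2.5 3.0\n4.0 5.5 6.0")

# Route 1: read the text into a string, then parse the string.
with open(path) as f:
    text = f.read()
via_string = np.fromstring(text, dtype=np.float64, sep=" ")

# Route 2: let numpy parse the file directly, with no intermediate string.
# A sep of " " matches any run of whitespace, including newlines.
via_file = np.fromfile(path, dtype=np.float64, sep=" ")
```

Both calls should give the same flat array of six doubles; whether the
second is actually faster is something to measure, not assume.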
> scanf/fscanf don't actually convert strings to numbers, do they?
Yes, that's exactly what they do.
http://en.wikipedia.org/wiki/Scanf
The C lib may very well use ato* under the hood.
My idea at this point is to write a function in Cython that takes a file
and a numpy dtype, converts the dtype to a scanf format string, then
calls fscanf (or scanf) to parse out the file. My existing scanner code
more or less does that, but the format string is hard-coded to be either
for floats or doubles.
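The dtype-to-format-string step could look roughly like the sketch below.
It is plain Python rather than Cython, it is not my existing scanner code,
and the names dtype_to_scanf_format and _SCANF_CODES are hypothetical. It
also assumes a platform where a C int is 32 bits.

```python
import numpy as np

# Hypothetical mapping from (dtype kind, item size in bytes) to a scanf
# conversion. Only a few common numeric types are covered.
_SCANF_CODES = {
    ("f", 4): "%f",   # C float
    ("f", 8): "%lf",  # C double
    ("i", 2): "%hd",  # C short
    ("i", 4): "%d",   # C int (assumes a 32-bit int)
    ("u", 4): "%u",   # C unsigned int (same assumption)
}


def dtype_to_scanf_format(dtype):
    """Return a scanf format string for one record of `dtype`."""
    dtype = np.dtype(dtype)
    if dtype.names is None:
        # Simple dtype: a single conversion.
        fields = [dtype]
    else:
        # Structured dtype: one conversion per field, in declaration order.
        fields = [dtype.fields[name][0] for name in dtype.names]
    try:
        return " ".join(_SCANF_CODES[(f.kind, f.itemsize)] for f in fields)
    except KeyError as err:
        raise TypeError(f"no scanf conversion for kind/size {err.args[0]}") from None
```

For example, dtype_to_scanf_format([("x", "f8"), ("n", "i4")]) gives
"%lf %d". The conversions are joined with a space because whitespace in a
scanf format matches any amount of whitespace in the input, so the same
string works across line breaks.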
>> I've got some C extension code for simple parsing of text files into
>> arrays of floats or doubles (using fscanf). I'd be curious how the
>> performance compares to what you've got. Let me know if you're interested.
>
> I'm curious, yes.
OK -- I'll whip up a test similar to yours -- stay tuned!
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov