[Numpy-discussion] seeking advice on a fast string->array conversion

Tue Nov 16 11:57:16 EST 2010

On Tue, Nov 16, 2010 at 11:46 AM, Christopher Barker
<Chris.Barker at noaa.gov> wrote:
> On 11/16/10 7:31 AM, Darren Dale wrote:
>> On Tue, Nov 16, 2010 at 9:55 AM, Pauli Virtanen<pav at iki.fi>  wrote:
>>> Tue, 16 Nov 2010 09:41:04 -0500, Darren Dale wrote:
>>> [clip]
>>>> That loop takes 0.33 seconds to execute, which is a good start. I need
>>>> some help converting this example to return an actual numpy array. Could
>>>> anyone please offer a suggestion?
>
> Darren,
>
> It's interesting that you found fromstring() so slow -- I've put some
> time into trying to get fromfile() and fromstring() to be a bit more
> robust and full-featured, but found it to be some really painful code to
> work on -- but it didn't dawn on me that it would be slow too! I saw all
> the layers of function calls, but I still thought that would be minimal
> compared to the actual string parsing. I guess not. Shows that you never
> know where your bottlenecks are without profiling.
>
> "Slow" is relative, of course, but since the whole point of
> fromfile/string is performance (otherwise, we'd just parse with python),
> it would be nice to get them as fast as possible.
>
> I had been thinking that the way to make a good fromfile was Cython, so
> you've inspired me to think about it some more. Would you be interested
> in extending what you're doing to a more general purpose tool?
>
> Anyway,  a comment or two:
>> cdef extern from 'stdlib.h':
>>      double atof(char*)
>
> One thing I found with the current numpy code is that the use of the
> ato* functions is a source of a lot of bugs (all of them?). The core
> problem is error handling -- you have to do a lot of pointer checking to
> see if a call was successful, and with the fromfile code, that error
> handling is not done in all the layers of calls.

In my case, I am making an assumption about the integrity of the file.
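
For comparison, here is a minimal Python sketch of what checked parsing
looks like when that assumption does not hold. C's atof() returns 0.0 for
input it cannot convert and gives no error indication, whereas Python's
float() raises ValueError. The name parse_floats is hypothetical and this
is only an illustration, not the Cython loop discussed earlier in the
thread; it uses the standard-library array type to stay self-contained.

```python
from array import array


def parse_floats(text):
    """Parse whitespace-separated numbers, failing loudly on a bad token.

    Hypothetical helper for illustration only.
    """
    values = array("d")  # contiguous buffer of C doubles
    for index, token in enumerate(text.split()):
        try:
            values.append(float(token))
        except ValueError:
            # Report which token was bad instead of silently storing 0.0,
            # which is what an unchecked atof() call would do.
            raise ValueError(f"token {index} is not a number: {token!r}")
    return values
```

The resulting array("d") exposes the buffer protocol, so it could
presumably be wrapped as a numpy array without copying the values one
by one.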

> Anyone know what the advantage of ato* is over scanf()/fscanf()?
>
> Also, why are you doing string parsing rather than parsing the files
> directly? Wouldn't that be a bit faster?

Rank inexperience, I guess. I don't understand what you have in mind.
scanf/fscanf don't actually convert strings to numbers, do they?
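
One possible reading of "parsing the files directly" is to convert tokens
as they are read from the open file, rather than loading the whole file
into a single string first. In C, fscanf() with a %lf conversion does the
read and the text-to-double conversion in one call. The sketch below shows
the same idea in plain Python; read_floats is a hypothetical name, and
io.StringIO stands in for a real file so the example is self-contained.

```python
import io
from array import array


def read_floats(stream):
    """Convert tokens line by line as they are read from a text stream.

    Hypothetical helper for illustration only.
    """
    values = array("d")  # contiguous buffer of C doubles
    for line in stream:
        # Only one line of text is held as a string at any time.
        values.extend(float(token) for token in line.split())
    return values


# Usage with an in-memory stand-in for an open file:
sample = io.StringIO("1.0 2.5\n3.25\n")
print(list(read_floats(sample)))
```

Whether this is faster than parsing one large string would need to be
measured; the sketch only illustrates the difference in structure.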

> I've got some C extension code for simple parsing of text files into
> arrays of floats or doubles (using fscanf). I'd be curious how the
> performance compares to what you've got. Let me know if you're interested.

I'm curious, yes.

Darren