need fast parser for comma/space delimited numbers

Les Schaffer godzilla at netmeg.net
Sat Mar 18 12:09:58 EST 2000


>>>>> ">" == Gordon McMillan <gmcm at hypernet.com> writes:

    >> Les Schaffer wants speed: 

hmmm.... i am going to get a bad reputation ...

    >> We can speed up what you've got, but probably not that much!

your ideas look real good. i will try them first before thinking about
doing a C module. 

    >> First, use "def __parseIFF(self, str, atoi=string.atoi,
    >> atof=string.atof):" and then access those as locals.

this is interesting. is there a difference between

def __parseIFF(self, str, atoi=string.atoi):
  ...

and 

def __parseIFF(self, str):
   atoi = string.atoi
   ...

i am guessing there is enough difference, never thought about it till
now. i guess the atoi=string.atoi creates a "static" local copy for
this function, the assignment done only once, whereas the
atoi=string.atoi in the body of the def gets executed every stinkin
time, correct?
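
a small sketch of that difference, under some assumptions: it targets a current Python, where string.atoi no longer exists, so the builtin int stands in for it, and the names lookup, parse_default and parse_body are made up for illustration. lookup() records every time the "string.atoi" expression would be evaluated.

```python
calls = []

def lookup():
    # hypothetical stand-in for evaluating the expression string.atoi;
    # records each evaluation so we can count them
    calls.append(1)
    return int

def parse_default(s, atoi=lookup()):
    # the default was evaluated once, when the def statement ran
    return atoi(s)

def parse_body(s):
    # this assignment is executed again on every call
    atoi = lookup()
    return atoi(s)

for _ in range(3):
    parse_default("42")
evals_default = len(calls)               # evaluations caused by the default

for _ in range(3):
    parse_body("42")
evals_body = len(calls) - evals_default  # evaluations caused by the body
```

after the two loops evals_default is 1 and evals_body is 3, so the default really is bound once at definition time while the body assignment runs on each call.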

    >> Second, benchmark against "int" and "float".

okay. i noticed in the Scientific Python modules K. Hinsen uses
something like this

numb = eval( str )

with str being things like ' 4.235 ', etc. i wonder which is faster?
(thinking out loud)
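
one way to settle it is to time both. the following is only a sketch using the standard timeit module; the helper names with_eval and with_float are made up, and the numbers will vary by machine and Python version. since eval has to compile the string as an expression first (and will run any code it is handed), float would be the likely winner, but that is a guess until measured.

```python
import timeit

text = " 4.235 "

def with_eval(s=text):
    # eval compiles and evaluates the string as a Python expression
    return eval(s.strip())

def with_float(s=text):
    # float parses the string directly and ignores surrounding blanks
    return float(s)

t_eval = timeit.timeit(with_eval, number=10000)
t_float = timeit.timeit(with_float, number=10000)
print("eval : %.4f s" % t_eval)
print("float: %.4f s" % t_float)
```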

    >> First, splitfields is obsolete, use "split". 

sheeesh. i read the manual all the time and i constantly confuse which
of them is obsolete and which isnt. Someone toss that splitfields out
the window, please!!!!

    >> Second, special case the whitespace case, because that would
    >> just be "split(str)".  Third, use locals trick.

i think i can swing the special case trick, because code using this class
can know ahead of time if it's csv or whitespace.
 
> For the all floats, all whitespace case, this would just be
>  num = map(float, split(strLines[i])) 
> and that might get you the speed you want.

okay. is there a big difference between the string.ato[if] and
float/int?

> For the comma case, you might try:
>   s = join(split(strLines[i], ','), ' ')
>   num = map(float, split(s))
> or
>  t = split(strLines[i], ',')
>  t = map(strip, t)
>  num = map(float, t)

will give'em a try...
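
the two cases above could be folded into one helper along these lines. this is a sketch for a current Python using string methods instead of the string module; parse_line and its csv flag are hypothetical names, with the flag set by the caller who knows the format ahead of time. float() already ignores blanks around a field, so the separate strip pass is left out here.

```python
def parse_line(line, csv=False):
    # hypothetical helper: the caller says which format the line is in
    if csv:
        fields = line.split(",")
    else:
        fields = line.split()   # splits on any run of whitespace
    # float() tolerates leading/trailing blanks in each field
    return list(map(float, fields))

ws = parse_line(" 1.0  2.5 3 ")
cs = parse_line("1.0, 2.5 ,3", csv=True)
```

both ws and cs come out as the same list of three floats.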

anyone care to take a guesstimate on how much further time i could
save by coding something in C? if i did that, i would write a function
which takes a Python list object (list of strings) and passes back a
pair of Numeric array objects (dependent and independent
variables). so i would cut out all the Python for-looping as well.
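
a pure Python sketch of the interface such a function might have, which could also serve as a baseline to time a C version against. everything specific here is an assumption: two whitespace-separated columns per line, the first taken as the independent variable and the second as the dependent one, the standard array module standing in for Numeric arrays, and the made-up name lines_to_arrays.

```python
from array import array

def lines_to_arrays(lines):
    # assumed layout: column 0 is the independent variable,
    # column 1 the dependent one
    x = array("d")
    y = array("d")
    for line in lines:
        a, b = line.split()[:2]
        x.append(float(a))
        y.append(float(b))
    return x, y

x, y = lines_to_arrays(["0.0 1.5", "1.0 2.5", "2.0 4.0"])
```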

many thanks, gordon!

les schaffer


