need fast parser for comma/space delimited numbers

Michael A. Miller mmiller3 at iupui.edu
Sun Mar 19 11:59:41 EST 2000


>>>>> "Les" == Les Schaffer <godzilla at netmeg.net> writes:

    > I have written an application for reading in large amounts
    > of space/comma delimited numbers from ASCII text files for
    > statistical processing.

    > I originally used re expressions for splitting, but i was
    > able to cut the time required for data file parsing down to
    > a third by using string.split on the comma or space.

    > Still, the app takes about 5 minutes to parse a typical set
    > of data files. I'd like to drop that down to a minute if
    > possible.

    > Which means i probably need to wrap in a C module with
    > something like an sscanf. Or maybe just a function which
    > finds the delimiters and delivers the number parts of string
    > (defined by delimiters) to atoi and atof functions.

    > But before i get started, i imagine someone else has
    > already done this.

    > anyone have pointers to said code or suggestions? i'll
    > happily post my code if there is none out there already.

TableIO [1] does exactly what you're looking for and is
reasonably fast, I think.  It is a C extension for reading data
from ASCII files.  Rather than using scanf, it uses fgets and
strtok to parse lines.  It also lets you treat certain lines as
"comment" lines: any line containing a specified character is
skipped.
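
For comparison, here is a rough pure-Python sketch of the same
line-by-line approach: read a line, skip it if it contains the
comment character, split on commas or whitespace, and convert
each token.  This is not TableIO's code or API; read_numbers and
comment_char are hypothetical names chosen for illustration, and
every token is assumed to be a valid number.

```python
import io


def read_numbers(lines, comment_char="#"):
    """Return one list of floats per data line.

    Hypothetical helper, not part of TableIO.
    """
    rows = []
    for line in lines:
        # Skip blank lines and any line containing the comment character.
        if comment_char in line or not line.strip():
            continue
        # Turn commas into spaces so one split() handles both delimiters.
        tokens = line.replace(",", " ").split()
        rows.append([float(tok) for tok in tokens])
    return rows


# Usage: any iterable of lines works, e.g. an open file object.
sample = io.StringIO("# x, y, z\n1.0, 2.5 3\n\n4 5,6e1\n")
print(read_numbers(sample))
```

A C extension avoids the per-token Python overhead of a loop like
this one, which is where most of the speedup would be expected to
come from.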

Mike

[1] http://php.iupui.edu/~mmiller3/python/

-- 
Michael A. Miller                      mmiller3 at iupui.edu
  Krannert Institute of Cardiology, IU School of Medicine


