Speeding up a regular expression

Tue Oct 23 13:22:01 EDT 2001

Oops ..

So, I looked at my post and realized that part of it didn't make sense.
There has to be _something_ between the numbers, obviously.  I looked
through the input files again, and between any two numbers there is
either a minus sign, or one or more spaces, or both, but never nothing.
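
To make that separator rule concrete, here is a rough sketch of a pattern
that requires it.  The names ROW_RE and is_data_row are made up for
illustration, and the rule itself is only what the input files appear to
follow.  This is not benchmarked; the hope is just that a required
separator leaves the regex engine fewer ways to split a run of digits, so
it may backtrack less on the lines that don't match.

```python
import re

# Assumption: between any two numbers there is one or more spaces,
# a minus sign (belonging to the next number), or both.
ROW_RE = re.compile(r"""
    ^\s*                 # optional leading spaces
    \d+                  # the integer
    (?:
        (?:\s+-?|-)      # separator: spaces (plus optional sign), or a bare sign
        \d+\.\d+         # the float itself
    ){6}                 # exactly six floats
    \s*$                 # optional trailing spaces, then end of line
    """, re.VERBOSE)

def is_data_row(line):
    """Return True if the line is an integer followed by six floats."""
    return ROW_RE.match(line) is not None
```

With this pattern a line such as "11.1 2.2 3.3 4.4 5.5 6.6" is rejected,
whereas the pattern in the original post accepts it by reading the first
"1" as the integer and "1.1" as the first float.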

Anyway, I'd still appreciate some help speeding this up.

Thanks,

-michael

Michael Lerner <mlerner at umich.deleteme.edu> wrote:
> Hi,

> I'm a relative newbie to Python, and I'm certainly no regular expression
> wizard.  I have a text file with a bunch of lines of the form

>  1-1.1 2.2 -3.3  4.4     5.5 -6.6

> That is, an integer, followed by six floats, with an arbitrary number of
> spaces in between the numbers.  Note that that arbitrary number can be
> zero, as is the case between the 1 and -1.1 above.

> There are also a bunch of other lines in the file.  I only want the ones
> that are like the line above.

> So, here's what I did:

> ---- begin my schlocky code ----

> import re
> import string

> def gimmeWhatIWant(inputString):
>     myRe = re.compile(r"""
>         ^                    # start at the beginning of the line
>         (\s*)                # our leading spaces
>         (\d+\s*)             # the integer, which may or may not
>                              # have a trailing space!
>         (-?\d+\.\d+\s*){6,6} # all six floats MAY have spaces
>                              # after them
>         $                    # end at the end of the line
>         """, re.VERBOSE)

>     lines = string.split(inputString,"\n")
>     returnString = ""
>     for line in lines:
>         if myRe.match(line):
>             returnString = returnString + line + "\n"

>     return returnString

> ---- end my schlocky code ----

> The thing is, this is slow when I run it on input strings with 6 or 7
> thousand lines.

> Any hints on how I could speed it up?

> One thing:  I think that replacing the string.split(...) call with
> inputString.split("\n") might speed things up a little. But, that's not
> where most of the time is spent and I'd like this to work with Python
> 1.5.2 if possible.

> thanks,

> -michael
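
One possible direction, offered as an untimed sketch rather than a known
fix: compile the pattern once at module level, so the compile step is not
repeated if the function is called many times, and collect the matching
lines in a list that is joined once at the end instead of growing a string
inside the loop.  The names ROW_RE and filter_rows are hypothetical.  The
pattern is the one from the quoted code, with {6,6} written as the
equivalent {6} and the groups made non-capturing.  The sketch uses string
methods; on Python 1.5.2 the string.split and string.join functions would
be the equivalents.

```python
import re

# Compiled once at import time. Same pattern as in the quoted code,
# written on one line without the capturing groups.
ROW_RE = re.compile(r"^\s*\d+\s*(?:-?\d+\.\d+\s*){6}$")

def filter_rows(input_string):
    """Return only the matching lines, each followed by a newline."""
    kept = []
    for line in input_string.split("\n"):
        if ROW_RE.match(line):
            kept.append(line)
    if not kept:
        return ""
    # One join at the end instead of repeated string concatenation.
    return "\n".join(kept) + "\n"
```

Whether either change matters for 6 or 7 thousand lines would need to be
measured; if most of the time is spent inside the match itself, the
pattern is the more likely place to look.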