Speeding up a regular expression
Michael Lerner
mlerner at umich.deleteme.edu
Tue Oct 23 13:22:01 EDT 2001
Oops ..
So, I looked at my post and realized that part of it didn't make sense.
There has to be _something_ between the numbers, obviously. I looked
through the input files again, and it looks like there is either a minus
sign, or some number of spaces, or both. But not neither.
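That rule can go straight into the pattern. Doing so also removes the ambiguity of `\s*` matching nothing between two floats. Below is a minimal sketch for a modern Python `re`. It assumes "spaces" means blanks or tabs, and that leading and trailing blanks are still allowed, as in the original pattern. `LINE_RE` is a hypothetical name.

```python
import re

# Sketch of the separator rule: between any two numbers there is either a
# minus sign (belonging to the next float), one or more blanks, or blanks
# followed by a minus sign. The alternation (?:[ \t]+-?|-) encodes exactly
# those three cases, so "1.12.2" with nothing in between is rejected.
# Assumption: "spaces" means blanks or tabs.
LINE_RE = re.compile(r"""
    ^[ \t]*            # optional leading blanks
    \d+                # the integer
    (?:                # six floats, each preceded by a separator
        (?:[ \t]+-?|-) #   blanks with an optional minus, or a bare minus
        \d+\.\d+       #   digits, decimal point, digits
    ){6}
    [ \t]*$            # optional trailing blanks
""", re.VERBOSE)
```

Because a separator is now mandatory, the engine has only one way to carve a run of digits into numbers. It should therefore backtrack less on lines that almost match.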
Anyway, I'd still appreciate some help speeding this up.
Thanks,
-michael
Michael Lerner <mlerner at umich.deleteme.edu> wrote:
> Hi,
> I'm a relative newbie to Python, and I'm certainly no regular expression
> wizard. I have a text file with a bunch of lines of the form
> 1-1.1 2.2 -3.3 4.4 5.5 -6.6
> That is, an integer, followed by six floats, with an arbitrary number of
> spaces in between the numbers. Note that that arbitrary number can be
> zero, as is the case between the 1 and -1.1 above.
> There are also a bunch of other lines in the file. I only want the ones
> that are like the line above.
> So, here's what I did:
> ---- begin my schlocky code ----
> import re
> import string
> def gimmeWhatIWant(inputString):
>     myRe = re.compile(r"""
>         ^                     # start at the beginning of the line
>         (\s*)                 # our leading spaces
>         (\d+\s*)              # the integer, which may or may not
>                               # have a trailing space!
>         (-?\d+\.\d+\s*){6,6}  # all six floats MAY have spaces
>                               # after them
>         $                     # end at the end of the line
>         """, re.VERBOSE)
>     lines = string.split(inputString, "\n")
>     returnString = ""
>     for line in lines:
>         if myRe.match(line):
>             returnString = returnString + line + "\n"
>     return returnString
> ---- end my schlocky code ----
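If the time goes into the loop rather than the split, the likeliest culprit is `returnString = returnString + line + "\n"`. Each concatenation can build a brand-new string holding everything collected so far, so the cost can grow roughly with the square of the output size. The sketch below rewrites the same function for a modern Python. It assumes Unix `\n` line endings, since a trailing `\r` would make a line fail the `[ \t]*$` test. It also swaps in a stricter pattern that follows the separator rule from the follow-up note instead of the original one.

```python
import re

# Compiled once at import time. This pattern is a hypothetical rewrite that
# follows the separator rule from the follow-up: between numbers there is a
# minus sign, one or more blanks, or blanks followed by a minus sign.
LINE_RE = re.compile(r"^[ \t]*\d+(?:(?:[ \t]+-?|-)\d+\.\d+){6}[ \t]*$")


def gimmeWhatIWant(inputString):
    """Return the data lines of inputString, each followed by a newline."""
    kept = []
    match = LINE_RE.match  # look the method up once, not once per line
    for line in inputString.split("\n"):
        if match(line):
            # Append to a list instead of doing returnString + line + "\n",
            # which can copy everything accumulated so far on every match.
            kept.append(line + "\n")
    # A single join at the end builds the result in one pass.
    return "".join(kept)
```

On 1.5.2, where strings had no methods yet, the equivalent calls are `string.split(inputString, "\n")` and `string.join(kept, "")`.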
> The thing is, this is slow when I run it on input strings with 6 or 7
> thousand lines.
> Any hints on how I could speed it up?
> One thing: I think that replacing the string.split(...) call with
> inputString.split("\n") might speed things up a little. But, that's not
> where most of the time is spent and I'd like this to work with Python
> 1.5.2 if possible.
> thanks,
> -michael
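Another option is to let the regex engine find the data lines in a single pass over the whole input. Python code then runs only for lines that match, not for all 6 or 7 thousand. This sketch assumes a modern Python, since `finditer` and generator expressions are newer than 1.5.2. `DATA_LINE_RE` and `extract_data_lines` are hypothetical names.

```python
import re

# One regex pass over the whole input. re.MULTILINE makes ^ and $ match at
# every line boundary, so non-matching lines are skipped inside the engine.
# [ \t] is used instead of \s on purpose: with MULTILINE, \s would also match
# "\n" and could let a single match run across several lines.
DATA_LINE_RE = re.compile(
    r"^[ \t]*\d+(?:(?:[ \t]+-?|-)\d+\.\d+){6}[ \t]*$",
    re.MULTILINE,
)


def extract_data_lines(text):
    """Return the matching lines of text, each terminated by a newline."""
    return "".join(m.group(0) + "\n" for m in DATA_LINE_RE.finditer(text))
```

Whether this beats a per-line loop depends on the data, so it is worth timing both on a real input file, for example with the `timeit` module.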
More information about the Python-list mailing list