regexp speed concerns

Bengt Richter bokr at oz.net
Fri Jan 10 16:32:21 EST 2003


On 9 Jan 2003 16:11:44 -0800, quioxl at yahoo.com (Jonathan Craft) wrote:
[...]
>
>The code below is paraphrased...the actual code is much bulkier, but the
>key lines are captured below.  The "First Python Attempt" may not be
>syntactically correct, as it was replaced by later attempts and I don't
>have it handy anymore.
>
>Original Perl:
>--------------
>foreach $line (<fh>) {
>  if (!$errFound) {
>    if ($line =~ m|[\s^](ERR|err|FAIL)|) {
I'd have used different delimiters ;-) since with m|...| the first '|' of the alternation ends the pattern:
     if ($line =~ m#[\s^](ERR|err|FAIL)#) {
>      $errFound = 1;
>    }
>  }
>}

Did you really mean [\s^]? That is a single character that is either whitespace or a literal '^'.
IOW, 'asdfasdfasdfasdf^ERRasdfsdf' should match? And likewise the others, anywhere in the line?
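A quick way to see the difference (a sketch with made-up sample strings): inside a character class '^' is just a literal caret, while the alternation (?:^|\s) means "start of string or one whitespace character", which is probably what you meant:

```python
import re

# [\s^] is a character class: ONE whitespace character or a literal '^'
klass = re.compile(r'[\s^](ERR|err|FAIL)')
# (?:^|\s) is an alternation: start of string OR one whitespace character
anchor = re.compile(r'(?:^|\s)(ERR|err|FAIL)')

samples = ['ERR at start', 'an ERR here', 'asdfasdfasdfasdf^ERRasdfsdf']
for s in samples:
    print(repr(s), 'class:', bool(klass.search(s)), 'anchor:', bool(anchor.search(s)))
```

The first sample matches only the alternation version; the caret sample matches only the character-class version.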

>
>First Python Attempt:
>---------------------
>import re
>errSO = re.compile(r'[\s^](ERR|err|FAIL)')
>for line in fh:
>  if not errFound:
>    match = errSO.search(line)
>    if match is not None: errFound = 1
>
>Latest Python Attempt:
>----------------------
>import regex
I get
 >>> import regex
 __main__:1: DeprecationWarning: the regex module is deprecated; please use the re module

>errSO = regex.compile('\( \|^\)\(ERR\|err\|FAIL\)')
>for line in fh:
>  if not finishFound:
>    if doneSO.search(line) >= 0: finishFound = 1
>  if not errFound:
>    if errSO.search(line) >= 0: errFound = 1
I don't know what you are doing with the found flags,
but once both of them are set, why continue with the loop?

BTW, I didn't see the definition of doneSO.
>
>
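For reference, the regex-module pattern above translates to re syntax like this (a sketch; has_error is a made-up helper). Note that re's search() returns a match object or None rather than an offset, so the '>= 0' test doesn't carry over:

```python
import re

# Old regex-module pattern: '\( \|^\)\(ERR\|err\|FAIL\)'
# In re syntax, plain parentheses group and plain '|' alternates.
errSO = re.compile(r'( |^)(ERR|err|FAIL)')

def has_error(line):
    # search() gives a match object or None, not a position or -1
    return errSO.search(line) is not None

print(has_error('FAIL: disk full'))    # True
print(has_error('no problems here'))   # False
print(has_error('prefix^ERR'))         # False: '^' is not a space
```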
I suspect (I'd need a test to verify) that if one of your patterns occurs, say, 99% of the time,
you could gain by checking for it first with a plain string method and only falling back to
the regex when that fails, e.g. (untested!) if ^ERR were the most frequent:

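# untested
import re
errSO = re.compile(r'[\s^](ERR|err|FAIL)')   # as in your first attempt
# doneSO: your "finished" pattern (its definition wasn't posted)
errFound = finishFound = False
for line in fh:
    if not errFound:
        # cheap string test first; the regex runs only if it fails
        errFound = line.find('^ERR') >= 0 or bool(errSO.search(line))
    if not finishFound:
        finishFound = bool(doneSO.search(line))
    if errFound and finishFound:
        break
else:
    print('Did not find both patterns')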

The else clause runs only if the loop finishes *without* hitting break. That
can tell you how the loop exited without needing a flag, so maybe you can
eliminate one of yours?
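A minimal sketch of that idiom, assuming you only need the first error line (first_error and the pattern are made up for illustration). No "found" flag is needed, because break vs. falling off the end already says which happened:

```python
import re

errSO = re.compile(r'(?:^|\s)(ERR|err|FAIL)')   # hypothetical pattern

def first_error(lines):
    """Return (line_number, line) for the first error line, or None."""
    for lineno, line in enumerate(lines, 1):
        if errSO.search(line):
            break
    else:
        # Runs only when the loop was NOT left via break: nothing matched.
        return None
    # Reached only after break, so lineno/line refer to the matching line.
    return lineno, line

print(first_error(['ok', 'still ok', 'FAIL: oops']))   # (3, 'FAIL: oops')
print(first_error(['ok']))                              # None
```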

Hopefully the new file iterator (for line in fh) reads ahead in buffered chunks; otherwise the xreadlines module might help.
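To put a number on the "need test to verify" part, here is a rough timeit harness (a sketch on made-up data; regex_only, fast_path and make_lines are hypothetical names). It uses '^ERR' in line, which does the same job as the find() call above:

```python
import re
import timeit

errSO = re.compile(r'[\s^](ERR|err|FAIL)')

def regex_only(line):
    return bool(errSO.search(line))

def fast_path(line):
    # Cheap substring test first; only fall back to the regex if it fails.
    return '^ERR' in line or bool(errSO.search(line))

def make_lines(n, hit_every):
    # Made-up data: every hit_every-th line contains '^ERR'.
    return ['x ^ERR y' if i % hit_every == 0 else 'ordinary log text %d' % i
            for i in range(n)]

for hit_every in (1, 100):          # '^ERR' on every line vs. on 1% of lines
    lines = make_lines(10000, hit_every)
    # The two checks must agree before comparing their speed.
    assert [regex_only(l) for l in lines] == [fast_path(l) for l in lines]
    for func in (regex_only, fast_path):
        secs = timeit.timeit(lambda: [func(l) for l in lines], number=10)
        print('hit_every=%-4d %-10s %.3f s' % (hit_every, func.__name__, secs))
```

Which version wins depends on how often the cheap test succeeds on your real data, so it's worth running against a sample of your actual files.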

Regards,
Bengt Richter



