regexp speed concerns
Bengt Richter
bokr at oz.net
Fri Jan 10 16:32:21 EST 2003
On 9 Jan 2003 16:11:44 -0800, quioxl at yahoo.com (Jonathan Craft) wrote:
[...]
>
>The code below is paraphrased...the actual code much bulkier, but the
>key lines are captured below. The "First Python Attempt" may not be
>syntactically correct, as it was replaced by later attempts and I don't
>have it handy anymore.
>
>Original Perl:
>--------------
>for $line in (<fh>) {
> if (!$errFound) {
> if ($line =~ m|[\s^](ERR|err|FAIL)|) {
I'd have used different delimiters ;-)
if ($line =~ m#[\s^](ERR|err|FAIL)#) {
> $errFound = 1;
> }
> }
>}
Did you really mean [\s^]? I.e., a single character that is either whitespace or a literal '^'?
IOW, should 'asdfasdfasdfasdf^ERRasdfsdf' match? And likewise the other alternatives, anywhere in the line?
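A quick way to see the difference is to try the posted pattern next to an anchored one. This is only a sketch: the name `anchored` and its pattern are my guess at what might have been intended (start of line or whitespace), not something from the original code, and the sample strings are made up.

```python
import re

# Pattern as posted: inside [...], a '^' that is not first is a literal caret.
posted = re.compile(r'[\s^](ERR|err|FAIL)')

# Hypothetical alternative, assuming "start of line OR whitespace" was meant.
anchored = re.compile(r'(?:^|\s)(ERR|err|FAIL)')

samples = ['asdfasdf^ERRasdf', 'ERR at line start', 'foo ERR bar']

# One (sample, posted matched?, anchored matched?) tuple per sample.
results = [(s, bool(posted.search(s)), bool(anchored.search(s)))
           for s in samples]

for s, p, a in results:
    print('%r: posted=%s anchored=%s' % (s, p, a))
```

The posted pattern needs some character in front of the keyword, so it cannot match a keyword at the very start of a line, while it does accept a literal '^' in front of it.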
>
>First Python Attempt:
>---------------------
>import re
>errSO = re.compile('[\s^](ERR|err|FAIL)')
>for line in fh:
> if not finishFound:
> match = errSO.search(line)
> if match != None: finishFound = 1
>
>Latest Python Attempt:
>----------------------
>import regex
I get
>>> import regex
__main__:1: DeprecationWarning: the regex module is deprecated; please use the re module
>errSO = regex.compile('\( \|^\)\(ERR\|err\|FAIL\)')
>for line in fh:
> if not finishFound:
> if doneSO.search(line) >= 0: finishFound = 1
> if not errFound:
> if errSO.search(line) >= 0: errFound = 1
I don't know what you are doing with the found flags,
but once both of them are set, why continue with the loop?
BTW, I didn't see the definition of doneSO.
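Given the deprecation warning, the regex-module pattern could probably be ported back to re along these lines. This is a sketch under assumptions: I am reading the emacs-style `\(`, `\|` and `\)` as plain grouping and alternation, and the names `err_so` and `found` are mine. As far as I recall, regex's search() returned an index (-1 on failure), whereas re's search() returns a match object or None, so the `>= 0` tests would have to change too.

```python
import re

# Assumed re translation of the regex-module pattern
#   '\( \|^\)\(ERR\|err\|FAIL\)'
# i.e. a space or start of line, followed by one of the keywords.
err_so = re.compile(r'( |^)(ERR|err|FAIL)')

def found(pattern, line):
    # re's search() gives a match object or None, so test against None
    # instead of comparing the result with 0.
    return pattern.search(line) is not None

print(found(err_so, 'ERR at start'))
print(found(err_so, 'status: FAIL'))
print(found(err_so, 'xERR'))
```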
>
>
I suspect (I'd need to test to verify) that if one of your patterns accounts for, say,
99% of the matches, you could gain by looking for it separately with a plain string
search method before falling back on the regex, e.g. (untested!) if ^ERR were most frequent:
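# untested
import re
# errSO and doneSO are assumed to be compiled with re.compile() beforehand
errorFound = finishFound = False
for line in fh:
    if not errorFound:
        # cheap literal check first, regex only if that fails
        errorFound = line.find('^ERR') >= 0 or bool(errSO.search(line))
    if not finishFound:
        finishFound = bool(doneSO.search(line))
    if errorFound and finishFound: break
else:
    # reached only if the loop ended without the break,
    # i.e. the two flags were never both set
    print('Nothing found')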
The else gets executed if you do *NOT* break out of the loop. This can
be useful in distinguishing conditions of exit without using a flag, so
maybe you can eliminate one?
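A small standalone illustration of the for/else behaviour described above. The function name `first_error` and the sample lines are made up for the example.

```python
def first_error(lines):
    """Return the index of the first line containing 'ERR', or None."""
    for n, line in enumerate(lines):
        if 'ERR' in line:
            result = n
            break           # leaving via break skips the else clause
    else:
        result = None       # loop ran to the end without a break
    return result

print(first_error(['ok', 'still ok', ' ERR here', 'ok']))
print(first_error(['ok', 'ok']))
```

No separate "found" flag is needed here: which branch ran tells you how the loop ended.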
Hopefully the new line iterator reads ahead; otherwise the xreadlines module might help.
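The suspicion above, that a plain string search in front of the regex pays off when one literal dominates, could be checked with timeit. This is only a sketch on made-up data (990 of 1000 lines contain '^ERR'); the names `regex_only` and `find_first` are mine, and the real numbers will depend on your file and Python version.

```python
import re
import timeit

err_so = re.compile(r'[\s^](ERR|err|FAIL)')

# Made-up test data: most lines contain the literal '^ERR'.
lines = ['some text ^ERR more text\n'] * 990 + ['nothing here\n'] * 10

def regex_only():
    # Count matching lines using the regex alone.
    return sum(1 for line in lines if err_so.search(line))

def find_first():
    # Try the cheap literal search first, then fall back to the regex.
    return sum(1 for line in lines
               if line.find('^ERR') >= 0 or err_so.search(line))

t_regex = timeit.timeit(regex_only, number=20)
t_find = timeit.timeit(find_first, number=20)
print('regex only: %.4fs   find first: %.4fs' % (t_regex, t_find))
```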
Regards,
Bengt Richter