regexp speed concerns
John Machin
sjmachin at lexicon.net
Fri Jan 10 17:44:09 EST 2003
Andrew Dalke <adalke at mindspring.com> wrote in message news:<avlhs0$cg5$1 at slb9.atl.mindspring.net>...
> You can also use a non-regex, as in
>
> import sys
>
> def main(fh):
>     errSO = re.compile('[\s^](ERR|err|FAIL)')
>     finishFound = 0
>     looking_for = ("ERR", "err", "FAIL")
>     for line in fh:
>         words = line.split()
>         if words and words[0] in looking_for:
>             finishFound = 1
>             break
> ...
Note that this code implements a restatement of the OP's
"requirements" -- his regexp matches words such as error, errata,
FAILED, FAILURE, and also things like ERR987 and FAIL-1234.
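To make the difference concrete, here is a minimal sketch comparing the two tests. The sample lines are made up for illustration, each with a leading space so that the OP's pattern has something to match before the keyword; the helper names regex_hit and first_word_hit are hypothetical, and the pattern is written as a raw string, which the quoted code did not do.

```python
import re

# The OP's pattern as quoted above, written as a raw string (an assumption
# made here to keep modern Python from complaining about the \s escape).
err_so = re.compile(r'[\s^](ERR|err|FAIL)')
looking_for = ("ERR", "err", "FAIL")

def regex_hit(line):
    """True if the OP's regexp matches anywhere in the line."""
    return err_so.search(line) is not None

def first_word_hit(line):
    """True if the first whitespace-delimited word is exactly a keyword."""
    words = line.split()
    return bool(words) and words[0] in looking_for

# Hypothetical log lines, not taken from the OP's data.
samples = [
    " error in module",
    " errata sheet",
    " FAILED step 3",
    " ERR987 raised",
    " FAIL-1234 logged",
    " err in module",
]

results = [(s, regex_hit(s), first_word_hit(s)) for s in samples]
for line, by_regex, by_word in results:
    print(repr(line), by_regex, by_word)
```

On these samples the regexp reports a hit for every line, while the word test reports one only for the last.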
That said, the code can possibly be sped up by (a) using a dictionary
for the membership test and (b) stopping the split after the first
word, viz:
looking_for = {"ERR":1, "err":1, "FAIL":1}
for line in fh:
    words = line.split(None, 1)
    if words and words[0] in looking_for:
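For anyone who wants to try it, a self-contained sketch of that fragment follows. The function name first_word_flagged and the sample text are hypothetical, and io.StringIO merely stands in for the OP's file handle; whether this is actually faster on real data would have to be measured.

```python
import io

# Dictionary used only for its keys; the values are irrelevant.
looking_for = {"ERR": 1, "err": 1, "FAIL": 1}

def first_word_flagged(fh):
    """Return True as soon as some line's first word is a key of looking_for."""
    for line in fh:
        # maxsplit=1: peel off the first word, leave the rest of the line unsplit.
        words = line.split(None, 1)
        if words and words[0] in looking_for:
            return True
    return False

# Stand-in for a real file; the contents are made up.
sample = io.StringIO("all fine\n\nERR disk full on /dev/sda\nnever read\n")
found = first_word_flagged(sample)
print(found)
```

The `words and` guard matters because a blank line splits to an empty list.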
As an aside, beware the subtle undocumented special treatment of the
default delimiter case:
>>> "---err---".split("-")
['', '', '', 'err', '', '', '']
>>> "   err   ".split(" ")
['', '', '', 'err', '', '', '']
>>> " err ".split(" ")
['', 'err', '']
>>> " err ".split()
['err']
Consistency with the non-default cases would produce ['', 'err', ''].
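The difference is easy to check in a script. This is a small sketch; the variable names are arbitrary, and the list-comprehension emulation only agrees with the default case when the whitespace in the text consists of plain spaces.

```python
text = " err "

# Explicit delimiter: empty strings are kept at both ends.
explicit = text.split(" ")

# Default delimiter: runs of whitespace are collapsed and the
# leading/trailing empty strings are dropped.
default = text.split()

# Emulating the default result from the explicit one by discarding empties
# (valid here only because the text contains no tabs or newlines).
emulated = [word for word in text.split(" ") if word]

print(explicit)
print(default)
print(emulated)
```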