Regular expression fun. Repeated matching of a group Q

johnzenger at gmail.com
Fri Feb 24 11:37:50 EST 2006


You can check len(sanesplit) to see how big your list is.  If it is
less than 2, then the line contained no <td>'s, so move on to the next
line.
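
As a rough sketch of that check, assuming sanesplit comes from splitting
each line on the literal string "<td>" (the splitting step, the sample
lines, and the names lines, cells, and cells_per_line are my assumptions
for illustration, not something from your code):

```python
# Hypothetical sample input; the real lines would come from your HTML file.
lines = [
    "<tr><td>alpha</td><td>beta</td></tr>",
    "<p>no table cells here</p>",
]

cells_per_line = []
for line in lines:
    # Assumption: sanesplit is produced by splitting on the literal "<td>".
    sanesplit = line.split("<td>")
    if len(sanesplit) < 2:
        # Nothing was split off, so the line had no <td>; skip it.
        continue
    # Every piece after the first begins right after a <td>;
    # keep only the text up to the closing tag.
    cells = [piece.split("</td>")[0] for piece in sanesplit[1:]]
    cells_per_line.append(cells)

print(cells_per_line)
```

This only handles a bare, lowercase <td> with no attributes; anything
messier would need a different split or a real HTML parser.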

It is probably possible to do the whole thing with a regular
expression.  It is probably not wise to do so.  Regular expressions are
difficult to read, and, as you discovered, difficult to program and
debug.  In many cases, Python code that relies on regular expressions
for lots of program logic runs slower than code that uses normal
Python.

Suppose "words" contains all the words in English.  Compare these two
lines:

import re

foobarwords1 = [x for x in words if re.search("foo|bar", x)]
foobarwords2 = [x for x in words if "foo" in x or "bar" in x]

I haven't tested this with Python 2.4, but as of a few years ago it was
a safe bet that foobarwords2 would be calculated much, much faster.
Also, I think you will agree, foobarwords2 is a lot easier to read.
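
If you want to check the speed claim on your own Python version, a
timing sketch along these lines should do.  The short word list is a
made-up stand-in for a real dictionary, and with_regex and with_in are
just names I picked for the two approaches; I am not quoting any timings
because they will vary by version and machine.

```python
import re
import timeit

# Hypothetical stand-in for "all the words in English".
words = ["food", "rebar", "python", "list", "barfoo", "spam"] * 1000

def with_regex():
    # Select words using a regular expression alternation.
    return [x for x in words if re.search("foo|bar", x)]

def with_in():
    # Select the same words using plain substring tests.
    return [x for x in words if "foo" in x or "bar" in x]

# Both approaches should select exactly the same words.
assert with_regex() == with_in()

# Time 20 runs of each approach over the whole list.
regex_seconds = timeit.timeit(with_regex, number=20)
in_seconds = timeit.timeit(with_in, number=20)
print("re.search: %.3f s" % regex_seconds)
print("in:        %.3f s" % in_seconds)
```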



