Newbie regular expression and whitespace question

Fredrik Lundh fredrik at pythonware.com
Thu Sep 22 16:42:46 EDT 2005


Paul McGuire wrote:

> If you're absolutely stuck on using RE's, then others will have to step
> forward.  Meanwhile, here's a pyparsing solution (get pyparsing at
> http://pyparsing.sourceforge.net):

so, let's see.  using ...

from pyparsing import *
import re

data = """ ... table example from op ... """

def test1():
    LT = Literal("<")
    GT = Literal(">")
    collapsableSpace = GT + LT
    collapsableSpace.setParseAction( replaceWith("><") )
    return collapsableSpace.transformString(data)

def test2():
    return re.sub(r">\s+<", "><", data)

I get

> timeit -s "import test" "test.test1()"
100 loops, best of 3: 6.8 msec per loop

> timeit -s "import test" "test.test2()"
10000 loops, best of 3: 33.3 usec per loop

or in other words, five lines instead of one, and a 200x slowdown.
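
for reference, here is a minimal sketch of what the one-line regex version does. the original table data is elided above, so the `sample` string and the `collapse` helper name are made up for illustration only:

```python
import re

# Hypothetical sample input; the actual table from the original post is not shown.
sample = "<table>\n  <tr>\n    <td>a b</td>   <td>c</td>\n  </tr>\n</table>"

def collapse(text):
    # Replace any run of whitespace sitting between ">" and "<" with nothing,
    # leaving whitespace inside element text ("a b") untouched.
    return re.sub(r">\s+<", "><", text)

result = collapse(sample)
print(result)
```

note that the pattern requires at least one whitespace character between the brackets, so markup that is already collapsed is left alone.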

but alright, maybe we should precompile the expressions to get a
fair comparison.  adding

LT = Literal("<")
GT = Literal(">")
collapsableSpace = GT + LT
collapsableSpace.setParseAction( replaceWith("><") )

def test3():
    return collapsableSpace.transformString(data)

p = re.compile(r">\s+<")

def test4():
    return p.sub("><", data)

to the first program, I get

> timeit -s "import test" "test.test3()"
100 loops, best of 3: 6.73 msec per loop

> timeit -s "import test" "test.test4()"
10000 loops, best of 3: 27.8 usec per loop

that's a 240x slowdown.  hmm.
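
the regex half of this comparison can be reproduced from inside Python with the standard `timeit` module. this is only a sketch under stated assumptions: the `sample` data, the `best_of` helper, and the loop counts are invented here, and the numbers will differ from the ones quoted above. the `re` module keeps a cache of recently compiled patterns, which is one likely reason the precompiled version is only modestly faster than the plain `re.sub` call:

```python
import re
import timeit

# Hypothetical stand-in for the elided table data.
sample = "<tr>\n  <td>x</td>\n  <td>y</td>\n</tr>\n" * 50

pattern = re.compile(r">\s+<")

def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return re.sub(r">\s+<", "><", sample)

def with_compiled_pattern():
    # Skips the cache lookup by reusing the compiled pattern object.
    return pattern.sub("><", sample)

def best_of(func, repeat=3, number=1000):
    # Best per-call time in seconds, mirroring "best of 3" in the timeit output.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

if __name__ == "__main__":
    print("re.sub:      %.1f usec" % (best_of(with_module_function) * 1e6))
    print("pattern.sub: %.1f usec" % (best_of(with_compiled_pattern) * 1e6))
```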

</F> 





