regular expressions and internationalization (WAS: permuting letters...)

Andrew Dalke adalke at mindspring.com
Wed Nov 17 17:35:17 EST 2004


Steven Bethard <steven.bethard at gmail.com> writes on Fri, 12 Nov 2004 
20:15:28 +0000 (UTC):
>Is there any way to match \w but not \d?

Dieter Maurer wrote:
> It is:  r'(?!\d)\w'

While implementations are free to optimize this case, the current
Python implementation is slower with it than with the other
solution, r"[^\d\W]".

 >>> text = "Blah an123d blah901234 9spam and eggs\n" * 1000
 >>> import re
 >>> pat1 = re.compile(r"((?!\d)\w)+")
 >>> pat2 = re.compile(r"[^\d\W]+")
 >>> len(pat2.findall(text))
7000
 >>> len(pat1.findall(text))
7000
 >>> import timeit
 >>> x = timeit.Timer(setup = "import __main__ as M",
                      stmt = "M.pat1.findall(M.text)")
 >>> x.timeit(100)
4.0506279468536377
 >>> x = timeit.Timer(setup = "import __main__ as M",
                      stmt = "M.pat2.findall(M.text)")
 >>> x.timeit(100)
1.8287069797515869
 >>>
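
For anyone who wants to rerun the comparison as a standalone script,
here is a minimal sketch. It assumes Python 3, and the names
lookahead, char_class and time_pattern are mine, not from the session
above. It also uses a non-capturing group so that findall returns the
whole match for both patterns, which differs slightly from pat1.
Timings will vary with the Python version and the machine, so treat
the output as indicative only.

```python
import re
import timeit

# Same test text as in the interactive session.
text = "Blah an123d blah901234 9spam and eggs\n" * 1000

# Lookahead form: a word character that is not a digit.
# (?:...) is non-capturing, so findall returns the full match
# (an assumption of this sketch; pat1 above used a capturing group).
lookahead = re.compile(r"(?:(?!\d)\w)+")

# Character-class form: anything that is neither a digit nor a
# non-word character, which is the same set of characters.
char_class = re.compile(r"[^\d\W]+")

# Both patterns should find exactly the same runs of characters.
assert lookahead.findall(text) == char_class.findall(text)


def time_pattern(pattern, repeats=100):
    """Return the total seconds taken by `repeats` findall calls."""
    return timeit.timeit(lambda: pattern.findall(text), number=repeats)


if __name__ == "__main__":
    print("lookahead :", time_pattern(lookahead))
    print("char class:", time_pattern(char_class))
```

Passing a callable to timeit.timeit avoids the "import __main__"
setup string used in the session, at the cost of a small function
call overhead that is the same for both patterns.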

				Andrew
				dalke at dalkescientific.com


