\w in regular expression
Marcello Pietrobon
teiffel at attglobal.net
Sat Feb 28 15:14:17 EST 2004
Hello,
I am reading
http://www.amk.ca/python/howto/regex/
But there seems to be an inconsistency.
In section 2.1, Matching Characters:
\w
Matches any alphanumeric character; this is equivalent to the class
[a-zA-Z0-9_].
\W
Matches any non-alphanumeric character; this is equivalent to the
class [^a-zA-Z0-9_].
This is fine with me, the same as in Perl, and consistent with:
\d
Matches any decimal digit; this is equivalent to the class [0-9].
\D
Matches any non-digit character; this is equivalent to the class
[^0-9].
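
The equivalence quoted above can be checked directly. This is only a minimal sketch, and it assumes a current Python 3 interpreter rather than the Python 2.3 used later in this message: there, \w on str patterns also matches Unicode word characters unless re.ASCII is passed, so the flag is needed to get the documented class. The variable names are mine, not from the HOWTO.

```python
import re

# Assumption: Python 3. re.ASCII restricts \w and \W to the ASCII
# definitions quoted from the HOWTO.
word = re.compile(r'\w', re.ASCII)
explicit = re.compile(r'[a-zA-Z0-9_]')
non_word = re.compile(r'\W', re.ASCII)

# Test every 7-bit ASCII character against each pattern.
ascii_chars = [chr(i) for i in range(128)]
shorthand_matches = [c for c in ascii_chars if word.fullmatch(c)]
explicit_matches = [c for c in ascii_chars if explicit.fullmatch(c)]
non_word_matches = [c for c in ascii_chars if non_word.fullmatch(c)]

# The shorthand and the explicit class accept the same characters,
# and \W accepts exactly the remaining ones.
print(shorthand_matches == explicit_matches)
print(len(shorthand_matches), len(non_word_matches))
```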
But in section 5.1, Splitting Strings, I find:
>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
At first I thought it was a typo, but I get the same result on my Python command line:
Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile(r'\W+'); print p
<_sre.SRE_Pattern object at 0x0090DC38>
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p = re.compile(r'\w+'); print p
<_sre.SRE_Pattern object at 0x0090D140>
>>> p.split('This is a test, short and sweet, of split().')
['', ' ', ' ', ' ', ', ', ' ', ' ', ', ', ' ', '().']
>>>
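
One way to read the session above, sketched here for a current Python 3 interpreter: split() treats the pattern as the separator and returns what lies between the matches, while findall() returns the matches themselves. The variable names are only illustrative.

```python
import re

text = 'This is a test, short and sweet, of split().'

# \W+ as the separator: runs of non-word characters are thrown away,
# so the pieces left over are the words.
words_from_split = re.split(r'\W+', text)

# \w+ as the separator: the words are thrown away,
# so the pieces left over are the gaps between them.
gaps_from_split = re.split(r'\w+', text)

# findall() returns what the pattern matched, not what lies between matches.
words_from_findall = re.findall(r'\w+', text)

print(words_from_split)
print(gaps_from_split)
print(words_from_findall)
```

Seen this way, \W+ in split() and \w+ in findall() both pick out the words; the only difference is the trailing empty string that split() returns because the text ends with a separator.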
In other words, is the Python re module not compatible with Perl?
I also noticed that tools\scripts\redemo.py behaves differently from the Python command line (and it is not the only case):
it matches 'This' when I use \w+ and not when I use \W+.
???
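
For comparison, here is a small sketch of what a plain search reports for the two patterns. It assumes that redemo.py highlights the text a pattern matches, in the way re.search() does, which I have not checked against the script's source; the variable names are illustrative.

```python
import re

text = 'This is a test, short and sweet, of split().'

# The first run of word characters is the word 'This'.
first_word = re.search(r'\w+', text)

# The first run of non-word characters is the space after 'This'.
first_gap = re.search(r'\W+', text)

print(repr(first_word.group()), first_word.span())
print(repr(first_gap.group()), first_gap.span())
```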
Thank you for any comments,
Marcello