\w in regular expression

Marcello Pietrobon teiffel at attglobal.net
Sat Feb 28 15:14:17 EST 2004


Hello,

I am reading
http://www.amk.ca/python/howto/regex/

But there seems to be an inconsistency.
In section 2.1, "Matching Characters":

\w
    Matches any alphanumeric character; this is equivalent to the class
    [a-zA-Z0-9_].

\W
    Matches any non-alphanumeric character; this is equivalent to the
    class [^a-zA-Z0-9_].

This is fine with me: it is the same as in Perl, and it is consistent with:

\d
    Matches any decimal digit; this is equivalent to the class [0-9].

\D
    Matches any non-digit character; this is equivalent to the class
    [^0-9].
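A minimal sketch for checking these four equivalences character by character. It is written for Python 3, not the Python 2.3 used in the session below, and it assumes the re.ASCII flag: in Python 3, str patterns give \w and \d Unicode meanings by default, whereas in Python 2 without the LOCALE or UNICODE flags they were ASCII-only, as the HOWTO describes.

```python
import re

# All 128 ASCII characters.
ascii_chars = [chr(i) for i in range(128)]

# Each shorthand (compiled with re.ASCII) paired with the class the HOWTO
# says it is equivalent to.
pairs = [
    (r'\w', r'[a-zA-Z0-9_]'),
    (r'\W', r'[^a-zA-Z0-9_]'),
    (r'\d', r'[0-9]'),
    (r'\D', r'[^0-9]'),
]

for shorthand, char_class in pairs:
    short_re = re.compile(shorthand, re.ASCII)
    class_re = re.compile(char_class)
    for ch in ascii_chars:
        # The shorthand and the explicit class accept exactly the same characters.
        assert bool(short_re.fullmatch(ch)) == bool(class_re.fullmatch(ch)), (shorthand, ch)

print('all four equivalences hold on ASCII input')

# Without re.ASCII, Python 3 str patterns use Unicode rules, so \d also
# matches other decimal digits such as U+0663 ARABIC-INDIC DIGIT THREE.
arabic_three = '\u0663'
print(bool(re.fullmatch(r'\d', arabic_three)))            # True
print(bool(re.fullmatch(r'\d', arabic_three, re.ASCII)))  # False
print(bool(re.fullmatch(r'[0-9]', arabic_three)))         # False
```

So on plain ASCII text the shorthands and the explicit classes agree; the Unicode case only matters for non-ASCII input.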

But in section 5.1, "Splitting Strings", I find:

>>> p = re.compile(r'\W+')
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
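A short Python 3 sketch of why the HOWTO's output looks the way it does: split() returns the text between matches, so splitting on runs of non-word characters leaves the words behind. The comparison with findall() is an illustration added here, not part of the HOWTO example.

```python
import re

text = 'This is a test, short and sweet, of split().'

# split() returns the pieces of text *between* matches of the pattern.
# \W+ matches runs of non-word characters (' ', ', ', '().'), so the
# pieces left over are the words themselves.
pieces = re.split(r'\W+', text)
print(pieces)

# The final '' appears because the string ends with a match ('().')
# and nothing follows it.
print(pieces[-1] == '')  # True

# findall() with \w+ returns the matches themselves: the same words,
# without the trailing empty string.
words = re.findall(r'\w+', text)
print(words == [p for p in pieces if p])  # True
```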


At first I thought it was a typo, but this is what I get on my Python command line:
Python 2.3 (#46, Jul 29 2003, 18:54:32) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> p = re.compile(r'\W+'); print p
<_sre.SRE_Pattern object at 0x0090DC38>
>>> p.split('This is a test, short and sweet, of split().')
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
>>> p = re.compile(r'\w+'); print p
<_sre.SRE_Pattern object at 0x0090D140>
>>> p.split('This is a test, short and sweet, of split().')
['', ' ', ' ', ' ', ', ', ' ', ' ', ', ', ' ', '().']
>>>
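The two results in the session above are complementary rather than contradictory: one split keeps the words, the other keeps the separators. A minimal Python 3 sketch, relying on the documented behavior that a capturing group in the pattern makes split() also return the matched separators:

```python
import re

text = 'This is a test, short and sweet, of split().'

words = re.split(r'\W+', text)       # text between non-word runs
separators = re.split(r'\w+', text)  # text between word runs

# With a capturing group, split() returns words and separators interleaved.
both = re.split(r'(\W+)', text)

# Even positions are the words, odd positions are the separators.
# separators[0] is '' because the text begins with a word.
print(both[0::2] == words)            # True
print(both[1::2] == separators[1:])   # True

# Joining the interleaved pieces reproduces the original string exactly.
print(''.join(both) == text)          # True
```

Both outputs are consistent with \w being [a-zA-Z0-9_] and \W its complement.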



In other words, is the Python re module not compatible with Perl?


I also noticed that tools\scripts\redemo.py behaves differently from the Python command line (and this is not the only case): it matches 'This' when I use \w+, but not when I use \W+.
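One plausible explanation, assuming redemo.py highlights matches found by searching the text rather than showing the result of split(): searching reports what the pattern matches, while split() throws those matches away. A Python 3 sketch of the difference:

```python
import re

text = 'This is a test, short and sweet, of split().'

# search() returns the first place where the pattern matches.
print(re.search(r'\w+', text).group())        # prints: This
print(repr(re.search(r'\W+', text).group()))  # prints: ' '

# finditer() lists every match; for \W+ these are the separators that
# split() removes, not the words.
print([m.group() for m in re.finditer(r'\W+', text)])
```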


???


Thank you for any comments,

Marcello

More information about the Python-list mailing list