Why is it different about '\s' Matches whitespace and Equivalent to [\t\n\r\f]?

Ned Batchelder ned at nedbatchelder.com
Thu Jul 10 10:04:40 EDT 2014


On 7/10/14 9:32 AM, fl wrote:
> On Thursday, July 10, 2014 7:18:01 AM UTC-4, MRAB wrote:
>> On 2014-07-10 11:05, rx at gmail.com wrote:
>>
>> It's equivalent to [ \t\n\r\f], i.e. it also includes a space, so
>> either the tutorial is wrong, or you didn't look closely enough. :-)
>>
>> The string starts with ' ', not '\t'.
>>
>> The string starts with ' ', which isn't in the character set.
>>
> The '\s' description is on link:
>
> http://www.tutorialspoint.com/python/python_reg_expressions.htm
>

For some reason, that page shows much of its information twice.  The 
first occurrence of \s there is:

     \s    Matches whitespace. Equivalent to [\t\n\r\f].

The second is:

     \s    Match a whitespace character: [ \t\r\n\f]

The second one is correct.  The first is wrong.  You might want to send 
the author a bug report.

Actually, neither is strictly correct. As the official docs
(https://docs.python.org/2/library/re.html) say:

     \s    When the UNICODE flag is not specified, it matches any
     whitespace character, this is equivalent to the set [ \t\n\r\f\v].
     The LOCALE flag has no extra effect on matching of the space. If
     UNICODE is set, this will match the characters [ \t\n\r\f\v] plus
     whatever is classified as space in the Unicode character properties
     database.
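
To make the effect of the flag concrete, here is a small sketch. It assumes
Python 3, where str patterns behave as if UNICODE were set and re.ASCII
restores the narrower set; on Python 2 you would instead pass re.UNICODE
with a unicode pattern. The variable names are only for illustration.

```python
import re

# NO-BREAK SPACE: whitespace according to Unicode, but not in [ \t\n\r\f\v].
nbsp = "\u00a0"

unicode_ws = re.compile(r"\s")          # Python 3 default for str patterns
ascii_ws = re.compile(r"\s", re.ASCII)  # restricted to [ \t\n\r\f\v]

print(bool(unicode_ws.match(nbsp)))  # True
print(bool(ascii_ws.match(nbsp)))    # False
print(bool(ascii_ws.match(" ")))     # True
```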


>
> Could you give me an example to use the equivalent pattern?
>
> Thanks
>
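
Here is one way to try the equivalent pattern yourself. This is a sketch
that assumes Python 3 (re.ASCII does not exist on Python 2, where \s
already means the ASCII set unless UNICODE is given). The sample string
and the names wrong, explicit and shorthand are made up for the example.

```python
import re

text = " \tfoo"  # starts with a space, then a tab

wrong = re.compile(r"[\t\n\r\f]")        # the tutorial's first entry: no space
explicit = re.compile(r"[ \t\n\r\f\v]")  # the set given in the official docs
shorthand = re.compile(r"\s", re.ASCII)  # \s restricted to ASCII whitespace

print(wrong.match(text))                    # None, because ' ' is not in the set
print(repr(explicit.match(text).group()))   # ' '
print(repr(shorthand.match(text).group()))  # ' '

# ASCII characters on which the explicit set and \s disagree.
mismatches = [c for c in map(chr, range(128))
              if bool(explicit.fullmatch(c)) != bool(shorthand.fullmatch(c))]
print(mismatches)  # []
```

The empty list at the end shows that, within ASCII, the explicit set and \s
accept exactly the same characters.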


-- 
Ned Batchelder, http://nedbatchelder.com



