Correct handling of case in unicode and regexps

MRAB python at mrabarnett.plus.com
Sat Feb 23 12:41:11 EST 2013


On 2013-02-23 15:30, Devin Jeanpierre wrote:
> On Sat, Feb 23, 2013 at 10:26 AM, Devin Jeanpierre
> <jeanpierreda at gmail.com> wrote:
>> However, regex has the same behavior.
>
> My apologies, I forgot to set the VERSION1 flag.
>
> Interesting. 'ss' matches 'ß', but 's+' does not.
>
> Is this desirable behavior?
>
Getting full case folding to work can be tricky. There's always going to
be a limit to what's worth doing.

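There are also areas where it's not clear what the result should be.
With case-insensitive matching, you've already mentioned matching 's'
against 'ß' (fails) and matching 'ss' against 'ß' (succeeds), but how
about matching '(s)(s)' against 'ß'? That currently fails, and it's not
obvious what each group should capture from a single character if it
were to succeed.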
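
To make the difficulty concrete, here is a rough sketch using only the
standard library (Python 3.4 or later), not the regex module discussed
in this thread. The helper names caseless_equal and stdlib_ci_fullmatch
are made up for the illustration. It shows that full case folding turns
the single character 'ß' into the two characters 'ss', while the stdlib
re module, which maps case one character at a time, matches none of the
patterns mentioned above.

```python
import re

SHARP_S = "\u00df"  # the single character 'ß'


def caseless_equal(a, b):
    """Compare two strings using full case folding (str.casefold)."""
    return a.casefold() == b.casefold()


def stdlib_ci_fullmatch(pattern, text):
    """True if the stdlib re module matches all of text, ignoring case."""
    return re.fullmatch(pattern, text, re.IGNORECASE) is not None


if __name__ == "__main__":
    # Full case folding changes the length: one character becomes two.
    print(len(SHARP_S), len(SHARP_S.casefold()))

    # Whole-string comparison after folding treats 'ss' and 'ß' as equal,
    # but a lone 's' is not equal to 'ß'.
    print(caseless_equal("ss", SHARP_S))
    print(caseless_equal("s", SHARP_S))

    # The stdlib re module folds case character by character, so none of
    # these patterns can match the one-character string 'ß'.
    for pattern in ("ss", "s+", "(s)(s)"):
        print(pattern, stdlib_ci_fullmatch(pattern, SHARP_S))
```

Once a regex engine does accept 'ss' for 'ß', the folded text is longer
than the original, so there is no obvious position in the original
string for the boundary between the two groups in '(s)(s)'.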

For the record, Perl also says that 'ss' matches 'ß', but 's+' does not.



More information about the Python-list mailing list