ignore case only for a part of the regex?

Mon Dec 31 23:14:26 EST 2012

On Sun, 30 Dec 2012 10:20:19 -0500, Roy Smith wrote:

> The way I would typically do something like this is build my regexes in
> all lower case and .lower() the text I was matching against them.  I'm
> curious what you're doing where you want to enforce case sensitivity in
> one part of a header, but not in another.

Well, sometimes you have things that are case sensitive, and other things 
which are not, and sometimes you need to match them at the same time. I 
don't think this is any more unusual than (say) wanting to match an 
otherwise lowercase word whether or not it comes at the start of a 
sentence:

"[Pp]rogramming"

is conceptually equivalent to "match case-insensitive `p`, and case-
sensitive `rogramming`".
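For what it's worth, newer versions of Python can say this directly. Since Python 3.6 (so not available when this was written), the re module accepts scoped inline flags, (?i:...), which ignore case only inside the group. A minimal sketch, assuming Python 3.6 or later; the names by_class, by_flag and header, and the Content-Type check, are illustrative only:

```python
import re

# Character-class spelling: only the first letter may differ in case.
by_class = re.compile(r"[Pp]rogramming")

# Scoped inline flag (Python 3.6+): (?i:...) ignores case only inside
# the group, so "rogramming" stays case-sensitive.
by_flag = re.compile(r"(?i:p)rogramming")

# Hypothetical header check: the field name ignores case, the value does not.
header = re.compile(r"(?i:content-type):\s*text/plain")

for text in ["Programming", "programming", "PROGRAMMING", "pROGRAMMING"]:
    # Both spellings accept exactly the same strings.
    print(text, bool(by_class.fullmatch(text)), bool(by_flag.fullmatch(text)))

print(bool(header.fullmatch("CONTENT-TYPE: text/plain")))  # True
print(bool(header.fullmatch("Content-Type: TEXT/PLAIN")))  # False
```

The header pattern is the kind of mixed case Roy asked about: the field name matches in any case, while the value must match exactly.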

By the way, although there is probably nothing you can (easily) do about 
this prior to Python 3.3, converting to lowercase is not the right way to 
do case-insensitive matching. It happens to work correctly for ASCII, but 
not for all alphabetic characters; some, like the German ß, change length 
when uppercased:

py> 'Straße'.lower()
'straße'
py> 'Straße'.upper()
'STRASSE'

The right way is to casefold first, then match:

py> 'Straße'.casefold()
'strasse'
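A minimal sketch of "casefold first, then match", assuming Python 3.3 or later (str.casefold is new in 3.3). The helper contains_word is hypothetical. It casefolds both the word and the text before searching; for contrast, the sketch also tries lowercasing both sides and using re.IGNORECASE, neither of which matches 'Straße' against 'STRASSE':

```python
import re

def contains_word(word, text):
    """Hypothetical helper: case-insensitive search done by casefolding.

    Both sides are casefolded, so 'ß' and 'SS' both become 'ss'.
    """
    pattern = re.escape(word.casefold())
    return re.search(pattern, text.casefold()) is not None

text = "Die STRASSE ist lang"

# Lowercasing both sides compares 'straße' with 'strasse': no match.
print('Straße'.lower() in text.lower())                      # False
# re.IGNORECASE compares one character at a time, so a single ß
# cannot match the two letters SS.
print(re.search('Straße', text, re.IGNORECASE) is not None)   # False
# Casefolding both sides turns ß into ss, so the search succeeds.
print(contains_word('Straße', text))                         # True
```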

Curiously, there is an uppercase ß in old German. In recent years some 
typographers have started using it instead of SS, but it's still rare, 
and the official German rules have ß become SS when written in capitals. 
It's in Unicode, but few fonts show it:

py> import unicodedata
py> unicodedata.lookup('LATIN CAPITAL LETTER SHARP S')
'ẞ'
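The capital letter is one more reason to prefer casefolding. A minimal sketch, assuming Python 3.3 or later: str.lower() maps ẞ to ß but leaves SS as ss, so three spellings of the same word lowercase to two different strings, while str.casefold() folds all three to 'strasse'. The spelling 'STRAẞE' is just an illustration:

```python
import unicodedata

capital_sharp_s = unicodedata.lookup('LATIN CAPITAL LETTER SHARP S')  # U+1E9E

spellings = ['Straße', 'STRASSE', 'STRA' + capital_sharp_s + 'E']

# str.lower() maps ẞ to ß but leaves SS as ss, so the results disagree.
print(sorted({s.lower() for s in spellings}))   # ['strasse', 'straße']
# str.casefold() maps both ß and ẞ to 'ss', so all three agree.
print({s.casefold() for s in spellings})        # {'strasse'}
```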

-- 
Steven