Python Unicode handling wins again -- mostly

Roy Smith roy at panix.com
Fri Nov 29 21:08:49 EST 2013


In article <529934dc$0$29993$c3e8da3$5496439d at news.astraweb.com>,
 Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:

> (8) What's the uppercase of "baffle" spelled with an ffl ligature?
> 
> Like most other languages, Python 3.2 fails:
> 
> py> 'baﬄe'.upper()
> 'BAﬄE'
> 
> but Python 3.3 passes:
> 
> py> 'baﬄe'.upper()
> 'BAFFLE'

I disagree.

The whole idea of ligatures like fi is purely typographic.  The crossbar 
on the "f" (at least in some fonts) runs into the dot on the "i".  
Likewise, the top curl on an "f" runs into the serif on top of the "l" 
(and similarly for ffl).
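
The Unicode Character Database appears to treat these ligatures as 
presentation forms too. U+FB04 is recorded with a compatibility 
decomposition to the plain letters "ffl", so NFKC normalization replaces 
it with those letters. The following is a minimal sketch that uses only 
the standard unicodedata module. It is an added illustration rather than 
part of the original post, and the variable names are arbitrary.

```python
import unicodedata

FFL = "\ufb04"  # LATIN SMALL LIGATURE FFL, written as an escape

# The character database records the ligature as a compatibility
# form of the three ordinary letters f, f, l.
print(unicodedata.name(FFL))           # LATIN SMALL LIGATURE FFL
print(unicodedata.decomposition(FFL))  # <compat> 0066 0066 006C

# NFKC applies compatibility decompositions, so the ligature is
# replaced by plain letters. NFC leaves it untouched.
word = "ba" + FFL + "e"
print(unicodedata.normalize("NFKC", word))          # baffle
print(unicodedata.normalize("NFC", word) == word)   # True
```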

There is no such thing as a "FFL" ligature, because the upper case 
letterforms don't run into each other like the lower case ones do.  
Thus, I would argue that it's wrong to say that calling upper() on an 
ffl ligature should yield FFL.

I would certainly expect x.lower() == x.upper().lower() to be True for 
all values of x over the set of valid Unicode code points.  Having 
u"\uFB04".upper() ==> "FFL" breaks that.  I would also expect len(x) == 
len(x.upper()) to be True, and the ligature breaks that as well, since 
one code point becomes three.
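
Both expectations can be checked directly. The sketch below is an added 
illustration and was not posted in the thread. The helper names 
lower_roundtrips and upper_keeps_length are hypothetical. The commented 
results assume Python 3.3 or later, where str.upper() applies full case 
mappings. The final scan lists offending code points under whatever 
Unicode version the running interpreter ships with, so no count is 
claimed here.

```python
import sys


def lower_roundtrips(x: str) -> bool:
    """First expectation: x.lower() == x.upper().lower()."""
    return x.lower() == x.upper().lower()


def upper_keeps_length(x: str) -> bool:
    """Second expectation: len(x) == len(x.upper())."""
    return len(x) == len(x.upper())


FFL = "\ufb04"  # LATIN SMALL LIGATURE FFL

print(repr(FFL.upper()))          # 'FFL'
print(repr(FFL.upper().lower()))  # 'ffl'
print(lower_roundtrips(FFL))      # False, because '\ufb04' != 'ffl'
print(upper_keeps_length(FFL))    # False, because 1 != 3

# The word from the quoted example, spelled with an escape so the
# ligature survives copy and paste.
print(("ba" + FFL + "e").upper())  # BAFFLE

# Scan every code point (lone surrogates included, which upper()
# and lower() simply return unchanged) and keep the characters that
# break either expectation.
violations = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if not (lower_roundtrips(chr(cp)) and upper_keeps_length(chr(cp)))
]
print([hex(ord(c)) for c in violations[:10]])
```

The scan takes a moment because it calls upper() and lower() on more 
than a million strings. It is meant only to show that the ligature is 
not a lone case, not to serve as a benchmark.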



More information about the Python-list mailing list