[Tutor] While loop issue, variable not equal to var or var

Steven D'Aprano steve at pearwood.info
Sat Jul 12 14:19:50 CEST 2014


On Sat, Jul 12, 2014 at 11:27:17AM +0100, Alan Gauld wrote:
> On 12/07/14 10:28, Steven D'Aprano wrote:
> 
> >If you're using Python 3.3 or higher, it is better to use
> >message.casefold rather than lower. For English, there's no real
> >difference:
> >...
> >but it can make a difference for non-English languages:
> >
> >py> "Große".lower()  # German for "great" or "large"
> >'große'
> >py> "Große".casefold()
> >'grosse'
> 
> You learn something new etc...
> 
> But I'm trying to figure out what difference this makes in
> practice?
> 
> If you were targeting a German audience wouldn't you just test
> against the German alphabet? After all you still have to expect 'grosse' 
> which isn't English, so if you know to expect grosse
> why not just test against große instead?

Because the person might have typed any of:

grosse
GROSSE
gROSSE
große
Große
GROßE
GROẞE

etc., and you want to accept them all, just like in English you'd want 
to accept any of GREAT great gREAT Great gReAt etc. Hence you want to 
fold everything to a single, known, canonical version. Case-fold will do 
that, while lowercasing won't.

(The last example includes a character which might not be visible to 
many people, since it is quite unusual and not supported by many fonts 
yet. If it looks like a box or empty space for you, it is supposed 
to be capital sharp-s, matching the small sharp-s ß.)
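
As a rough sketch of what that looks like in code (the helper name
`matches` and the list `variants` are made up for illustration, they are
not from the original question), fold both the user's input and the
expected answer before comparing them:

```python
def matches(user_input, expected):
    """Caseless comparison: case-fold both sides, then compare."""
    return user_input.casefold() == expected.casefold()


# The spellings listed above, including the one with capital sharp-s.
variants = ["grosse", "GROSSE", "gROSSE", "große", "Große", "GROßE", "GROẞE"]

# Folding both sides accepts every spelling.
folded = [matches(v, "große") for v in variants]

# Lowercasing alone only accepts the spellings that already contain a sharp-s.
lowered = [v.lower() == "große" for v in variants]

print(folded)
print(lowered)
```

Note that the expected answer is folded too, so it makes no difference
whether you spell it "große" or "grosse" in your source code.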


Oh, here's another example of the difference, this one from Greek:

py> 'Σσς'.lower()  # three versions of sigma
'σσς'
py> 'Σσς'.upper()
'ΣΣΣ'
py> 'Σσς'.casefold()
'σσσ'


I suspect that there probably aren't a large number of languages where 
casefold and lower do something different, since most languages don't 
distinguish between upper and lower case at all. But there's no 
harm in using it, since at worst it returns the same as lower().
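
If you are curious, you can check this yourself. The following is only
a sketch: it looks at single characters, not whole strings, and the
exact set it finds depends on the version of Unicode your Python was
built with, so I won't quote a number. The name `differing` is just
something I picked for the example.

```python
import sys

# Collect every single character whose case-folded form differs from
# its lowercased form.
differing = [
    chr(cp)
    for cp in range(sys.maxunicode + 1)
    if chr(cp).lower() != chr(cp).casefold()
]

# Sharp-s and final sigma, the two examples above, are both in the list.
print("ß" in differing, "ς" in differing)

# No ASCII character is affected, which is why English text is safe
# either way.
print(any(ord(c) < 128 for c in differing))
```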


-- 
Steven

