[ python-Bugs-1026480 ] iso-latin-1 strings and functions lower & upper

SourceForge.net noreply at sourceforge.net
Mon Sep 13 22:00:38 CEST 2004


Bugs item #1026480, was opened at 2004-09-11 21:28
Message generated for change (Comment added) made by scott_daniels
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1026480&group_id=5470

Category: None
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Tomasz Kowaltowski (kowaltowski)
Assigned to: Nobody/Anonymous (nobody)
Summary: iso-latin-1 strings and functions lower & upper

Initial Comment:
I have no problems in Python in using strings which
contain accented letters (my Emacs has no problems in
producing them using one-byte iso-8859-1 encoding).
However functions 'lower' and 'upper' do not work
properly on these letters as shown below (I hope all
accents appear properly within your browsers):

-------------------------------------------------------------
as = "aáàâãä"      # except for the first 'a', all others have accents
AS = "AÁÀÂÃÄ"      # except for the first 'A', all others have accents
print "direct: %s -- %s" % (as, AS)
print "lower:  %s -- %s" % (as.lower(), AS.lower())
print "upper:  %s -- %s" % (as.upper(), AS.upper())
-------------------------------------------------------------

The output is:
--------------------------------------------------------------
direct: aáàâãä -- AÁÀÂÃÄ
lower:  aáàâãä -- aÁÀÂÃÄ
upper:  Aáàâãä -- AÁÀÂÃÄ
--------------------------------------------------------------

i.e., accented letters (above 128) are not translated.
It did not make any difference to put the line 

# -*- coding: iso-latin-1 -*-

about the encoding as recommended by PEP 0263.

I am not sure whether this is a bug or it is
intentional, i.e. these functions work only for pure
ASCII letters. However it is a major inconvenience for
those who use any language which is not English but
uses the Latin alphabet :-(. 

There should be some mechanism to signal these
functions which Latin variant (iso-8859-1, iso-8859-2,
...) is being used, so that they behave properly; eg,
optional second argument?

----------------------------------------------------------------------

Comment By: Scott David Daniels (scott_daniels)
Date: 2004-09-13 20:00

Message:
Logged In: YES 
user_id=493818

Note: lower and upper are defined as for ASCII on strs, 
but work correctly for unicode:
 uas = u"aáàâãä" # except first 'a', all have accents
 UAS = u"AÁÀÂÃÄ" # except first 'A', all have accents
 print u"direct: %s -- %s" % (uas, UAS)
 print u"lower: %s -- %s" % (uas.lower(), UAS.lower())
 print u"upper: %s -- %s" % (uas.upper(), UAS.upper())
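
For comparison, here is a minimal sketch of the same contrast. It is
written for Python 3 (an assumption; this report concerns Python 2.3),
where bytes plays the role of the 8-bit str discussed here and str
plays the role of unicode. The names text and raw are illustrative only.

```python
# Python 3 sketch: bytes ~ the 8-bit str of this report, str ~ unicode.
text = "a\xe1\xe0\xe2\xe3\xe4"   # the letters a, á, à, â, ã, ä, written as escapes
raw = text.encode("latin-1")     # one byte per character

# Case conversion on bytes only touches the ASCII letters.
raw_upper = raw.upper()          # only the leading "a" changes

# Case conversion on text handles the accented letters as well.
text_upper = text.upper()

print(raw_upper)
print(text_upper)
```

Only the unaccented first letter changes in raw_upper, which matches
the behaviour described in the initial comment.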

What you are asking is pretty hopeless.  With two 
modules loaded with differing encodings, whose idea of 
"how to uppercase an 8-bit character" should be used?
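
To make the ambiguity concrete, here is a small sketch (Python 3
syntax, which is an assumption on my part; the variable names are
illustrative). The same byte is a letter with an uppercase form under
one encoding and a caseless symbol under another, so the "right"
answer depends entirely on which encoding the caller had in mind.

```python
# Byte 0xB1 is the plus-minus sign in ISO-8859-1 (no case),
# but the letter a-with-ogonek in ISO-8859-2 (has an uppercase form).
raw = b"\xb1"

as_latin1 = raw.decode("iso-8859-1").upper().encode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2").upper().encode("iso-8859-2")

print(as_latin1)   # unchanged under ISO-8859-1
print(as_latin2)   # a different byte under ISO-8859-2
```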

What you might want to use is:
  def codedupper(coding, string):
      return string.decode(coding).upper().encode(coding)
  def codedlower(coding, string):
      return string.decode(coding).lower().encode(coding)
or:
  def latinupper(string):
      return string.decode('latin-1').upper().encode('latin-1')
  def latinlower(string):
      return string.decode('latin-1').lower().encode('latin-1')

Any of these functions is well-defined even with several 
modules of differing encodings loaded.
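
As a usage sketch, here is the same round trip in Python 3 syntax (an
assumption; there the encoded data is a bytes object, so the parameter
is called data here, a name chosen for illustration). It also shows one
caveat worth knowing: the round trip can fail when the converted
character does not exist in the target encoding.

```python
def codedupper(coding, data):
    """Uppercase encoded bytes by round-tripping through text."""
    return data.decode(coding).upper().encode(coding)

def codedlower(coding, data):
    """Lowercase encoded bytes by round-tripping through text."""
    return data.decode(coding).lower().encode(coding)

lowered = b"a\xe1\xe0\xe2\xe3\xe4"           # a plus five accented a's in latin-1
uppered = codedupper("latin-1", lowered)
back = codedlower("latin-1", uppered)

# Caveat: y-with-diaeresis (0xFF in latin-1) uppercases to U+0178,
# which latin-1 cannot represent, so encoding the result fails.
try:
    codedupper("latin-1", b"\xff")
    y_failed = False
except UnicodeEncodeError:
    y_failed = True

print(uppered, back, y_failed)
```

Whether to let that error propagate or to pass such characters through
unchanged is a design choice the caller has to make.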

