Problem with national characters

Leif B. Kristensen abuse at solumslekt.org
Thu Mar 31 12:23:55 EST 2005


I'm developing a routine that will parse user input. For simplicity, I'm
converting the entire input string to upper case. One of the words that
will have special meaning for the parser is the word "før" ("before" in
English). However, this word is not recognized, because upper() leaves
the "ø" unchanged. A test in the interactive shell reveals this:

leif@balapapa leif $ python
Python 2.3.4 (#1, Feb  7 2005, 21:31:38)
[GCC 3.3.5  (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3, pie-8.7.7.1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 'før'.upper()
'F\xf8R'
>>> 'FØR'
'F\xd8R'
>>>

In Windows, the result is slightly different, but no better:

C:\Python23>python
ActivePython 2.3.2 Build 232 (ActiveState Corp.) based on
Python 2.3.2 (#49, Nov 13 2003, 10:34:54) [MSC v.1200 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> 'før'.upper()
'F\x9bR'
>>> 'FØR'
'F\x9dR'
>>>

Is there a way around this problem? My character set in Linux is
ISO-8859-1. In Windows 2000 it should be the equivalent Latin-1, though
I'm not sure which character set the command shell is using.
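
For illustration, here is a minimal sketch of one possible way around it:
decode the raw bytes to a Unicode string first, then call upper() on that.
It is written for a current Python 3 interpreter rather than the 2.3 shown
above. The helper name upper_text is hypothetical, and cp850 for the Windows
console is only an assumption (the bytes \x9b and \x9d above happen to match
"ø" and "Ø" in that code page).

```python
def upper_text(raw, encoding):
    """Decode raw bytes using the given encoding, then upper-case as text.

    Hypothetical helper: the caller has to know (or guess) the encoding.
    """
    return raw.decode(encoding).upper()


# 'før' as the bytes seen in the Linux session (ISO-8859-1).
linux_bytes = b'f\xf8r'

# 'før' as the bytes seen in the Windows session; cp850 is an assumption.
windows_bytes = b'f\x9br'

# Upper-casing the bytes directly only touches the ASCII letters,
# so the byte for 'ø' is left as it was.
print(linux_bytes.upper())

# Upper-casing after decoding handles 'ø' as a letter.
print(upper_text(linux_bytes, 'iso-8859-1'))
print(upper_text(windows_bytes, 'cp850'))
```

With the right encoding, both byte strings should come out as the same
upper-case word, so the parser could compare against a single Unicode
constant instead of a platform-specific byte string.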
-- 
Leif Biberg Kristensen
http://solumslekt.org/


