python encoding bug?

garabik-news-2005-05 at kassiopeia.juls.savba.sk garabik-news-2005-05 at kassiopeia.juls.savba.sk
Fri Dec 30 17:54:05 EST 2005


I was playing with python encodings and noticed this:

garabik at lancre:~$ python2.4
Python 2.4 (#2, Dec  3 2004, 17:59:05)
[GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> unicode('\x9d', 'iso8859_1')
u'\x9d'
>>>

U+009D is NOT a valid unicode character (it is not even a iso8859_1
valid character)

The same happens if I use 'latin-1' instead of 'iso8859_1'.

This caught me by surprise, since I was doing some heuristics guessing 
string encodings, and 'iso8859_1' gave no errors even if the input
encoding was different.

Is this a known behaviour, or I discovered a terrible unknown bug in python encoding
implementation that should be immediately reported and fixed? :-)


happy new year,

-- 
 -----------------------------------------------------------
| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
 -----------------------------------------------------------
Antivirus alert: file .signature infected by signature virus.
Hi! I'm a signature virus! Copy me into your signature file to help me spread!



More information about the Python-list mailing list