python encoding bug?

Vincent Wehren vincent at visualtrans.de
Sat Dec 31 05:14:15 EST 2005


<garabik-news-2005-05 at kassiopeia.juls.savba.sk> wrote in message 
news:dp4dqd$230e$1 at ns.felk.cvut.cz...
|
| I was playing with python encodings and noticed this:
|
| garabik at lancre:~$ python2.4
| Python 2.4 (#2, Dec  3 2004, 17:59:05)
| [GCC 3.3.5 (Debian 1:3.3.5-2)] on linux2
| Type "help", "copyright", "credits" or "license" for more information.
| >>> unicode('\x9d', 'iso8859_1')
| u'\x9d'
| >>>
|
| U+009D is NOT a valid Unicode character (it is not even a valid
| iso8859_1 character)

That statement is not entirely true. U+009D is an assigned code point: it
is a C1 control character rather than a printable one. If you check the
current UnicodeData.txt (at http://www.unicode.org/Public/UNIDATA/) you'll
find:

009D;<control>;Cc;0;BN;;;;;N;OPERATING SYSTEM COMMAND;;;;
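
The same fields can be checked from Python itself. What follows is only a
minimal sketch, written in Python 3 syntax rather than the 2.4 session
quoted above; the variable names are mine and purely illustrative. It reads
the general category (Cc) and bidi class (BN) shown in the entry, and then
shows that every possible byte value decodes under this codec, which is
presumably why the decode in the quoted session raises no error.

```python
import unicodedata

# Decode the single byte 0x9D as ISO 8859-1 (Latin-1).
ch = b'\x9d'.decode('iso8859_1')

code_point = ord(ch)
category = unicodedata.category(ch)         # general category from UnicodeData.txt
bidi_class = unicodedata.bidirectional(ch)  # bidirectional class from UnicodeData.txt

print('U+%04X category=%s bidi=%s' % (code_point, category, bidi_class))

# Every byte value 0x00..0xFF decodes under this codec, each to the code
# point with the same number, so a successful decode does not tell you
# that the input really was Latin-1.
all_bytes = bytes(bytearray(range(256)))
decoded = all_bytes.decode('iso8859_1')
round_trips = [ord(c) for c in decoded] == list(range(256))
print(round_trips)
```

So for encoding guesswork, a decode that succeeds under this codec is not
evidence by itself; a stricter codec that can actually fail on bad input is
more informative.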

Regards,

Vincent Wehren

|
| The same happens if I use 'latin-1' instead of 'iso8859_1'.
|
| This caught me by surprise, since I was doing some heuristics guessing
| string encodings, and 'iso8859_1' gave no errors even if the input
| encoding was different.
|
| Is this a known behaviour, or have I discovered a terrible unknown bug in
| the python encoding implementation that should be immediately reported
| and fixed? :-)
|
|
| happy new year,
|
| -- 
| -----------------------------------------------------------
|| Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/ |
|| __..--^^^--..__    garabik @ kassiopeia.juls.savba.sk     |
| -----------------------------------------------------------
| Antivirus alert: file .signature infected by signature virus.
| Hi! I'm a signature virus! Copy me into your signature file to help me
| spread!




