[Tutor] unicode and character sets

Kent Johnson kent37 at tds.net
Thu Aug 16 14:11:34 CEST 2007


tpc247 at gmail.com wrote:
> http://www.joelonsoftware.com/articles/Unicode.html
> 
> I realize the following: It does not make sense to have a string without 
> knowing what encoding it uses.  There is no such thing as plain text.

Good start!
> 
> Ok.  Fine.  In Mozilla, by clicking on View, Character Encoding, I find 
> out that the text in the file I grab from:
> 
> http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/index.html
> 
> is encoded in ISO-8859-1.  So I go about changing Python's default 
> encoding according to:
> 
> http://www.diveintopython.org/xml_processing/unicode.html

I don't think this is necessary. Did it actually fix anything? Changing 
the default encoding is not recommended because it makes your scripts 
non-portable: they will only work on machines that have the same change.
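
As a rough sketch of the alternative, you can name the encoding at the 
point where you convert bytes to text and leave the default alone. This 
is written for Python 3, which did not exist when this thread started; 
decode_bytes is a made-up helper name and the sample bytes are typed in 
by hand rather than fetched from the ISO page.

```python
def decode_bytes(raw_bytes, encoding):
    """Turn raw bytes into text using an explicitly named encoding."""
    return raw_bytes.decode(encoding)

# The same text stored two ways: one byte for the first letter in
# ISO-8859-1, two bytes for it in UTF-8.
latin1_text = decode_bytes(b"\xc5land Islands", "iso-8859-1")
utf8_text = decode_bytes(b"\xc3\x85land Islands", "utf-8")

# Once decoded with the right codec, both are the same string.
same = latin1_text == utf8_text
print(same)
```

Because the encoding travels with the call, the script behaves the same 
on any machine.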

> BUT the LATIN CAPITAL LETTER A WITH RING ABOVE character still displays 
> in IDLE as \xc5 !  I can get the character to display correctly if I type:
> 
> print "\xc5"

In many cases IDLE displays the repr() of a string, which shows any 
non-ASCII character as a hexadecimal escape. The string still contains 
the correct character. print does not use repr(), so it displays correctly.
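
Here is a small illustration of the difference, as a sketch only. It 
uses Python 3, where the undecoded data is a bytes object; in the Python 
2 of this thread the same thing happens with a plain str. The variable 
names are mine.

```python
import unicodedata

# One byte, 0xC5, followed by ASCII letters, as found in an
# ISO-8859-1 encoded page.
raw = b"\xc5land Islands"

# repr() is what an interactive shell echoes back: the non-ASCII
# byte is shown as a hexadecimal escape.
escaped = repr(raw)
print(escaped)

# The data itself is fine. Decoding it gives the real character,
# whose Unicode name we can look up.
text = raw.decode("iso-8859-1")
char_name = unicodedata.name(text[0])
print(char_name)
```

The escape is only how the value is shown; the first character really is 
LATIN CAPITAL LETTER A WITH RING ABOVE.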

> which is fine if I am simply going to copy and paste the select element 
> into my html file.  However, I want to be able to dynamically generate 
> the html form page and have the character in question display correctly 
> in the web browser.
> 
> The problem, of course, is that if I run my script that creates the 
> select element in IDLE I continue to see the output:
> 
> <option value='AX'>\xc5land Islands</option>
> 
> Am I doing something wrong ?

No, actually you are doing great. This is correct output; it is just not 
displayed in the form you expect. The data is correct.
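
To convince yourself, here is a hedged sketch of generating the option 
element and encoding it for the browser. It is Python 3, option_element 
is a hypothetical helper, and it assumes the page will be served as 
ISO-8859-1 (the page's declared charset has to match whatever you pick).

```python
import html

def option_element(code, name):
    """Build one <option> element, escaping HTML special characters."""
    return "<option value='%s'>%s</option>" % (
        html.escape(code), html.escape(name))

# Decode the source bytes to text first, then build the markup.
name = b"\xc5land Islands".decode("iso-8859-1")
markup = option_element("AX", name)

# Encode once, at the point of output, to the charset the page declares.
payload = markup.encode("iso-8859-1")
print(repr(payload))
```

The bytes written out contain the single byte 0xC5, which a browser 
reading the page as ISO-8859-1 shows as the intended letter.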

Kent

