Trouble saving unicode text to file

F. Petitjean littlejohn.75 at news.free.fr
Mon May 9 06:15:55 EDT 2005


On Mon, 09 May 2005 08:39:40 +1000, John Machin wrote:
> On Sun, 08 May 2005 19:49:42 +0200, "Martin v. Löwis"
><martin at v.loewis.de> wrote:
> 
>>John Machin wrote:
>>> Martin, I can't guess the reason for this last suggestion; why should
>>> a Windows system use iso-8859-1 instead of cp1252?
>>
>>Windows users often think that windows-1252 is the same thing as
>>iso-8859-1, and then exchange data in windows-1252, but declare them
>>as iso-8859-1 (in particular, this is common for HTML files).
>>iso-8859-1 is more portable than windows-1252, so it should be
>>preferred when the data need to be exchanged across systems.
> 
> 1. When exchanging data across systems, should not utf-8 be
> preferred???
> 
> 2. If the Windows *users* have been using characters that are in
> cp1252 but not in iso-8859-1, then attempting to convert to iso-8859-1
> will cause an exception. 
> 
>>>> euro_win = chr(128)
>>>> euro_uc = euro_win.decode('cp1252')
>>>> euro_uc
> u'\u20ac'
>>>> unicodedata.name(euro_uc)
> 'EURO SIGN'
>>>> euro_iso = euro_uc.encode('iso-8859-1')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac'
> in position 0: ordinal not in range(256)
>>>>
> 
> I find it a bit hard to imagine that the euro sign wouldn't get a fair
> bit of usage in Swedish data processing even if it's not their own
> currency.
For Western European countries there is another codec which does include
the 'EURO SIGN'. It is spelled 'iso8859_15' (with an alias 'iso-8859-15',
according to the 4.9.2 Standard Encodings page of the Python Library
Reference).
>>> euro_iso = euro_uc.encode('iso8859_15')
>>> euro_iso
'\xa4'
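
To put the codecs side by side, here is a minimal sketch. It assumes
Python 3 (the session above is Python 2, where str and unicode are
separate types), and the helper name try_encode is only made up for this
illustration. It encodes the euro sign with each codec mentioned in this
thread and lists the byte values that mean something different in
iso-8859-1 and iso8859_15.

```python
import unicodedata

EURO = "\u20ac"  # EURO SIGN


def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot represent text.

    Hypothetical helper for this illustration only.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


# Encode the euro sign with every codec discussed in the thread.
results = {
    codec: try_encode(EURO, codec)
    for codec in ("iso-8859-1", "cp1252", "iso8859_15", "utf-8")
}

# Byte values that decode to different characters in the two ISO codecs.
changed = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != bytes([b]).decode("iso8859_15")
]

if __name__ == "__main__":
    print(unicodedata.name(EURO))
    for codec, data in results.items():
        print(codec, data)
    print([hex(b) for b in changed])
```

iso-8859-1 should be the only one of the four that cannot encode the
character, and the list of changed bytes shows that iso8859_15 is not a
superset of iso-8859-1: it reuses a few positions, so the two labels
cannot be swapped freely either.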
> 
> 3. How portable is a character set that doesn't include the euro sign?
I think this is due to historical constraints: ISO Latin-1 existed before
the EURO SIGN appeared.
> 
> Regards,
> John


