Python and UTF-8

Dave Pawson DaveP at dpawsonNOSPam.freeserve.co.uk
Fri Jan 4 13:45:03 EST 2002


martin at v.loewis.de (Martin v. Loewis) wrote in 
news:m3itak5to0.fsf at mira.informatik.hu-berlin.de:

> Brandvik <tmagna at online.no> writes:
> 
>> Is it possible to make a python script that would change the character
>> to UTF-8 no matter what the encoding of the input is? I have heard
>> that Python has some great functions for Unicode formatting so this
>> might be an easy and trivial task, but I'm new to Python so I really
>> don't know... 
> 
> You have to know the encoding the data is currently, say
> current_encoding. Then, converting it into UTF-8, you write
> 
> data = unicode(data, current_encoding).encode('utf-8')

If I have a file in ISO-8859-1 encoding, can I use the same
approach?


This is prior to XSLT processing of older HTML files
originating in Scandinavia, which blow up when XSLT
gets hold of them with no encoding specified!

I figured out how to add the encoding declaration, but UTF-8
input would make it far easier!
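
For what it's worth, here is a minimal sketch of how the quoted one-liner
might be applied to a whole file, assuming the input really is ISO-8859-1.
It is written for Python 3, where unicode(data, current_encoding) is
spelled data.decode(current_encoding). The names to_utf8 and convert_file
and the file names in the usage note are hypothetical, made up for
illustration.

```python
def to_utf8(data, current_encoding="iso-8859-1"):
    """Re-encode bytes from current_encoding to UTF-8.

    Same idea as the quoted one-liner: decode with the known
    source encoding, then encode the result as UTF-8.
    """
    return data.decode(current_encoding).encode("utf-8")


def convert_file(src_path, dst_path, current_encoding="iso-8859-1"):
    """Read src_path as raw bytes and write a UTF-8 copy to dst_path.

    Hypothetical helper: both files are opened in binary mode so
    that no implicit platform encoding gets in the way.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(data, current_encoding))
```

Usage would be something like convert_file("old.html", "old-utf8.html").
Note that any encoding named inside the HTML itself (a meta tag, say) is
not touched by this and would still need updating by hand.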

Regards DaveP


