Is there a way to get utf-8 out of a Unicode string?

thebjorn BjornSteinarFjeldPettersen at gmail.com
Mon Oct 30 02:24:48 EST 2006


I've got a database (MS SQL Server) that's (way) out of my control,
where someone has stored UTF-8-encoded Unicode data in regular varchar
fields, so that e.g. the string 'Blåbærsyltetøy' is in the database
as 'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y' :-/
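To make that byte sequence concrete: encoding the intended text as UTF-8 gives exactly those bytes, with each of å, æ and ø becoming a two-byte sequence that starts with \xc3. A small sketch, written in Python 3 syntax (the code elsewhere in this post is Python 2):

```python
# The intended text, as a Python 3 str (i.e. Unicode).
original = "Blåbærsyltetøy"

# UTF-8 encodes each of å, æ, ø as two bytes; ASCII letters stay one byte.
stored = original.encode("utf-8")

print(stored)  # b'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'
```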

Then I read the data out using adodbapi (which returns all strings as
Unicode) and I get u'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'. I couldn't
find any way to get back to the original short of:

  def unfk(s):
      return eval(repr(s)[1:]).decode('utf-8')

i.e. chopping the leading u off the repr of the unicode string, and relying on
eval to turn the \xHH escape sequences into the bytes of a plain str, which
can then be decoded.
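For comparison, the mangled string can be reproduced without a database. A minimal Python 3 sketch, assuming (as the repr shown earlier suggests) that the driver turns each stored byte into the code point with the same numeric value, which is exactly what decoding with Latin-1 does:

```python
# UTF-8 bytes as they sit in the varchar column.
stored = b"Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y"

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
# reproducing the one-code-point-per-byte string the driver hands back
# (assumption: the driver behaves this way, based on the repr in the post).
mangled = stored.decode("latin-1")

# ascii() escapes non-ASCII characters, much like Python 2's repr of a
# unicode object; Python 3's repr() would print them as literal characters.
print(ascii(mangled))  # 'Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y'
```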

Is there a less hackish way to do this?
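One less hackish candidate, sketched in Python 3 and assuming every character in the string stands for exactly one raw byte (code points U+0000 to U+00FF only): Latin-1 maps those code points one-to-one onto bytes 0x00 to 0xFF, so encoding with it recovers the original UTF-8 bytes without going through repr and eval. The name unfk is reused from the function above for easy comparison; this is a sketch, not a tested fix for adodbapi.

```python
def unfk(s):
    """Undo UTF-8 bytes that were read back as one code point per byte.

    Assumes every code point in s is below U+0100; anything higher makes
    encode() raise UnicodeEncodeError rather than produce garbage.
    """
    # Latin-1 turns each code point U+0000..U+00FF back into the byte with
    # the same value, recovering the UTF-8 byte string from the database...
    raw = s.encode("latin-1")
    # ...which can then be decoded as UTF-8 in the normal way.
    return raw.decode("utf-8")


print(unfk("Bl\xc3\xa5b\xc3\xa6rsyltet\xc3\xb8y"))  # Blåbærsyltetøy
```

The same two calls, s.encode('latin-1').decode('utf-8'), also work on a Python 2 unicode object. If a string contains anything above U+00FF, encode() raises UnicodeEncodeError, and if the recovered bytes are not valid UTF-8, decode() raises UnicodeDecodeError, so bad rows fail loudly instead of being mangled further.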

-- bjorn
