Unicode/utf-8 data in SQL Server

Wed Aug 9 05:20:32 EDT 2006

Laurent Pointal wrote:
> John Machin wrote:
> > The customer should be very happy if you do
> > text.decode('utf-8').encode('cp1252') -- not only should the file
> > import into Excel OK, he should be able to view it in
> > Word/Notepad/whatever.
>
> +
> text.decode('utf-8').encode('cp1252',errors='replace')
>
> As cp1252 may not cover all utf8 chars.

In that case, the OP may well want to use 'xmlcharrefreplace' or
'backslashreplace' as they stand out more than '?' *and* the original
Unicode is recoverable if necessary e.g.:

>>> msg = u'\u0124\u0114\u0139\u013B\u0150'
>>> print msg
ĤĔĹĻŐ
>>> msg.encode('cp1252', 'replace')
'?????'
>>> msg.encode('cp1252', 'xmlcharrefreplace')
'&#292;&#276;&#313;&#315;&#336;'
>>> msg.encode('cp1252', 'backslashreplace')
'\\u0124\\u0114\\u0139\\u013b\\u0150'
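To illustrate the "recoverable" part, here is a minimal sketch of the round trip. It is written in Python 3 syntax (the session above is Python 2), and it assumes the original text contains no literal "&#...;" sequences or backslashes of its own, since those would be indistinguishable from the escapes after encoding. The variable names are only for illustration.

```python
import html

# Same sample string as in the session above (Python 3: str is Unicode).
msg = '\u0124\u0114\u0139\u013B\u0150'

# Encode with the two "loud" error handlers; both results are pure ASCII here.
xml_bytes = msg.encode('cp1252', 'xmlcharrefreplace')
bs_bytes = msg.encode('cp1252', 'backslashreplace')

# Recover from XML character references: decode the bytes, then unescape.
recovered_xml = html.unescape(xml_bytes.decode('cp1252'))

# Recover from backslash escapes. Caveat: the unicode_escape codec treats
# non-escape bytes as Latin-1, not cp1252, so this shortcut is only safe
# when everything else in the data is plain ASCII, as it is here.
recovered_bs = bs_bytes.decode('unicode_escape')

print(recovered_xml == msg, recovered_bs == msg)
```

With 'replace' there is nothing comparable to do: every unmappable character becomes the same '?', so the original cannot be reconstructed.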

Cheers,
John