Win32com and Unicode

Sun Jan 6 11:20:43 EST 2002

Tim Roberts <timr at probo.com> writes:

> One of the fields in one of my tables is a last name.  One of the last
> names has an accented character (e with acute accent).  When I attempt to
> read the contents of that field, I get an exception from the __str__
> handler in the Field class generated by the Pythonwin COM Makepy utility:
> it complains that the conversion from Unicode failed because one of the
> characters was greater than 127.

What part of the code triggers the __str__ invocation? Assuming you
have code like

  lastname = str(foo.lastname)

I recommend replacing that with

  lastname = unicode(foo.lastname)

win32com retrieves a Unicode string first; the str call then converts
it to a byte string using the default (ASCII) encoding, which fails
for any character above 127. It is much better to receive the Unicode
object as-is, and delay conversion to a byte string until you need to
output the string.
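
To illustrate why the str call fails, here is a minimal sketch. It
assumes that str() on a Unicode object behaves like encoding with the
ASCII codec, which is the default in a stock Python installation. The
sample value and the helper name to_ascii_or_none are made up for the
example; no COM object is involved.

```python
# Hypothetical stand-in for the Unicode value win32com hands back
# for foo.lastname: "Andre" with an acute accent on the final e.
lastname = u"Andr\u00e9"

def to_ascii_or_none(text):
    """Mimic the implicit ASCII conversion; return None if it fails."""
    try:
        return text.encode("ascii")
    except UnicodeError:
        # Raised for any character above 127, e.g. U+00E9.
        return None

plain = to_ascii_or_none(u"Smith")     # pure ASCII, converts fine
accented = to_ascii_or_none(lastname)  # contains U+00E9, fails
```

Here plain holds a byte string, while accented is None because the
conversion raised an error. Keeping lastname as a Unicode object
avoids the problem entirely.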

> I can believe this, but I don't know how to fix it.  In Windows parlance,
> is there a way I can register a "code page" with Python so it knows how to
> convert characters beyond the lower 128?

If you have a Unicode object, you don't need to register anything. Just do

  lastname = lastname.encode("cp1252")

if you want the data in code page 1252 (say). Again, I'd recommend
delaying conversion to byte strings until the very last moment,
i.e. immediately before output. You may find that output works fine
with Unicode objects directly (e.g. when writing to a Win32 window),
or that it requires a different encoding (e.g. UTF-8 instead of
cp1252 when writing an XHTML file).
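
As a rough sketch of "encode only at output time": the helper
write_name and the in-memory streams below are assumptions made for
the example (a real program would write to a file or socket instead),
and the sample name is invented.

```python
import io

# Hypothetical sample value; in the real program this would come
# from unicode(foo.lastname).
lastname = u"Andr\u00e9"

def write_name(stream, name, encoding):
    """Encode the Unicode name only at the moment it is written out."""
    stream.write(name.encode(encoding))

# The same Unicode object, written to two byte-oriented destinations.
legacy_out = io.BytesIO()   # stands in for a cp1252 text file
xhtml_out = io.BytesIO()    # stands in for a UTF-8 XHTML file

write_name(legacy_out, lastname, "cp1252")
write_name(xhtml_out, lastname, "utf-8")
```

The accented character becomes one byte in cp1252 and two bytes in
UTF-8, which is why the choice of encoding is best left to the code
that knows where the data is going.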

HTH,
Martin