adodbapi / string encoding problem

Peter Otten __peter__ at web.de
Thu Sep 25 10:28:44 EDT 2003


Achim Domma wrote:

>> You have to know the encoding of the original file.
> 
> Why? It's of type 'str' and I would expect that I could write it to DB and
> get the same 'str' back. That's all I want. Why is it required to know the
> encoding?

str is essentially a sequence of bytes that can store the same content in
different ways:

>>> utf8 = u"ä".encode("utf8")
>>> latin = u"ä".encode("latin1")
>>> latin
'\xe4'
>>> utf8
'\xc3\xa4'
>>>

Now imagine you store the latter byte sequence in your database and want to
display it in your Windows editor:

>>> print utf8
Ã¤
(you should see two strange characters)
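
For anyone who wants to reproduce this outside the interactive session, here
is a minimal sketch in Python 3 syntax, where byte strings are a separate
bytes type. The variable names are only illustrative, and latin-1 stands in
for whatever encoding the editor happens to assume.

```python
# The same character stored as two different byte sequences.
text = "\u00e4"  # the letter a-umlaut
utf8_bytes = text.encode("utf-8")
latin_bytes = text.encode("latin-1")

# Decoding with the encoding that produced the bytes round-trips cleanly.
assert utf8_bytes.decode("utf-8") == text
assert latin_bytes.decode("latin-1") == text

# Decoding the UTF-8 bytes as latin-1 does not fail; it silently yields
# two unrelated characters instead of one.
misread = utf8_bytes.decode("latin-1")
print(len(utf8_bytes), len(latin_bytes), len(misread))
print(ascii(misread))
```

The bytes themselves never change; only the interpretation does, which is
why the reader of the bytes has to know which encoding the writer used.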

I had this problem occasionally when I edited Python scripts with IDLE and,
oddly enough, my old C++ Builder 3 IDE.

Unicode was introduced to avoid such ambiguities. My guess is that the first
conversion, when your string data is fed to the db api, is performed
automatically using the default encoding of your environment. That encoding
may differ from the encoding of the downloaded file, which would probably
mess up some characters.
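
One way to take the guesswork out is to decode the bytes yourself, with the
encoding the file was really written in, before anything is handed to the
database layer. The following is only a sketch in Python 3 syntax: to_text
is a hypothetical helper, the sample data is made up, and it says nothing
about what adodbapi actually does internally.

```python
def to_text(raw, source_encoding):
    """Decode raw bytes using the encoding the file was written in."""
    return raw.decode(source_encoding)


# Pretend these bytes were read from a downloaded latin-1 file.
downloaded = "K\u00e4se".encode("latin-1")

# Right encoding: the text comes back intact.
good = to_text(downloaded, "latin-1")

# Wrong encoding that still decodes: no error, but a different character.
bad = to_text(downloaded, "cp437")

# Wrong encoding that cannot decode these bytes: an explicit error.
try:
    to_text(downloaded, "utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ascii(good), ascii(bad), failed)
```

The second case is the nasty one, because nothing complains and the damage
only shows up when somebody looks at the stored text.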

Of course you could store the file in binary form (not in a memo field) in
your db and thus bypass all encoding mechanisms, but if you still think
that a string is a string is a string, you should reread the above or
look for more detailed information on the matter.
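
To illustrate the binary route, here is a small sketch that uses the sqlite3
module from the standard library as a stand-in, not adodbapi, and Python 3
syntax. The table and column names are invented for the example, and the
column types available in your own database will differ.

```python
import sqlite3

# Bytes exactly as they arrived; no decoding is attempted.
raw = "\u00e4".encode("utf-8")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, content BLOB)")

# Python bytes objects are stored as BLOB values, untouched by any codec.
conn.execute("INSERT INTO files VALUES (?, ?)", ("page.html", raw))

row = conn.execute(
    "SELECT content FROM files WHERE name = ?", ("page.html",)
).fetchone()
stored = row[0]
conn.close()

print(stored == raw)
```

You get back the identical bytes, but the database cannot search or sort the
content as text, and you still need the encoding as soon as you want to
display it.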

Peter





