Unicode (UTF8) in dbhash on 2.5

Joe Strout joe at strout.net
Tue Oct 21 19:26:15 EDT 2008


On Oct 21, 2008, at 2:39 PM, Martin v. Löwis wrote:

> It's not possible to "fix" this - it isn't even broken. The *db modules,
> by design, support storing of arbitrary bytes, not just character data.

Many database engines are encoding-aware, and distinguish between
'text' columns and 'blob' columns -- the latter are arbitrary bags of
bytes, but text columns store text, and a good database interface (with
a sensibly designed database) will be aware of this and handle encoding
and decoding of text responsibly.
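
To make the text/blob distinction concrete, here is a minimal sketch using
the standard sqlite3 module. It assumes Python 3 semantics (str for text,
bytes for raw data); the table and column names are made up for
illustration.

```python
import sqlite3

# In-memory database with one text column and one blob column
# (table and column names are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (label TEXT, payload BLOB)")

name = "Löwis"
# The text column takes a str; the blob column takes raw bytes.
conn.execute("INSERT INTO demo VALUES (?, ?)", (name, name.encode("utf-8")))

label, payload = conn.execute("SELECT label, payload FROM demo").fetchone()
conn.close()

# label comes back as str (decoded by the driver); payload comes back
# as bytes, and decoding it is left to the caller.
print(type(label).__name__, type(payload).__name__)
```

The point is only that the text column round-trips as text without any
explicit encode/decode step, while the blob column stays a bag of bytes.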

I can tell you that in REALbasic, if your database is properly  
configured to use UTF-8 encoding, the rest is all handled seamlessly  
-- you just store and retrieve text, and don't have to worry about  
encoding and decoding things all over the place.

So the OP's request is quite valid.  Python's handling of encodings is  
currently primitive compared to some other environments, and I see  
that this extends to the database modules.  Fine, fair enough, it is  
what it is, but there is no harm in asking about (or even yearning  
for) a more intelligent system that does more of the grunt work for us.
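
For what it's worth, a thin layer that does this grunt work over a
bytes-only store is not much code. The following is only a sketch under
assumptions: it uses Python 3 and dbm.dumb as a stand-in for a dbhash-style
database, and TextDB is a hypothetical wrapper name, not anything in the
standard library.

```python
import collections.abc
import dbm.dumb
import os
import tempfile


class TextDB(collections.abc.MutableMapping):
    """Hypothetical wrapper: str keys/values outside, encoded bytes inside."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # any mapping of bytes -> bytes
        self._encoding = encoding  # assumed to be UTF-8 unless told otherwise

    def __getitem__(self, key):
        return self._raw[key.encode(self._encoding)].decode(self._encoding)

    def __setitem__(self, key, value):
        self._raw[key.encode(self._encoding)] = value.encode(self._encoding)

    def __delitem__(self, key):
        del self._raw[key.encode(self._encoding)]

    def __iter__(self):
        return (k.decode(self._encoding) for k in self._raw.keys())

    def __len__(self):
        return len(self._raw)


with tempfile.TemporaryDirectory() as tmp:
    raw = dbm.dumb.open(os.path.join(tmp, "demo"), "c")
    db = TextDB(raw)
    db["author"] = "Martin v. Löwis"   # stored as UTF-8 bytes
    roundtrip = db["author"]           # read back as str
    stored = raw[b"author"]            # what the underlying store holds
    keys = list(db)
    raw.close()
```

The underlying database still holds arbitrary bytes, as Martin says; the
wrapper just fixes one encoding at the boundary so the calling code only
ever sees text.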

Best,
- Joe




More information about the Python-list mailing list