mysterious unicode

Carsten Haese carsten at uniqsys.com
Tue Mar 20 21:17:40 EDT 2007


On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> I have been getting the same thing using SQLite3
> when extracting data from an SQLite3 database.

Many APIs that exchange data choose to exchange text in Unicode because
that eliminates encoding uncertainty. Whether an API uses Unicode would
probably be noted somewhere in its documentation.
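
As a rough illustration of an API handing back Unicode text, here is a
minimal sketch using the standard sqlite3 module with an in-memory
database. The table name, column name, and sample value are made up for
the example and are not taken from the original poster's code.

```python
import sqlite3

# In-memory database with one hypothetical table and one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people (name) VALUES (?)", (u"Jos\u00e9",))

# fetchone() returns a tuple; the TEXT column comes back as a Unicode object.
record = conn.execute("SELECT name FROM people").fetchone()
name = record[0]

# type(u"") is the Unicode text type of whichever Python runs this.
is_unicode = isinstance(name, type(u""))
conn.close()
```

The leading u you see when printing such a record is just the repr of a
Unicode object, not part of the stored data.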

>  I take the database info which is in a list and do
> 
> name = str.record[0]

You probably mean str(record[0]).

> rather than 
> name = record[0]
> 
> So far, I haven't had any problems.
> For some reason the unicode u is removed.
> I haven't wanted to spend the time to figure out
> why.

As a software engineer, I'd get worried if I didn't know why the code I
wrote works. Maybe that's just me.

Unicode is not rocket science. I suggest you read
http://www.amk.ca/python/howto/unicode to demystify what Unicode objects
are and do.

With str(), you're asking the Unicode object for its byte string
representation, which makes the Unicode object encode itself using the
system default encoding. The default encoding is normally ascii.
That can be tweaked for your particular Python installation, but if you
need an encoding other than ascii, it's recommended that you explicitly
encode from and decode to Unicode; otherwise you risk writing
non-portable code.
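
Explicit encoding and decoding might look something like the sketch
below. The sample value and the choice of utf-8 and latin-1 are
assumptions for illustration; use whatever encoding your data actually
requires.

```python
# Hypothetical value standing in for a text field read from the database.
name = u"Jos\u00e9"

# Encode to a byte string with a named encoding instead of relying on
# the installation's default encoding.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")

# Decoding with the same encoding gives back the original Unicode object.
round_trip = utf8_bytes.decode("utf-8")
```

Note that the same text yields different bytes under different
encodings, which is why naming the encoding explicitly matters.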

Using str() coercion of Unicode objects will work well enough until you
run into a string that contains characters that can't be represented in
the default encoding, at which point str() raises a UnicodeEncodeError.
Once that happens, you're better off explicitly encoding the Unicode
object into a well-defined encoding on input, or, even better, just
working with Unicode objects internally and only encoding to byte
strings when absolutely necessary, such as when outputting to a file or
to the console.
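
Here is a small sketch of both points, under some assumptions: the
helper name ascii_or_none and the sample names are invented, and
.encode("ascii") stands in for what str() does to a Unicode object when
the default encoding is ascii. An in-memory byte buffer plays the role
of the output file.

```python
import io

# Hypothetical rows; the second one contains a non-ASCII character.
names = [u"Alice", u"Jos\u00e9"]

def ascii_or_none(text):
    """Encode text as ascii, or return None if it cannot be represented."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None

# The first name encodes fine; the second one fails.
results = [ascii_or_none(n) for n in names]

# Work with Unicode internally and encode only at the output boundary.
buffer = io.BytesIO()
for n in names:
    line = n.upper() + u"\n"            # still Unicode at this point
    buffer.write(line.encode("utf-8"))  # bytes only when writing out
written = buffer.getvalue()
```

The text processing (here just upper()) happens on Unicode objects, so
it behaves the same no matter which encoding the output ends up using.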

Hope this helps,

Carsten.




