mysterious unicode

jim-on-linux inq1ltd at verizon.net
Tue Mar 20 22:31:00 EDT 2007


On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
>
> > I take the database info, which is in a list,
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]) .

Yes.
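
For reference, a minimal sketch of where the u prefix comes from, assuming a current Python 3 and a throwaway in-memory database (the table and column names are made up for illustration). The sqlite3 module hands TEXT columns back as Unicode strings. In the Python 2 of this thread that type was called unicode and its repr carried the u prefix; in Python 3 it is simply str.

```python
import sqlite3

# Hypothetical in-memory database; "people" and "name" are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.execute("INSERT INTO people (name) VALUES (?)", ("Jim",))

record = conn.execute("SELECT name FROM people").fetchone()
name = record[0]

# TEXT values come back as Unicode text (str in Python 3, unicode in Python 2).
print(type(name).__name__, repr(name))
conn.close()
```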


>
> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.

I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However: if you don't have the time to do it right
the first time, when will you have the time to fix
it?

>
> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to give you its
> encoding in the system default encoding. The
> default encoding is normally ascii. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ascii it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
>
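
A rough illustration of the paragraph above, written for Python 3, where str is already Unicode and there is no implicit default encoding. The explicit .encode("ascii") call stands in for what str() did implicitly to a Unicode object in Python 2. The sample strings are made up.

```python
plain = "Jim"            # only ASCII characters
accented = "Jos\u00e9"   # contains U+00E9, which ASCII cannot represent

# ASCII-only text encodes without trouble.
print(plain.encode("ascii"))

# Non-ASCII text fails under ASCII; this is the failure mode of implicit coercion.
try:
    accented.encode("ascii")
    failed = False
except UnicodeEncodeError:
    failed = True
print("ascii failed:", failed)

# Naming the encoding explicitly works and round-trips.
encoded = accented.encode("utf-8")
print(encoded, encoded.decode("utf-8") == accented)
```
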
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding. 
Right.
Even though None or null are not strings, they are
common enough to cause a problem.
Try running a loop through a list with None or
null in it.
Example:
x = str(list[2])
When list[2] is null or None, there are problems.
Easy to fix, but more work.
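
A small sketch of the None case, with made-up row data. Note that str(None) does not raise; it quietly produces the text "None", so the sketch checks for None explicitly before converting. Mapping None to an empty string is only one possible policy.

```python
# Hypothetical row, as might come back from a query with a NULL column.
row = ["Jim", "Linux", None]

# Naive conversion: None silently becomes the four-character string "None".
naive = [str(value) for value in row]

# Explicit check: map None to an empty string instead.
cleaned = ["" if value is None else str(value) for value in row]

print(naive)
print(cleaned)
```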

I'll check the web site out.

Thanks for the update,
Jim-on-linux

> Once that 
> happens, you're better off explicitly encoding
> the Unicode object into a well-defined encoding
> on input, or, even better, just work with
> Unicode objects internally and only encode to
> byte strings when absolutely necessary, such as
> when outputting to a file or to the console.
>
> Hope this helps,
>
> Carsten.
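
The closing advice above (Unicode internally, bytes only at the edges) might look roughly like this in Python 3. The input bytes, the UTF-8 assumption, and the file name are all illustrative.

```python
import os
import tempfile

# Bytes as they might arrive from outside the program (assumed to be UTF-8).
raw = b"Jos\xc3\xa9"

# Decode once on input, then work with text internally.
text = raw.decode("utf-8")
greeting = "Hello, " + text.upper()

# Encode only at the boundary, here when writing to a file.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(greeting)

# Read the file back as raw bytes to see what was actually written.
with open(path, "rb") as handle:
    written = handle.read()
print(written)
```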


