mysterious unicode
jim-on-linux
inq1ltd at verizon.net
Tue Mar 20 22:31:00 EDT 2007
On Tuesday 20 March 2007 21:17, Carsten Haese wrote:
> On Tue, 2007-03-20 at 20:26 -0400, jim-on-linux wrote:
> > I have been getting the same thing using
> > SQLite3 when extracting data from an SQLite3
> > database.
>
> Many APIs that exchange data choose to exchange
> text in Unicode because that eliminates
> encoding uncertainty. Whether an API uses
> Unicode would probably be noted somewhere in
> its documentation.
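As a minimal sketch of what this looks like with SQLite, assuming Python 3 (which postdates this thread; there, Unicode text is the built-in `str` type and no `u` prefix is shown), the standard `sqlite3` module returns TEXT columns as Unicode text and SQL NULL as `None`. The table and column names are hypothetical.

```python
import sqlite3

# In-memory database; the table "people" and column "name" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [("Jim",), ("Zoë",), (None,)])

rows = conn.execute("SELECT name FROM people ORDER BY rowid").fetchall()
for record in rows:
    # TEXT values arrive as str (Unicode text); NULL arrives as None.
    print(type(record[0]).__name__, repr(record[0]))
conn.close()
```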
>
> > I take the database info which is in a list
> > and do
> >
> > name = str.record[0]
>
> You probably mean str(record[0]) .
Yes,
>
> > rather than
> > name = record[0]
> >
> > So far, I haven't had any problems.
> > For some reason the Unicode u is removed.
> > I haven't wanted to spend the time to figure
> > out why.
>
> As a software engineer, I'd get worried if I
> didn't know why the code I wrote works. Maybe
> that's just me.
I don't disagree, but sometimes, depending on the
situation, time to investigate is a luxury.
However,
(if you don't have the time to do it right the
first time, when will you have the time to fix
it?)
>
> Unicode is not rocket science. I suggest you
> read http://www.amk.ca/python/howto/unicode to
> demystify what Unicode objects are and do.
>
> With str(), you're asking the Unicode object
> for its byte string interpretation, which
> causes the Unicode object to encode itself
> in the system default encoding. The
> default encoding is normally ASCII. That can be
> tweaked for your particular Python
> installation, but if you need an encoding other
> than ASCII it's recommended that you explicitly
> encode and decode from and to Unicode, lest you
> risk writing non-portable code.
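In Python 3 terms, the explicit encode/decode round trip recommended here looks roughly like this sketch. UTF-8 is an assumed choice of encoding and the sample text is invented; the last step shows why relying on ASCII breaks as soon as a non-ASCII character appears.

```python
text = "Zoë"  # Unicode text (str in Python 3)

# Encode explicitly to a named encoding instead of relying on a default.
data = text.encode("utf-8")
print(data)  # b'Zo\xc3\xab'

# Decode explicitly with the same encoding to get the original text back.
assert data.decode("utf-8") == text

# ASCII cannot represent "ë", so an ASCII encode fails loudly.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)
```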
>
> Using str() coercion of Unicode objects will
> work well enough until you run into a string
> that contains characters that can't be
> represented in the default encoding.
Right,
even though None (a SQL NULL) is not a string, it is
common enough to cause a problem.
Try running a loop through a list with None
in it.
Example,
x = str(row[2])
when row[2] is None, problems.
Easy to fix, but more work.
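One way to do that extra work, as a sketch assuming NULLs should become empty strings and any byte strings are UTF-8: the helper `to_text` and its `null` parameter are hypothetical. Note that plain `str(None)` does not raise; it quietly returns the four-character string `'None'`, which is easy to miss inside a loop.

```python
def to_text(value, null=""):
    """Convert a database value to text (hypothetical helper).

    None (SQL NULL) is mapped to `null`; bytes are assumed to be UTF-8.
    """
    if value is None:
        return null
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


row = ("Jim", 42, None, b"Zo\xc3\xab")
print([to_text(v) for v in row])  # ['Jim', '42', '', 'Zoë']
print(str(row[2]))                # prints None: the string 'None', no error
```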
I'll check the web site out.
Thanks for the update,
Jim-on-linux
> Once that
> happens, you're better off explicitly encoding
> the Unicode object into a well-defined encoding
> on input, or, even better, just work with
> Unicode objects internally and only encode to
> byte strings when absolutely necessary, such as
> when outputting to a file or to the console.
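A minimal Python 3 sketch of "Unicode inside, encode at the boundary", assuming UTF-8 for the file; the file name and sample names are made up. The text stays as `str` throughout, and the encoding is named only where the file is opened.

```python
import os
import tempfile

# Keep text as str internally; choose the encoding only at the I/O boundary.
names = ["Jim", "Zoë", "Carsten"]

path = os.path.join(tempfile.mkdtemp(), "names.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:  # encode on output
    for name in names:
        f.write(name + "\n")

with open(path, encoding="utf-8") as f:  # decode on input
    read_back = [line.rstrip("\n") for line in f]

print(read_back == names)  # True
```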
>
> Hope this helps,
>
> Carsten.
More information about the Python-list
mailing list