Removing Unicode from Python?

John Roth newsgroups at jhrothjr.com
Sat Nov 1 20:19:24 EST 2003


"Irmen de Jong" <irmen at -NOSPAM-REMOVETHIS-xs4all.nl> wrote in message
news:3fa45528$0$58709$e4fe514c at news.xs4all.nl...
> Paul Rubin wrote:
> > Irmen de Jong <irmen at -NOSPAM-REMOVETHIS-xs4all.nl> writes:
> >
> >>While I think I am reasonably aware of things like Unicode,
> >>character encodings, and assorted related stuff, I still found that
> >>article highly interesting. Thanks for the link!
> >
> >
> > Actually I think the Wikipedia article on unicode is much better.
> > http://www.wikipedia.org/wiki/unicode
>
> I still like Joel's article better, partly because of his writing style ;-)
> The wiki article is very to-the-point. Joel's article is slightly funny but
> still accurate.
>
> Or did I miss something-- does the wiki article address points Joel's
> article misses, or are there any mistakes in Joel's article?

Depends on what you're looking for. Joel's article, as he says up
front, is kind of lightweight if what you want is the technical facts.
What it does that the Wikipedia article doesn't do is pound on the
fact that if you don't know where that single byte string of data came
from, you don't know how it's encoded. And if you don't know how
it's encoded, you can have really serious problems.
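A small illustration of that point, written as a sketch in modern Python 3 (which postdates this thread). The byte values are made up for the example. The same bytes decode without error under two different encodings but give two different strings, and nothing in the bytes themselves says which reading is correct:

```python
# Five bytes with no encoding label attached (hypothetical example data).
data = b"caf\xc3\xa9"

# Both decodes succeed; only outside knowledge says which one is right.
as_utf8 = data.decode("utf-8")      # 0xC3 0xA9 is one UTF-8 sequence: 'é'
as_latin1 = data.decode("latin-1")  # each byte is its own character: 'Ã', '©'

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```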

In fact, that's where this thread started. Apparently, the DB interface
the OP used is converting everything to Unicode strings based on
some unknown (to the poster, at least) set of assumptions, and it's
causing problems because those assumptions don't match the
semantics of the actual data.
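One way to keep such assumptions visible is to decode raw column bytes yourself, with an encoding you state explicitly, and let mismatches fail loudly. This is a minimal Python 3 sketch; `decode_column` is a hypothetical helper, not part of any DB interface, and the sample bytes assume a database that stores Latin-1:

```python
def decode_column(raw: bytes, encoding: str) -> str:
    """Decode a raw column value with an explicitly stated encoding.

    Hypothetical helper: errors="strict" makes a wrong assumption raise
    instead of silently producing mangled text.
    """
    try:
        return raw.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"column bytes are not valid {encoding}: {raw!r}") from exc


raw = b"caf\xe9"  # 'café' as a hypothetical Latin-1 database would store it

print(decode_column(raw, "latin-1"))  # café

try:
    decode_column(raw, "utf-8")  # wrong assumption: 0xE9 starts an incomplete UTF-8 sequence
except ValueError as err:
    print(err)
```

Strict decoding only catches some mismatches: Latin-1 maps every byte to a character, so UTF-8 data wrongly decoded as Latin-1 never raises. The encoding still has to come from knowledge of where the data came from.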

John Roth
>
> --Irmen
>


More information about the Python-list mailing list