Unicode is driving me nuts!
Anthony Liu
antonyliu2002 at yahoo.com
Sat Mar 13 03:35:45 EST 2004
Thank you, Skip. You know what, I guess I'll give up
using Unicode, since you mentioned that you used to have
headaches with it too.
I'll probably just read the file byte by byte and check whether
each byte starts a Chinese character. If it does, I'll read 2 bytes
instead of 1. What do you think? This way, I will
hopefully not end up with a lot of unreadable characters.
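
A minimal sketch of that byte-by-byte idea, in case it helps to see it
spelled out. It assumes the corpus is GB2312 (the actual encoding is not
named in this thread; the script only calls it myencoding), and it treats
any byte of 0x80 or above as the first byte of a 2-byte character, which is
a simplification. The name split_gb2312 is made up for illustration, and
the syntax is modern Python 3 rather than the unicode() call quoted below.

```python
def split_gb2312(data):
    """Split raw bytes into 1-byte ASCII units and 2-byte GB2312 units.

    Assumption: every byte >= 0x80 starts a 2-byte character. Real GB2312
    lead bytes cover a narrower range, so this is only an approximation.
    """
    units = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Plain ASCII: one byte per character.
            units.append(data[i:i + 1])
            i += 1
        else:
            # Lead byte of a Chinese character: take it with the next byte.
            # If the data is truncated, this slice is only 1 byte long.
            units.append(data[i:i + 2])
            i += 2
    return units


# Hypothetical sample: "abc", two Chinese characters, then "!".
sample = "abc\u4e2d\u6587!".encode("gb2312")
units = split_gb2312(sample)
```

Each unit can then be decoded or inspected on its own. The walk has to
start at a character boundary, otherwise every pair after that point is
misaligned.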
--- Skip Montanaro <skip at pobox.com> wrote:
>
> Anthony> str = unicode(raw_str, myencoding)
>
> Anthony> This works just fine with a small sample Chinese document.
>
> Anthony> But when I attempted to run the script on the entire corpus, I
> Anthony> get the typical "incomplete multibyte sequence error" or
> Anthony> "UnicodeEncodeError: 'ascii' codec can't encode characters in
> Anthony> position 0-23: ordinal not in range(128)"
>
> Can you craft a small example which demonstrates the error but which you
> think is correctly encoded?
>
> Anthony> I am at my wit's end, so frustrated at handling
> Anthony> non-ascii texts.
>
> Unicode creates lots of problems for the uninitiated. I pulled my hair out
> for a long time. It took me a couple tries to get my system to work
> (more-or-less) with Unicode. It's still got the occasional problem.
>
> Skip
>
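
For reference, here is a small sketch of one way the two errors quoted
above can come about. This is a guess at the cause, not a diagnosis: the
thread does not show how the corpus is read or written. It assumes GB2312
input that arrives in chunks, and it uses Python 3's codecs module; the
name decode_chunks and the sample text are made up for illustration. An
incremental decoder holds on to a dangling first byte until the rest of the
character arrives, and encoding explicitly on output avoids falling back to
the 'ascii' codec.

```python
import codecs


def decode_chunks(chunks, encoding="gb2312"):
    """Decode an iterable of byte chunks whose boundaries may split a character."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = []
    for chunk in chunks:
        # An incomplete trailing byte is buffered inside the decoder
        # instead of raising "incomplete multibyte sequence".
        parts.append(decoder.decode(chunk))
    # final=True raises if the input really does end mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Hypothetical input: the first chunk ends in the middle of a 2-byte character.
raw = "\u4e2d\u6587 text".encode("gb2312")
chunks = [raw[:1], raw[1:3], raw[3:]]
text = decode_chunks(chunks)

# Encode explicitly when writing out, rather than relying on a default codec.
encoded_out = text.encode("utf-8")
```

Decoding raw[:1] on its own with raw[:1].decode("gb2312") fails with an
incomplete multibyte sequence error, which is why the chunks go through a
single decoder object here.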