Unicode is driving me nuts!

Sat Mar 13 03:35:45 EST 2004

Thank you, Skip.  You know what, I think I'll give up
on using Unicode, since you mentioned that it used to
give you headaches too.

I'll probably just read the file byte by byte and check
whether each byte starts a Chinese character.  If it
does, I'll read 2 bytes instead.  What do you think?
This way, I hopefully won't end up with a lot of
unreadable characters.

--- Skip Montanaro <skip at pobox.com> wrote:
> 
>     Anthony> str = unicode(raw_str, myencoding)
> 
>     Anthony> This works just fine with a small sample Chinese document.
> 
>     Anthony> But when I attempted to run the script on the entire corpus, I
>     Anthony> get the typical "incomplete multibyte sequence error" or
>     Anthony> "UnicodeEncodeError: 'ascii' codec can't encode characters in
>     Anthony> position 0-23: ordinal not in range(128)"
> 
> Can you craft a small example which demonstrates the error but which you
> think is correctly encoded?
> 
>     Anthony> I am at my wit's end, so frustrated at handling
>     Anthony> non-ascii texts.
> 
> Unicode creates lots of problems for the uninitiated.  I pulled my hair out
> for a long time.  It took me a couple tries to get my system to work
> (more-or-less) with Unicode.  It's still got the occasional problem.
> 
> Skip
> 
