a question about Chinese characters in a Python Program

est electronixtar at gmail.com
Mon Oct 20 12:44:53 EDT 2008


On Oct 20, 11:46 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Mon, 20 Oct 2008 06:30:09 -0700, est wrote:
> > Like I said, str() should NOT throw an exception BY DESIGN, it's a basic
> > language standard.
>
> int() is also a basic language standard, but it is perfectly acceptable
> for int() to raise an exception if you ask it to convert something into
> an integer that can't be converted:
>
> int("cat")
>
> What else would you expect int() to do but raise an exception?
>
> If you ask str() to convert something into a string which can't be
> converted, then what else should it do other than raise an exception?
> Whatever answer you give, somebody else will argue it should do another
> thing. Maybe I want failed characters replaced with '?'. Maybe Fred wants
> failed characters deleted altogether. Susan wants UTF-16. George wants
> Latin-1.
>
> The simple fact is that there is no 1:1 mapping from all 65,000+ Unicode
> characters to the 256 bytes used by byte strings, so there *must* be an
> encoding, otherwise you don't know which characters map to which bytes.
>
> ASCII has the advantage of being the lowest common denominator. Perhaps
> it doesn't make too many people very happy, but it makes everyone equally
> unhappy.
>
> > str() is not only a convert to string function, but
> > also a serialization in most cases.(e.g. socket) My simple suggestion
> > is: If it's a unicode character, output as UTF-8;
>
> Why UTF-8? That will never do. I want it output as UCS-4.
>
> > otherwise just output
> > byte array, please do not encode it with really stupid range(128) ASCII.
> > It's not guessing, it's totally wrong.
>
> If you start with a byte string, you can always get a byte string:
>
> >>> s = '\x96 \xa0 \xaa'  # not ASCII characters
> >>> s
> '\x96 \xa0 \xaa'
> >>> str(s)
> '\x96 \xa0 \xaa'
>
> --
> Steven
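
To make the "which answer do you want" point above concrete, here is a minimal sketch in Python 3 terms (where str is Unicode text and bytes is the byte string; the thread itself is about Python 2, so this is an illustration, not the exact behaviour discussed). The sample text and the variable names are made up for the example.

```python
# Sample text mixing ASCII, a Latin-1 character and two Chinese characters.
text = "caf\u00e9 \u4e2d\u6587"

# Strict ASCII (the default error handler) refuses anything outside range(128).
strict_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc

# "Maybe I want failed characters replaced with '?'."
replaced = text.encode("ascii", errors="replace")

# "Maybe Fred wants failed characters deleted altogether."
ignored = text.encode("ascii", errors="ignore")

# "Susan wants UTF-16." (little-endian, no byte order mark)
utf16 = text.encode("utf-16-le")

# The UTF-8 suggestion from earlier in the thread.
utf8 = text.encode("utf-8")

# "George wants Latin-1." Works for the accented letter only,
# not for the Chinese characters.
latin1 = "caf\u00e9".encode("latin-1")

print(repr(strict_error))
print(replaced, ignored, utf16, utf8, latin1)
```

Every one of these is a defensible result for the same input, which is why some encoding and error policy has to be chosen explicitly.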

In fact, Python handles characters better than most other open-source
programming languages. But still:

1. You can explain str() in 1000 ways, but there are still 1001 more
confusing errors in all kinds of Python apps. (Not only in some of the
scripts I've written, but also in well-known apps like Boa Constructor:
http://i36.tinypic.com/1gqekh.jpg. This sucks hard, right?)


2. Could anyone kindly tell me how I can define a customized encoding
(namely 'ansi') which handles range(256), so that I can call
sys.setdefaultencoding('ansi') once and for all?
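
For reference, a codec that covers all of range(256) one-to-one already exists as Latin-1, so a custom name can simply delegate to it. The following is a minimal sketch in Python 3 terms, not a recommendation. The codec name 'ansi256' and the helper _search_ansi256 are hypothetical, picked so as not to clash with any existing codec name. Note that Python 3 has no sys.setdefaultencoding(), so the sketch only shows registering and using the codec explicitly.

```python
import codecs

def _search_ansi256(name):
    """Codec search function: answer only for the hypothetical 'ansi256'."""
    if name != "ansi256":
        return None
    # Latin-1 maps byte values 0..255 to code points 0..255, so reuse it.
    latin1 = codecs.lookup("latin-1")
    return codecs.CodecInfo(
        encode=latin1.encode,
        decode=latin1.decode,
        streamreader=latin1.streamreader,
        streamwriter=latin1.streamwriter,
        incrementalencoder=latin1.incrementalencoder,
        incrementaldecoder=latin1.incrementaldecoder,
        name="ansi256",
    )

codecs.register(_search_ansi256)

# Every byte value survives a decode/encode round trip.
raw = bytes(range(256))
text = raw.decode("ansi256")
round_trip = text.encode("ansi256")
print(round_trip == raw)
```

This round-trips any byte string, but it does not know what the bytes mean: GBK-encoded Chinese text decoded this way comes out as pairs of Latin-1 characters, and characters above U+00FF still cannot be encoded.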


