a question about Chinese characters in a Python Program

Ben Finney bignose+hates-spam at benfinney.id.au
Tue Oct 21 20:07:36 EDT 2008


John Machin <sjmachin at lexicon.net> writes:

> On Oct 21, 11:03 pm, Ben Finney <bignose+hates-s... at benfinney.id.au>
> wrote:
> > John Machin <sjmac... at lexicon.net> writes:
> > > I don't understand the point or value of filtering out all byte values
> > > greater than 127
> >
> > That's only done if the encoding isn't otherwise specified. In which
> > case, ASCII is the documented default encoding. In which case, it
> > *must* be restricted to code points 0–127, otherwise it's not ASCII.
> >
> > The value of doing this is to make it rapidly and repeatably apparent
> > when the programmer's assumptions about character encoding are false,
> > allowing the programming error to be fixed early rather than late.
> 
> "make it rapidly and repeatably apparent ..." is much better achieved
> by raising an exception.

Ah, I misread; I thought you were asking about the value of defaulting
to ASCII and therefore raising an exception. It seems we agree on
that, then.
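
For illustration only, here is a minimal sketch of the behaviour agreed on above: with ASCII as the assumed encoding, a byte above 127 raises an exception instead of being silently filtered out. It assumes Python 3; the sample string and the helper name decode_strict_ascii are made up for this example and do not come from the thread.

```python
# Hypothetical sample data: UTF-8 encoded Chinese text (not from the thread).
data = "中文".encode("utf-8")


def decode_strict_ascii(raw):
    """Return the text, or raise UnicodeDecodeError if any byte is > 127."""
    return raw.decode("ascii")


try:
    decode_strict_ascii(data)
    failed = False
except UnicodeDecodeError as exc:
    # The wrong assumption about the encoding shows up immediately.
    failed = True
    print("not ASCII:", exc)

# Naming the actual encoding explicitly is the fix.
text = data.decode("utf-8")
```

The exception points at the offending byte, so the false assumption surfaces at the decode step rather than later as corrupted text.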

> What is that 0+IBM-127 +IBw-guess+IB0- gibberish in your posting?

It wasn't in my message as sent to my news server, nor as I read the
message in comp.lang.python. The message was encoded using UTF-8.
Perhaps it's since been munged in transit to your eyeballs by any of a
number of intermediaries.
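
As a hedged aside, the sequences John quoted happen to be valid UTF-7, which suggests (but does not prove) that some intermediary re-encoded the message as UTF-7 and a later reader displayed the bytes without decoding them. The sketch below assumes Python 3 and uses only the built-in "utf-7" codec; which intermediary did the re-encoding is unknown.

```python
# The byte sequence exactly as quoted in the reply above.
garbled = b"0+IBM-127 +IBw-guess+IB0-"

# Decoding it as UTF-7 recovers the punctuation that was sent.
decoded = garbled.decode("utf-7")

# Show the code points of the recovered non-ASCII characters.
non_ascii = [hex(ord(ch)) for ch in decoded if ord(ch) > 127]
print(decoded)
print(non_ascii)  # ['0x2013', '0x201c', '0x201d']
```

The three recovered characters are an en dash and a pair of curly double quotes, consistent with a message that was written with non-ASCII punctuation and sent as UTF-8.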

-- 
 \       “I bought some batteries, but they weren't included; so I had |
  `\                                to buy them again.” —Steven Wright |
_o__)                                                                  |
Ben Finney


