a question about Chinese characters in a Python Program

Ben Finney bignose+hates-spam at benfinney.id.au
Tue Oct 21 08:03:18 EDT 2008


John Machin <sjmachin at lexicon.net> writes:

> I don't understand the point or value of filtering out all byte values
> greater than 127

That's only done if the encoding isn't otherwise specified. In that
case, ASCII is the documented default encoding, so the input *must*
be restricted to code points 0–127; otherwise it's not ASCII.

The value of doing this is to make it rapidly and repeatably apparent
when the programmer's assumptions about character encoding are false,
allowing the programming error to be fixed early rather than late.
This is, in my estimation, of more value than heuristic magic to
“guess” the encoding, and the resultant debugging nightmare when
that guesswork fails in unpredictable ways later in the program's
life.
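
As a minimal sketch of that fail-early behaviour (written for Python 3,
which postdates this thread; the sample bytes and the helper name
`decode_strict_ascii` are made up for illustration), a strict ASCII
decode rejects any byte above 127 right at the point of decoding:

```python
# Hypothetical sample data: the UTF-8 encoding of two Chinese characters.
raw = "中文".encode("utf-8")


def decode_strict_ascii(data):
    """Decode bytes as ASCII; any byte above 127 raises UnicodeDecodeError."""
    return data.decode("ascii")


try:
    decode_strict_ascii(raw)
    failed = False
except UnicodeDecodeError as exc:
    # The false assumption ("this is ASCII") surfaces here, immediately,
    # instead of producing corrupted text somewhere later in the program.
    failed = True
    print("not ASCII:", exc.reason)

# Stating the actual encoding explicitly makes the same bytes decode cleanly.
text = raw.decode("utf-8")
```

The error names the offending byte position, so the fix (declaring the
real encoding) can be made where the data enters the program.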

-- 
 \         “My girlfriend has a queen sized bed; I have a court jester |
  `\   sized bed. It's red and green and has bells on it, and the ends |
_o__)                                         curl up.” —Steven Wright |
Ben Finney


