[Python-ideas] Python 3000 TIOBE -3%

Tue Feb 14 09:02:16 CET 2012

Nick Coghlan writes:

 > I'd hazard a guess that the non-ASCII compatible encoding most
 > likely to be encountered outside Asia is UTF-16.

In other words, only people who insist on messing with
application/octet-stream files (like Word ;-).  They don't deserve the
pain, but they're gonna feel it anyway.

 > The choice is really between "never give me UnicodeErrors, but feel
 > free to silently corrupt the data stream if I do the wrong thing
 > with that data" (i.e.  "latin-1")

Yes.

 > and "correctly handle any ASCII compatible encoding, but still
 > throw UnicodeEncodeError if I'm about to emit corrupted data"
 > ("ascii+surrogateescape").

Not if I correctly understand what ascii+surrogateescape would do.
Yes, you can pass data through verbatim, but AFAICS you would have to
work quite hard to do anything to that stream that would cause a
UnicodeError in your program, even if you corrupt it.  (E.g., delete
half of a multibyte EUC character.)
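
To make the EUC case concrete, here is a minimal sketch.  The sample
bytes and the variable names are mine, chosen for illustration (the
bytes are assumed to be EUC-JP), and this is only one way such
corruption could happen:

```python
# Assumed sample: an ASCII prefix, three two-byte EUC-JP characters,
# and an ASCII suffix.
raw = b"name: \xc6\xfc\xcb\xdc\xb8\xec.txt"

# Decode without knowing the real encoding: each non-ASCII byte becomes
# a lone surrogate in the range U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# Untouched text round-trips byte for byte.
round_trip = text.encode("ascii", errors="surrogateescape")

# Corrupt the stream: drop the first escaped byte, i.e. half of a
# two-byte EUC character.
first = next(i for i, ch in enumerate(text) if "\udc80" <= ch <= "\udcff")
damaged_text = text[:first] + text[first + 1:]

# Encoding the damaged text still succeeds; no UnicodeError is raised.
damaged = damaged_text.encode("ascii", errors="surrogateescape")

# Only a decoder that knows the real encoding notices the damage.
try:
    damaged.decode("euc_jp")
    detected = False
except UnicodeDecodeError:
    detected = True
```

Nothing in the ascii+surrogateescape path complains here; the damage
only shows up once something downstream decodes the bytes as EUC-JP.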

The question is what happens if the data runs into a validating
processor internally -- then you'll see an error (even though you're
just passing it through verbatim!).
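
As a rough illustration, take a strict UTF-8 encode as a stand-in for
such a validating processor.  The validate() helper and the sample
bytes below are hypothetical; a real validator might be an XML
serializer, a database driver, or anything else that rejects lone
surrogates:

```python
# Assumed sample: EUC-JP bytes read by a program that does not know
# the encoding.
raw = b"name: \xc6\xfc\xcb\xdc\xb8\xec.txt"
text = raw.decode("ascii", errors="surrogateescape")


def validate(s):
    """Hypothetical validating step: a strict UTF-8 encode."""
    return s.encode("utf-8")


# The data was never modified, yet the strict step rejects it because
# the escaped bytes are lone surrogates.
try:
    validate(text)
    error = None
except UnicodeEncodeError as exc:
    error = exc

# An output path that also uses surrogateescape passes the original
# bytes through unchanged.
passthrough = text.encode("utf-8", errors="surrogateescape")
```

So whether you see an error depends on whether every step between
input and output agrees to use surrogateescape, not on whether the
data was actually corrupted.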