[Python-ideas] Python 3000 TIOBE -3%
Stephen J. Turnbull
stephen at xemacs.org
Tue Feb 14 09:02:16 CET 2012
Nick Coghlan writes:
> I'd hazard a guess that the non-ASCII compatible encoding most
> likely to be encountered outside Asia is UTF-16.
In other words, only people who insist on messing with
application/octet-stream files (like Word ;-). They don't deserve the
pain, but they're gonna feel it anyway.
> The choice is really between "never give me UnicodeErrors, but feel
> free to silently corrupt the data stream if I do the wrong thing
> with that data" (i.e. "latin-1")
Yes.
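A minimal sketch of the "latin-1" behaviour, assuming the input is really
EUC-JP (the sample string and the choice of EUC-JP are illustrative
assumptions, not taken from the thread):

```python
# Bytes in an ASCII-compatible multibyte encoding (assumed: EUC-JP).
data = "\u65e5\u672c\u8a9e".encode("euc_jp")

# Decoding as latin-1 cannot fail: every byte maps to U+0000..U+00FF.
text = data.decode("latin-1")

# Passing the text through untouched round-trips the bytes exactly.
roundtrip = text.encode("latin-1")

# Emitting the same text in another encoding raises nothing either,
# but the result is no longer the original data (mojibake).
out = text.encode("utf-8")
```

Nothing here raises, whether the bytes survive or not.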
> and "correctly handle any ASCII compatible encoding, but still
> throw UnicodeEncodeError if I'm about to emit corrupted data"
> ("ascii+surrogateescape").
Not if I understand correctly what ascii+surrogateescape would do.
Yes, you can pass data through verbatim, but AFAICS you would have to
work quite hard to do anything to that stream that would cause a
UnicodeError in your program, even if you corrupt it. (E.g., delete
half of a multibyte EUC character.)
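A rough sketch of that case, again assuming EUC-JP input and an
illustrative sample string. The slicing stands in for any careless edit;
none of the names come from the thread:

```python
# Bytes in an ASCII-compatible multibyte encoding (assumed: EUC-JP).
data = "\u65e5\u672c\u8a9e".encode("euc_jp")

# Each non-ASCII byte becomes a lone surrogate in U+DC80..U+DCFF.
text = data.decode("ascii", errors="surrogateescape")

# Corrupt the stream: drop the second byte of the first character.
damaged = text[:1] + text[2:]

# Re-encoding with the same handler succeeds without any UnicodeError.
out = damaged.encode("ascii", errors="surrogateescape")

# Only a consumer that actually knows the encoding notices the damage.
try:
    out.decode("euc_jp")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
```

The program that did the damage never sees an exception; the broken
bytes are emitted as-is.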
The question is what happens if you run into a validating processor
internally -- then you'll see an error (even though you're just
passing it through verbatim!)
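As an illustration, a strict UTF-8 encode can play the role of the
"validating processor". That stand-in is an assumption made for this
sketch; any component that rejects lone surrogates would behave
similarly:

```python
# Bytes in an ASCII-compatible multibyte encoding (assumed: EUC-JP).
data = "\u65e5\u672c\u8a9e".encode("euc_jp")
text = data.decode("ascii", errors="surrogateescape")

# The text is completely untouched, yet a strict encoder rejects the
# lone surrogates that surrogateescape used to smuggle the raw bytes.
try:
    text.encode("utf-8")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# The same untouched text still round-trips with the matching handler.
passthrough = text.encode("ascii", errors="surrogateescape")
```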