[Python-ideas] Python 3000 TIOBE -3%

Stephen J. Turnbull stephen at xemacs.org
Mon Feb 13 04:55:37 CET 2012


Carl M. Johnson writes:
 > On Feb 10, 2012, at 5:32 PM, Stephen J. Turnbull wrote:
 > 
 > > will founder on 'Óscar Fuentes' as author, unless you know what
 > > coding system is used, or know enough to use latin-1 (because
 > > it's effectively binary, not because it's the actual encoding).
 > 
 > Or just use errors="surrogateescape". I think we should tell people
 > who are scared of unicode and refuse to learn how to use it to just
 > add an errors="surrogateescape" keyword to their file open
 > arguments. Obviously, it's the wrong thing to do, but it's wrong in
 > the same way that Python 2 bytes are wrong, so if you're absolutely
 > committed to remaining ignorant of encodings, you can continue to
 > do that.

No, it's not the same as Python 2, and it's *subtly* the wrong thing
to do, too.  surrogateescape is intended to roundtrip on input from a
specific API to unchanged output to that same API, and that's all it
is guaranteed to do.
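
As a minimal sketch of that roundtrip guarantee (the variable names are
mine, and I'm assuming for illustration that the bytes happen to be
latin-1 while the reader guesses UTF-8):

```python
# Bytes whose real encoding the reader does not know.
# '\u00d3' is the capital O with acute accent in 'Oscar'.
raw = '\u00d3scar Fuentes'.encode('latin-1')

# Decoding as UTF-8 with surrogateescape does not raise: the undecodable
# byte 0xD3 is smuggled through as the lone surrogate U+DCD3.
text = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same codec and the same error handler restores the
# original bytes exactly. This is the roundtrip, and nothing more is
# promised about what you can do with `text` in between.
roundtripped = text.encode('utf-8', errors='surrogateescape')

print(repr(text))
print(roundtripped == raw)
```

Note that `text` here is not well-formed Unicode; it only makes sense
as something to hand back to the same codec.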

Less pedantically, if you use latin-1, the internal representation is
valid Unicode but (partially) incorrect content.  No UnicodeErrors.
If you use errors="surrogateescape", any code that insists on valid
Unicode will crash.  Here I'm talking about a use case where the
user believes that as long as the ASCII content is correct they will
get correct output.
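
To make the difference concrete, here is a rough sketch (names are
hypothetical, and I'm assuming the data is really UTF-8 while the
reader either pretends it is latin-1 or uses ASCII with
surrogateescape):

```python
# The real encoding is UTF-8, but the reader does not know that.
raw = '\u00d3scar Fuentes'.encode('utf-8')  # b'\xc3\x93scar Fuentes'

# latin-1 maps every byte to a code point: valid Unicode, wrong content
# for the non-ASCII part, correct content for the ASCII part.
as_latin1 = raw.decode('latin-1')

# surrogateescape keeps the ASCII part too, but the two non-ASCII bytes
# become lone surrogates, which are not valid in well-formed Unicode.
as_escaped = raw.decode('ascii', errors='surrogateescape')

# Code that insists on valid Unicode accepts the latin-1 result...
latin1_strict = as_latin1.encode('utf-8')

# ...but rejects the surrogateescape result.
try:
    as_escaped.encode('utf-8')
    escaped_strict_ok = True
except UnicodeEncodeError:
    escaped_strict_ok = False

print(repr(as_latin1), repr(as_escaped), escaped_strict_ok)
```

In both cases the ASCII tail 'scar Fuentes' survives intact; the only
question is whether the rest fails silently or raises.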

It's arguable that using errors="surrogateescape" is a better
approach, *because* of the possibility of a validity check.  I tend to
think not.  But that's a different argument from "same as Python 2".



