[Python-ideas] Python 3000 TIOBE -3%
Stephen J. Turnbull
stephen at xemacs.org
Mon Feb 13 04:55:37 CET 2012
Carl M. Johnson writes:
> On Feb 10, 2012, at 5:32 PM, Stephen J. Turnbull wrote:
>
> > will founder on 'Óscar Fuentes' as author, unless you know what
> > coding system is used, or know enough to use latin-1 (because
> > it's effectively binary, not because it's the actual encoding).
>
> Or just use errors="surrogateescape". I think we should tell people
> who are scared of unicode and refuse to learn how to use it to just
> add an errors="surrogateescape" keyword to their file open
> arguments. Obviously, it's the wrong thing to do, but it's wrong in
> the same way that Python 2 bytes are wrong, so if you're absolutely
> committed to remaining ignorant of encodings, you can continue to
> do that.
No, it's not the same as Python 2, and it's *subtly* the wrong thing
to do, too. surrogateescape is intended to roundtrip on input from a
specific API to unchanged output to that same API, and that's all
it is guaranteed to do.
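
A minimal sketch of that roundtrip guarantee, assuming the author field arrived as Latin-1 bytes while the reader guessed UTF-8 (the variable names are illustrative, not from any real tool):

```python
# Bytes whose real encoding (Latin-1) the reader is assumed not to know.
raw = "Óscar Fuentes".encode("latin-1")

# Decoding as UTF-8 with surrogateescape does not raise: the undecodable
# byte 0xD3 is smuggled through as the lone surrogate U+DCD3.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same codec and the same error handler restores the
# original bytes. This is the roundtrip the handler is designed for.
roundtrip = text.encode("utf-8", errors="surrogateescape")

print(ascii(text))
print(roundtrip == raw)
```

Nothing here says what happens when the string is handed to a different codec or to code that does not use the handler.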
Less pedantically, if you use latin-1, the internal representation is
valid Unicode but (partially) incorrect content. No UnicodeErrors.
If you use errors="surrogateescape", any code that insists on valid
Unicode (a strict encoder, for example) will raise an exception on the
lone surrogates. Here I'm talking about a use case where the
user believes that as long as the ASCII content is correct they will
get correct output.
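
To make the difference concrete, here is a small sketch, assuming the input is really UTF-8 but the program treats it as "ASCII plus junk" (the names as_latin1, as_escaped and strict_ok are made up for illustration):

```python
# The actual bytes on disk, in an encoding the program does not know.
raw = "Óscar Fuentes".encode("utf-8")

# latin-1: every byte maps to some code point, so the result is valid
# Unicode, but the non-ASCII part is wrong content (mojibake).
as_latin1 = raw.decode("latin-1")

# surrogateescape: the non-ASCII bytes become lone surrogates, which are
# not valid in a well-formed Unicode string.
as_escaped = raw.decode("ascii", errors="surrogateescape")

# A strict encoder accepts the latin-1 string without complaint...
as_latin1.encode("utf-8")

# ...but rejects the surrogateescape string.
try:
    as_escaped.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(as_latin1), ascii(as_escaped), strict_ok)
```

In both cases the ASCII part ("scar Fuentes") comes through intact; only the surrogateescape string trips strict code downstream.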
It's arguable that using errors="surrogateescape" is a better
approach, *because* of the possibility of a validity check. I tend to
think not. But that's a different argument from "same as Python 2".