[Python-3000] encoding='guess' ?
Antoine Pitrou
solipsis at pitrou.net
Sun Sep 10 15:21:14 CEST 2006
Hi,
Let me add that 'guess' should probably be forbidden as a value for the
encoding parameter (instead, a separate function argument should be used,
as in my proposal).
Here is a schematic example to show why:
def append_text(filename, encoding):
    src = textfile(filename, "r", encoding)
    my_text = src.read()
    src.close()
    dst = textfile("textlist.txt", "r+", encoding)
    dst.seek_end(0)
    dst.write(my_text + "\n")
    dst.close()
With Paul's current proposal three cases can arise:
- "encoding" is a real encoding name like iso-8859-1 or utf-8. There
should be no problems, since we assume this encoding has been configured
once and for all in the application.
- "encoding" is either "site" or "locale". This should result in the
same value run after run, since we assume the site or locale encoding
value has been configured once and for all.
- "encoding" is "guess". In this case anything can happen. One possible
outcome is that utf-8 (or Shift-JIS, or whatever) is detected for the
first file, and iso-8859-1 for the second file. This will lead to an
encoding error, and thus a crash, in the likely case that some
characters in the source file can't be represented using the character
encoding auto-detected for the destination file.
Yet the append_text() function does look correct, doesn't it?
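The failure in the third case can be reproduced without any of the proposed API. The following is only a sketch using the plain str/bytes methods of current Python; the two encoding names stand in for whatever 'guess' might have detected for each file, and try_encode is a hypothetical helper added for illustration.

```python
# Bytes of a source file that a detector would plausibly identify as UTF-8.
source_bytes = "日本語のテキスト".encode("utf-8")
my_text = source_bytes.decode("utf-8")


def try_encode(text, encoding):
    """Hypothetical helper: return the encoded bytes, or the error raised."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# Assumption: 'guess' settled on ISO-8859-1 for the destination file.
mismatched = try_encode(my_text + "\n", "iso-8859-1")

# With one fixed, configured encoding for both files the write succeeds.
consistent = try_encode(my_text + "\n", "utf-8")
```

Here mismatched holds a UnicodeEncodeError, because the Japanese characters have no representation in ISO-8859-1, while consistent holds ordinary bytes.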
We shouldn't hide a contextual encoding-detection algorithm under an
encoding name. It leads to semantic uncertainty.
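One way to keep detection visible, sketched here with the built-in open() of current Python rather than the textfile API under discussion: run detection as an explicit step that returns a real encoding name, then pass that name around. This is not the proposal referred to above, only an illustration. guess_encoding and its candidate list are hypothetical, the detection is deliberately naive, and append_text takes extra parameters that the schematic version does not have.

```python
import os
import tempfile


def guess_encoding(path, candidates=("utf-8", "iso-8859-1")):
    """Hypothetical helper: return the first candidate that decodes the file."""
    with open(path, "rb") as f:
        data = f.read()
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits " + path)


def append_text(filename, list_path, src_encoding, dst_encoding):
    """Append the text of filename to list_path, using explicit encodings."""
    with open(filename, "r", encoding=src_encoding) as src:
        my_text = src.read()
    with open(list_path, "a", encoding=dst_encoding) as dst:
        dst.write(my_text + "\n")


with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "src.txt")
    dst_path = os.path.join(tmp, "textlist.txt")
    with open(src_path, "w", encoding="utf-8") as f:
        f.write("caf\u00e9")
    with open(dst_path, "w", encoding="iso-8859-1") as f:
        f.write("na\u00efve\n")

    # Detection happens here, in plain sight, and yields concrete names.
    src_enc = guess_encoding(src_path)
    dst_enc = guess_encoding(dst_path)
    append_text(src_path, dst_path, src_enc, dst_enc)

    with open(dst_path, "r", encoding=dst_enc) as f:
        combined = f.read()
```

The caller can see that the two files were detected differently and can decide what to do about it. A write can still fail if the source contains characters the destination encoding lacks, but the mismatch is no longer hidden behind a single name.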
Regards
Antoine.