[Python-3000] Pre-PEP: Easy Text File Decoding

Paul Prescod paul at prescod.net
Mon Sep 11 06:31:00 CEST 2006


On 9/10/06, David Hopwood <david.nospam.hopwood at blueyonder.co.uk> wrote:
> Josiah Carlson wrote:
> ... if you think that guessing based on content is a good idea -- I don't.
> In any case, such guessing necessarily depends on the expected file format,
> so it should be done by the application itself, or by a library that knows
> more about the format.

I disagree. If a non-trivial file can be decoded as a UTF-* encoding,
it is probably in that encoding. I don't see how it matters whether
the file contains LaTeX or is an .htaccess file. XML is a special case
because it is specifically designed to make encoding detection
(detection, not guessing) easy.
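
To make the "if it decodes as UTF-*, it probably is" heuristic concrete,
here is a minimal sketch. The helper name detect_utf and its return
values are illustrative assumptions, not anything the PEP specifies. It
checks for a byte order mark first, then falls back to a strict UTF-8
decode, and never looks at the file format.

```python
import codecs

# Hypothetical helper, not part of the PEP. BOMs are checked longest
# first, because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_utf(data):
    """Return a codec name if data decodes strictly as UTF-*, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            try:
                data.decode(name)  # strict error handling is the default
                return name
            except UnicodeDecodeError:
                continue  # BOM-like prefix, but the rest is not valid
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return "utf-8"
```

Pure ASCII input also comes back as "utf-8", which is harmless but
carries no evidence; the heuristic only says something for files that
contain non-ASCII bytes, which is what I mean by "non-trivial".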

> If the encoding of a text stream were settable after it had been opened,
> then it would be easy for anyone to implement whatever guessing algorithm
> they needed, without having to write an encoding implementation or include
> any other support for guessing in the I/O library itself.

But this defeats the whole purpose of the PEP, which is to accelerate
the writing of quick and dirty text processing scripts.

 Paul Prescod

