[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Tres Seaver tseaver at palladion.com
Fri Jan 8 22:59:04 CET 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Eric Smith wrote:
>>> Shouldn't this encoding guessing be a separate function that you call
>>> on either a file or a seekable stream ?
>>>
>>> After all, detecting encodings is just as useful to have for non-file
>>> streams.
>> Other stream sources typically have out-of-band ways to signal the
>> encoding:  only when reading from the filesystem do we pretty much
>> *have* to guess, and in that case the BOM / signature is the best
>> heuristic we have.  Also, some non-file streams are not seekable, and so
>> can't be guessed via a pre-pass.
> 
> But what if the file were in (for example) a zip file? I think you
> definitely want to have access to this functionality outside of open().

If the application expects a file that may carry a BOM signature, but you
pass it mismatched garbage:

  >>> f = open('some.zip', encoding='BOM')

the error handling should be the same as if you passed any other
mismatched encoding:

  >>> f = open('some.zip', encoding='UTF8')

i.e., you discover the error when you try to read from the (non)encoded
stream, not when you open it.
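
A minimal sketch of that behaviour, using the existing 'utf-8' codec as
the mismatched encoding (the 'BOM' value above is only the spelling
proposed in this thread, not an existing codec name).  The helper
first_failure_stage() and the sample bytes are made up for illustration:
the data starts with the zip magic number, followed by a byte that can
never occur in UTF-8.

```python
import os
import tempfile


def first_failure_stage(data: bytes, encoding: str) -> str:
    """Report where a mismatched encoding is noticed: 'open', 'read', or 'none'.

    Hypothetical helper for illustration only; it is not part of any
    proposed or existing API.
    """
    fd, path = tempfile.mkstemp()
    try:
        # Write the raw bytes to a temporary file.
        with os.fdopen(fd, "wb") as raw:
            raw.write(data)
        try:
            f = open(path, encoding=encoding)
        except (UnicodeError, LookupError):
            return "open"
        with f:
            try:
                f.read()  # decoding happens here, on read
            except UnicodeDecodeError:
                return "read"
        return "none"
    finally:
        os.remove(path)


# Zip local-file-header magic, then 0xFF, which is never valid in UTF-8.
garbage = b"PK\x03\x04\xff\xfe\x00\x01"
print(first_failure_stage(garbage, "utf-8"))
```

On a current Python 3 this should report that the failure happens at
read time: open() succeeds because it only sets up the decoder, and the
UnicodeDecodeError appears once the bad byte is actually decoded.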


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAktHqpwACgkQ+gerLs4ltQ7uAACeKEc+WT4TASGcVl1Hfqe6L9La
I6EAn1pJtngtLWPdothGbYB+zUabEvTW
=TjBK
-----END PGP SIGNATURE-----



