[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Michael Foord fuzzyman at voidspace.org.uk
Sun Jan 10 00:25:18 CET 2010

On 09/01/2010 22:14, Lennart Regebro wrote:
> On Sat, Jan 9, 2010 at 21:28, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> If we want it to be the default, it must be able to fall back on the current
>> locale-based algorithm if no BOM is found. I don't think it would be easy for a
>> codec to do that.
> Right. It seems like encoding=None is the right way to go there.
> encoding='BOM' would probably only work if 'BOM' isn't an encoding but
> a special tag, which is ugly.
I would rather see it as the default behaviour for open() when no
encoding is specified.

I know Guido has expressed a preference against this so I won't continue 
to flog it.

The current behaviour, however, is a 'guessing' algorithm based on the
platform default. If you open a text file in read mode that has a UTF-8
signature, but the platform default is something other than UTF-8, then
we open the file using what is likely to be the incorrect encoding.
Looking for the signature seems to be better behaviour in that case.
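
As a rough sketch of the idea (not a proposed patch): the helper name
open_with_bom_check is made up for illustration, it only looks for the
UTF-8 signature, and it assumes falling back to
locale.getpreferredencoding() is an acceptable stand-in for the current
platform-default guess.

```python
import codecs
import locale


def open_with_bom_check(path, mode="r"):
    """Hypothetical helper: prefer a UTF-8 signature over the locale guess."""
    # Peek at the first bytes in binary mode to look for the UTF-8 signature.
    with open(path, "rb") as raw:
        head = raw.read(len(codecs.BOM_UTF8))

    if head.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" decodes as UTF-8 and strips the leading signature.
        encoding = "utf-8-sig"
    else:
        # No signature found: fall back on the locale-based default.
        encoding = locale.getpreferredencoding(False)

    return open(path, mode, encoding=encoding)
```

A file that starts with the signature is then decoded as UTF-8 whatever
the platform default is, and everything else is opened the way it is
today.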

All the best,


