codec for UTF-8 with BOM

Chris Rebert clp2 at rebertia.com
Mon May 2 05:47:45 EDT 2011


On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt
<ulrich.eckhardt at dominolaser.com> wrote:
> Hi!
>
> I want to write a file starting with the BOM and using UTF-8, and stumbled
> across some problems:
>
> 1. I would have expected one of the codecs to be 'UTF-8 with BOM' or
> something like that, but I can't find the correct name. Also, I can't find a
> way to get a list of the supported codecs at all, which strikes me as odd.

If nothing else, there's
http://docs.python.org/library/codecs.html#standard-encodings
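For a list generated at runtime, one option in Python 3 is to scan the modules of the standard `encodings` package and keep the names that `codecs.lookup()` accepts. This is only a sketch under that assumption. It sees codecs shipped as modules in `encodings`, not their aliases or codecs registered by third-party search functions. The helper name `list_stdlib_codecs` is hypothetical.

```python
import codecs
import encodings
import pkgutil


def list_stdlib_codecs():
    """Hypothetical helper: names of codec modules in the stdlib `encodings` package."""
    names = []
    for info in pkgutil.iter_modules(encodings.__path__):
        try:
            # lookup() imports encodings.<name> and checks that it defines a codec.
            # Helper modules such as `aliases`, and Windows-only codecs like `mbcs`
            # on other platforms, raise LookupError and are skipped.
            codecs.lookup(info.name)
        except LookupError:
            continue
        names.append(info.name)
    return sorted(names)


print(list_stdlib_codecs())
```

The result also includes non-text codecs such as `base64_codec` and `rot_13`, so it is a rough inventory rather than a list of text encodings.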

The correct name, as you found below and as is corroborated by the
webpage, seems to be "utf_8_sig":
>>> u"FOøbar".encode('utf_8_sig')
'\xef\xbb\xbfFO\xc3\xb8bar'

This could definitely be documented more straightforwardly.
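Since the original goal was writing a UTF-8 file that starts with a BOM, here is a minimal sketch in Python 3 (the session above is Python 2). It assumes the text is already a `str`. Passing `encoding="utf-8-sig"` to `open()` makes the codec emit the BOM once at the start of the file. The helper name `write_with_bom` and the temporary file are illustrative only.

```python
import codecs
import os
import tempfile


def write_with_bom(path, text):
    """Hypothetical helper: write `text` to `path` as UTF-8 preceded by a BOM."""
    # The "utf-8-sig" codec (module encodings.utf_8_sig) emits
    # codecs.BOM_UTF8 (b'\xef\xbb\xbf') once, before the first encoded text.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(text)


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    write_with_bom(path, "FO\u00f8bar")  # same string as u"FOøbar" above
    with open(path, "rb") as f:
        raw = f.read()
    # Reading back with "utf-8-sig" strips the BOM again.
    with open(path, encoding="utf-8-sig") as f:
        decoded = f.read()
finally:
    os.remove(path)

print(raw)                              # b'\xef\xbb\xbfFO\xc3\xb8bar'
print(raw.startswith(codecs.BOM_UTF8))  # True
print(decoded == "FO\u00f8bar")         # True
```

Reading the same file with plain `utf-8` instead would keep the BOM in the decoded text as the character U+FEFF.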

<snip>
> 3. The docs mention encodings.utf_8_sig, available since 2.5, but I can't
> locate that thing there either. What's going on here?

Works for me™:
Python 2.6.6 (r266:84292, Jan 12 2011, 13:35:00)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from encodings import utf_8_sig
>>>
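The same check carries over to Python 3. The sketch below is an illustration, not something from the thread. It imports the module explicitly and calls its module-level `encode`/`decode` functions, which follow the stateless codec convention of returning an `(output, length consumed)` tuple. It also confirms that `codecs.lookup()` resolves different spellings to the canonical name `utf-8-sig`.

```python
import codecs
from encodings import utf_8_sig

# Stateless encode: returns (bytes, number of characters consumed).
data, consumed = utf_8_sig.encode("FO\u00f8bar")
print(data, consumed)                # b'\xef\xbb\xbfFO\xc3\xb8bar' 6

# Stateless decode: the leading BOM is stripped, but its 3 bytes
# are still counted in the number of bytes consumed.
text, used = utf_8_sig.decode(data)
print(text == "FO\u00f8bar", used)   # True 10

# Different spellings resolve to the same registered codec.
print(codecs.lookup("utf_8_sig").name)   # utf-8-sig
print(codecs.lookup("UTF-8-SIG").name)   # utf-8-sig
```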

Cheers,
Chris
--
http://rebertia.com


