what to do with multiple BOMs

Robin Becker robin at reportlab.com
Thu Aug 19 09:07:43 EDT 2021

Channeling unicode text experts and xml people:

I have an XML entity with initial bytes ff fe ff fe, which the file command says is
UTF-16, little-endian text.

I agree, but what should be done about the additional BOM?
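
For reference, a minimal Python sketch of how the standard "utf-16" codec treats such input. The byte string below is my own reconstruction, not the real entity file: the payload "data" is an assumption standing in for its actual contents.

```python
# Hypothetical stand-in for 014.ent: two UTF-16-LE BOMs followed by "data".
raw = b"\xff\xfe\xff\xfe" + "data".encode("utf-16-le")

# The "utf-16" codec uses the first BOM to pick the byte order and drops it.
# Any further ff fe pair is decoded as an ordinary character, U+FEFF.
text = raw.decode("utf-16")

print(repr(text))
print(len(text))  # one leftover U+FEFF plus the four characters of "data"
```

So, at least with this codec, only the first BOM is consumed and the second one survives as a character in the decoded text.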

A test output made many years ago seems to keep the extra BOM. The xml context is

xml file 014.xml
<!DOCTYPE doc [
<!ENTITY e SYSTEM "014.ent">

The entity file 014.ent is bombomdata


The old saved test output of processing is


It seems as though the extra BOM in the entity has been kept and processed into a different BOM, namely the UTF-8 one.
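
A small Python sketch of why a surviving U+FEFF would show up as a UTF-8 BOM on output. The string "data" is an assumed placeholder for the entity contents, and this only illustrates the encoding step, not what the original test tool actually did.

```python
import codecs

# Assumed decoded entity text: one leftover U+FEFF, then the payload.
text = "\ufeffdata"

# Encoding U+FEFF as UTF-8 yields ef bb bf, the same bytes as the UTF-8 BOM.
utf8 = text.encode("utf-8")

print(utf8[:3].hex(" "))
print(utf8.startswith(codecs.BOM_UTF8))
```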

I think the test file is wrong and that the multiple BOM chars in the entity should have been removed.
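
What I have in mind is roughly the following Python sketch. The helper name strip_leading_boms is hypothetical, and whether dropping every leading U+FEFF is the correct behaviour is exactly what I am asking about.

```python
def strip_leading_boms(text: str) -> str:
    """Drop every U+FEFF at the start of already-decoded text.

    Hypothetical helper: only leading occurrences are removed, so a
    U+FEFF appearing later in the text is left untouched.
    """
    return text.lstrip("\ufeff")


# Assumed sample: the first BOM was consumed by the decoder, one remains.
print(repr(strip_leading_boms("\ufeffdata")))
```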

Am I right?
Robin Becker
