Unicode BOM marks

"Martin v. Löwis" martin at v.loewis.de
Wed Mar 9 18:34:23 EST 2005


Steve Horsley wrote:
> It is my understanding that the BOM (U+FEFF) is actually the Unicode
> character "Non-breaking zero-width space".

My understanding is that this used to be the case. According to

http://www.unicode.org/faq/utf_bom.html#38

an application should now specify its own handling of the character:
simply dropping it and reporting an error are both acceptable behaviour.
Applications that need the ZWNBSP behaviour (i.e. want to indicate that
there should be no break at this point) should use U+2060 (WORD JOINER).
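
As a rough illustration of the two acceptable policies, here is a minimal
Python sketch. The helper names strip_leading_bom and reject_bom are
hypothetical, made up for this example; the only library features it relies
on are codecs.BOM_UTF8 and the "utf-8" and "utf-8-sig" codecs. It assumes
that an application only cares about U+FEFF at the very start of the text.

```python
import codecs

BOM = "\ufeff"          # U+FEFF, ZERO WIDTH NO-BREAK SPACE / byte order mark
WORD_JOINER = "\u2060"  # U+2060, WORD JOINER


def strip_leading_bom(text):
    """Policy 1 (hypothetical helper): silently drop one leading U+FEFF."""
    return text[1:] if text.startswith(BOM) else text


def reject_bom(text):
    """Policy 2 (hypothetical helper): report a leading U+FEFF as an error."""
    if text.startswith(BOM):
        raise ValueError("unexpected U+FEFF at start of text")
    return text


# UTF-8 bytes that start with the encoded BOM.
raw = codecs.BOM_UTF8 + "hello".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as a U+FEFF character ...
decoded_plain = raw.decode("utf-8")

# ... while "utf-8-sig" drops a leading BOM while decoding.
decoded_sig = raw.decode("utf-8-sig")

# To forbid a line break between two words, use WORD JOINER, not U+FEFF.
joined = "no" + WORD_JOINER + "break"
```

Which policy is right depends on the application; the sketch only shows that
either one is a few lines of code, and that U+2060 takes over the
"no break here" role.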

Regards,
Martin


