Unicode BOM marks
"Martin v. Löwis"
martin at v.loewis.de
Wed Mar 9 18:34:23 EST 2005
Steve Horsley wrote:
> It is my understanding that the BOM (U+feff) is actually the Unicode
> character "Non-breaking zero-width space".
My understanding is that this used to be the case. According to
http://www.unicode.org/faq/utf_bom.html#38
the application should now define its own processing for it, and
simply dropping it or reporting an error are both acceptable behaviour.
Applications that need the ZWNBSP behaviour (i.e. want to indicate that
there should be no break at this point) should use U+2060 (WORD JOINER).
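
A minimal sketch of those two behaviours, assuming Python 3. The helper
name handle_bom and its strict flag are hypothetical, made up for this
illustration; they are not a standard library API. It drops a leading
U+FEFF as a byte order mark, then either drops or rejects any U+FEFF
found later in the text.

```python
import unicodedata

BOM = "\ufeff"          # ZERO WIDTH NO-BREAK SPACE, doubles as the byte order mark
WORD_JOINER = "\u2060"  # the character to use for "no break at this point"


def handle_bom(text, strict=False):
    """Hypothetical helper: apply one of the two acceptable behaviours.

    A leading U+FEFF is treated as a BOM and removed. Any U+FEFF after
    that is either reported as an error (strict=True) or dropped.
    """
    if text.startswith(BOM):
        text = text[1:]
    if BOM in text:
        if strict:
            raise ValueError("unexpected U+FEFF inside text")
        text = text.replace(BOM, "")
    return text


# The standard "utf-8-sig" codec strips a leading BOM while decoding.
decoded = b"\xef\xbb\xbfabc".decode("utf-8-sig")

# Text that needs the no-break semantics can carry U+2060 instead.
joined = "foo" + WORD_JOINER + "bar"
print(unicodedata.name(BOM), "/", unicodedata.name(WORD_JOINER))
```

Whether to drop or reject is a policy decision for the application; the
strict flag only makes that choice explicit at the call site.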
Regards,
Martin