Unicode BOM marks

Steve Horsley shoot at the.moon
Sun Mar 13 19:19:07 EST 2005


Martin v. Löwis wrote:
> Steve Horsley wrote:
> 
>> It is my understanding that the BOM (U+FEFF) is actually the Unicode
>> character "Non-breaking zero-width space".
> 
> 
> My understanding is that this used to be the case. According to
> 
> http://www.unicode.org/faq/utf_bom.html#38
> 
> the application should now specify the processing it wants, and either
> simply dropping it or reporting an error is acceptable behaviour.
> Applications that need the ZWNBSP behaviour (i.e. want to indicate that
> there should be no break at this point) should use U+2060 (WORD JOINER).
> 
> Regards,
> Martin

I'm out of date, then. Thanks for the link.
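
For the archive, the two behaviours Martin describes (drop the character, or
report an error) might look roughly like this in Python. This is only a
sketch under assumptions: it uses Python 3 and the standard "utf-8-sig"
codec, and strip_leading_bom and reject_bom are hypothetical helper names
made up for illustration, not anything taken from the FAQ.

```python
import codecs

BOM = "\ufeff"          # U+FEFF, ZERO WIDTH NO-BREAK SPACE (the byte order mark)
WORD_JOINER = "\u2060"  # U+2060, WORD JOINER


def strip_leading_bom(text):
    """Hypothetical helper: drop a single leading U+FEFF, if present."""
    return text[1:] if text.startswith(BOM) else text


def reject_bom(text):
    """Hypothetical helper: report an error if U+FEFF appears anywhere."""
    if BOM in text:
        raise ValueError("unexpected U+FEFF at index %d" % text.index(BOM))
    return text


# UTF-8 bytes that start with a BOM signature.
data = codecs.BOM_UTF8 + "hello".encode("utf-8")

plain = data.decode("utf-8")      # the BOM survives as a leading U+FEFF
sig = data.decode("utf-8-sig")    # this codec drops the leading BOM itself

print(repr(plain))
print(repr(sig))
print(repr(strip_leading_bom(plain)))

# Text that wants "no break here" can carry U+2060 instead of U+FEFF,
# so a strict check like reject_bom lets it through.
joined = reject_bom("foo" + WORD_JOINER + "bar")
```

Which of the two helpers is appropriate depends on the application; the
point is only that neither one needs U+FEFF to keep its old ZWNBSP meaning.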

Steve


