Python NBSP DWIM

Chris Angelico rosuav at gmail.com
Wed Jun 10 21:09:30 EDT 2015


On Thu, Jun 11, 2015 at 11:02 AM, <random832 at fastmail.us> wrote:
>
> On Wed, Jun 10, 2015, at 20:09, Chris Angelico wrote:
> > And U+FEFF "ZERO WIDTH NO-BREAK SPACE", notable because it's also used as
> > the byte-order mark (as its counterpart, U+FFFE, is unallocated). I've
> > been fighting with VLC Media Player over the font it uses for subtitles;
> > for some bizarre reason, that font represents U+FEFF not with zero pixels
> > of emptiness, but with a box containing the letters "ZWN" "BSP" on two
> > lines. Yeah, because that totally takes up zero width and looks like
> > blank space.
>
> As I understand it, the proper behavior is that the ZWNBSP that is the
> byte order mark shall never appear in an in-memory representation of the
> first line of a BOM-encoded file, or any other line of the concatenation
> of two BOM-encoded files, but should "vanish" when the file is opened
> and first read from. So it shouldn't be showing up in your subtitles
> regardless of its rendering behavior.

It's a perfectly valid character for other purposes; it's coming up in
the middle of pieces of text, which should be 100% legal. No, it's a
font problem.
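
To illustrate the distinction, here is a minimal sketch of how a decoder can
drop a leading BOM while leaving a mid-text U+FEFF alone. The sample string
and the variable names are made up for the example, and it uses Python's
"utf-8-sig" codec as one BOM-aware decoder; it says nothing about how VLC
itself reads subtitle files.

```python
import codecs
import unicodedata

ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as the BOM

# Hypothetical subtitle text with U+FEFF in the middle of a line.
text = "first" + ZWNBSP + "second"

# Simulate a UTF-8 file that starts with a BOM.
raw = codecs.BOM_UTF8 + text.encode("utf-8")

# Plain "utf-8" keeps the leading BOM as an ordinary U+FEFF character.
decoded_plain = raw.decode("utf-8")

# "utf-8-sig" strips one leading BOM and nothing else.
decoded_sig = raw.decode("utf-8-sig")

print(unicodedata.name(ZWNBSP))
print(decoded_plain.count(ZWNBSP))  # leading BOM plus the mid-text one
print(decoded_sig.count(ZWNBSP))    # only the mid-text one remains
```

The character that survives the BOM-aware decode is the one embedded in the
text, so whatever gets handed to the renderer still contains a legitimate
U+FEFF, and it is up to the font to draw it as nothing.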

ChrisA


