Python NBSP DWIM

random832 at fastmail.us
Wed Jun 10 11:57:37 EDT 2015


On Wed, Jun 10, 2015, at 11:03, Laura Creighton wrote:
> In these unicode days, this thinking may need to be revisited.  There
> are many languages where whitespace does not separate words -- either
> words aren't separated, or in Vietnamese, spaces separate syllables,
> so entire words have spaces in them.

Text wrapping for CJK scripts is another topic that might be worth
addressing in textwrap - words aren't space-separated, but there are
still rules about where you can place a line break. Generally these
rules exist to keep punctuation marks from being stranded (for example,
a closing mark at the start of a line or an opening bracket at the end
of one) rather than to find word boundaries algorithmically.
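
To make that concrete, here is a rough sketch of punctuation-driven
breaking. It is not anything textwrap does today; the names wrap_cjk,
can_break, NO_LINE_START and NO_LINE_END are made up for illustration,
the character sets are deliberately tiny, and width is counted in
characters rather than display columns.

```python
# Hypothetical, far-from-complete sets; real rule sets are much larger.
NO_LINE_START = set("、。，．）」』！？")  # must not begin a line
NO_LINE_END = set("（「『")                # must not end a line


def can_break(text, i):
    """Return True if a line break may be placed just before text[i]."""
    return text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END


def wrap_cjk(text, width):
    """Split text into lines of at most `width` characters.

    A break is allowed between any two characters unless it would
    strand punctuation. No attempt is made to find word boundaries.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    lines = []
    start = 0
    while len(text) - start > width:
        cut = start + width
        # Back up until the break is permitted, pushing characters
        # down to the next line.
        while cut > start and not can_break(text, cut):
            cut -= 1
        if cut == start:
            # No permitted break in this line: fall back to a hard break.
            cut = start + width
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


print(wrap_cjk("これは、テストです。", 3))
# ['これ', 'は、テ', 'ストで', 'す。']
```

The first line comes out one character short because breaking after
three characters would have put the comma at the start of the second
line.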

"Oikomi" is the process of squeezing a character onto the current line
to avoid a forbidden break. Adjusting kerning to make room is not really
possible for monospaced text, but it might be worthwhile in general to
have "preferred" and "maximum" line widths as parameters for textwrap.
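
A rough sketch of what such parameters could look like, purely as an
assumption on my part: textwrap has no "preferred" or "maximum" options,
and wrap_preferred, find_cut, can_break and the two character sets below
are hypothetical names. Widths are counted in characters, not display
columns. When a break at the preferred width is forbidden, the line is
first allowed to grow up to the maximum, and only then shortened.

```python
# Hypothetical, far-from-complete sets; real rule sets are much larger.
NO_LINE_START = set("、。，．）」』！？")  # must not begin a line
NO_LINE_END = set("（「『")                # must not end a line


def can_break(text, i):
    """Return True if a line break may be placed just before text[i]."""
    return text[i] not in NO_LINE_START and text[i - 1] not in NO_LINE_END


def find_cut(text, start, preferred, maximum):
    """Pick a break position for the line beginning at `start`."""
    end = len(text)
    # Try the preferred width, then longer lines up to the maximum,
    # then progressively shorter lines.
    longer = range(start + preferred, start + maximum + 1)
    shorter = range(start + preferred - 1, start, -1)
    for cut in list(longer) + list(shorter):
        if cut >= end:
            return end  # the rest of the text fits on this line
        if can_break(text, cut):
            return cut
    return start + preferred  # no permitted break: hard break


def wrap_preferred(text, preferred, maximum):
    """Wrap at `preferred` characters, stretching to `maximum` if needed."""
    if not 0 < preferred <= maximum:
        raise ValueError("need 0 < preferred <= maximum")
    lines = []
    start = 0
    while start < len(text):
        if len(text) - start <= preferred:
            lines.append(text[start:])
            break
        cut = find_cut(text, start, preferred, maximum)
        lines.append(text[start:cut])
        start = cut
    return lines


print(wrap_preferred("これは、テストです。", 3, 4))
# ['これは、', 'テスト', 'です。']
```

With preferred and maximum both set to 3, the same input falls back to
shortening lines and gives ['これ', 'は、テ', 'ストで', 'す。'].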

http://en.wikipedia.org/wiki/Line_breaking_rules_in_East_Asian_languages


