Using "textwrap" package for unwrappable languages (Japanese)

Wed Aug 30 13:52:50 EDT 2023

On 2023-08-30 13:18:25 +0000, c.buhtz--- via Python-list wrote:
> Am 30.08.2023 14:07 schrieb Peter J. Holzer via Python-list:
> > another caveat: Japanese characters are usually double-width. So
> > (unless your line length is 130 characters for English) you would
> > want to add that line break every 32 characters.
> 
> I don't get your calculation here. Original line length is 130 but for
> "double-width" characters you would break at 32 instead of 65?

No, I wrote "*unless* your original line length was 130 characters".

I assumed that you want your lines to be 65 Latin characters wide, since
this is what fits nicely on an A4 (or letter) page with a bit of a
margin on both sides, or on an 80-character terminal screen or window.
It's also generally considered a good line length for readability.

But Asian "full width" or "wide" characters are twice as wide, so you
can fit only half as many in a single line. Hence 65 // 2 = 32.
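For illustration, a minimal sketch of wrapping by display columns instead
of by character count. The standard textwrap module measures lengths with
len(), so it treats every character as one column. The helper names
char_width, display_width and wrap_by_width are hypothetical, and the
sketch assumes that 'W' and 'F' characters occupy two columns and
everything else one. It ignores combining characters, ambiguous-width
('A') characters and Japanese line-breaking rules.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Assumed column width of a single character.

    Assumption: East Asian Width 'W' (wide) and 'F' (fullwidth) take two
    columns, everything else takes one. Combining marks and ambiguous
    ('A') characters are not treated specially.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    """Sum of the assumed column widths of all characters in text."""
    return sum(char_width(ch) for ch in text)


def wrap_by_width(text: str, width: int = 65) -> list:
    """Greedily break text into lines of at most `width` columns.

    Hypothetical helper: it may break between any two characters, which
    suits Japanese (no spaces between words) but not spaced languages.
    """
    lines = []
    current = ""
    current_width = 0
    for ch in text:
        w = char_width(ch)
        # Start a new line if this character would overflow the current one.
        if current and current_width + w > width:
            lines.append(current)
            current, current_width = "", 0
        current += ch
        current_width += w
    if current:
        lines.append(current)
    return lines


# 70 hiragana characters at 65 columns: 65 // 2 = 32 fit per line.
for line in wrap_by_width("あ" * 70, 65):
    print(len(line), display_width(line))
# 32 64
# 32 64
# 6 12
```

Breaking after any character is usually acceptable for Japanese, but a
real implementation would probably also want to avoid starting a line
with punctuation such as 。 or 、.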

But that was only my assumption. I considered it possible that you
started with 130 characters per line, divided that by two and arrived
at 65 Japanese characters per line that way. 132 is also a common
length limit: many terminals back in the day had a 132-character mode,
and that's also approximately the line length in landscape mode or when
using a compressed typeface. It is rarely used for text (too wide to
read comfortably) and more for code, tables, etc. So I mentioned that
case to indicate that I had considered the possibility but concluded
that it probably wasn't what you meant.

(And as usual when I write a short sentence to clarify something
I wind up writing 4 paragraphs clarifying the clarification :-/)

> Then I will do something like this
> 
>     unicodedata.east_asian_width(mystring[0])
> 
> W is "wide". But there is also "F" (full-width).
> What is the difference between "wide" and "full-width"?

I'm not an expert on Japanese typography by any means. But there are
some fullwidth variants of Latin characters and halfwidth variants of
katakana characters. I assume that the categories 'F' and 'H' are for
those, while "normal" Japanese characters are 'W':

>>> import unicodedata
>>> unicodedata.east_asian_width("\N{DIGIT ONE}")
'Na'
>>> unicodedata.east_asian_width("\N{FULLWIDTH DIGIT ONE}")
'F'
>>> unicodedata.east_asian_width("\N{KATAKANA LETTER ME}")
'W'
>>> unicodedata.east_asian_width("\N{HALFWIDTH KATAKANA LETTER ME}")
'H'
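One way to check that assumption: as far as I know, Unicode Standard
Annex #11 defines 'F' and 'H' (roughly speaking) as the characters with
a <wide> or <narrow> compatibility decomposition to an ordinary
counterpart. So NFKC normalization should fold them back, while 'W' and
'Na' characters stay unchanged. A small sketch using only the standard
unicodedata module, with the same four characters:

```python
import unicodedata

samples = [
    "\N{DIGIT ONE}",
    "\N{FULLWIDTH DIGIT ONE}",
    "\N{KATAKANA LETTER ME}",
    "\N{HALFWIDTH KATAKANA LETTER ME}",
]

for ch in samples:
    # NFKC replaces compatibility variants (fullwidth/halfwidth forms)
    # with their ordinary counterparts; other characters are unchanged.
    folded = unicodedata.normalize("NFKC", ch)
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"width={unicodedata.east_asian_width(ch)}, "
        f"NFKC -> U+{ord(folded):04X} {unicodedata.name(folded)}"
    )
# U+0031 DIGIT ONE: width=Na, NFKC -> U+0031 DIGIT ONE
# U+FF11 FULLWIDTH DIGIT ONE: width=F, NFKC -> U+0031 DIGIT ONE
# U+30E1 KATAKANA LETTER ME: width=W, NFKC -> U+30E1 KATAKANA LETTER ME
# U+FF92 HALFWIDTH KATAKANA LETTER ME: width=H, NFKC -> U+30E1 KATAKANA LETTER ME
```

For line breaking, 'W' and 'F' characters are both normally rendered two
columns wide; the difference is whether a narrow counterpart exists.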

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"