Confusing textwrap parameters, and request for RE help

Peter J. Holzer hjp-python at hjp.at
Wed Mar 25 15:33:25 EDT 2020


On 2020-03-23 06:00:41 +1100, Chris Angelico wrote:
> Second point, and related to the above. The regex that defines break
> points, as found in the source code, is:
> 
> wordsep_re = re.compile(r'''
>         ( # any whitespace
>           %(ws)s+
>         | # em-dash between words
>           (?<=%(wp)s) -{2,} (?=\w)
>         | # word, possibly hyphenated
>           %(nws)s+? (?:
>             # hyphenated word
>               -(?: (?<=%(lt)s{2}-) | (?<=%(lt)s-%(lt)s-))
>               (?= %(lt)s -? %(lt)s)
>             | # end of word
>               (?=%(ws)s|\Z)
>             | # em-dash
>               (?<=%(wp)s) (?=-{2,}\w)
>             )
>         )''' % {'wp': word_punct, 'lt': letter,
>                 'ws': whitespace, 'nws': nowhitespace},
> 
> It's built primarily out of small matches with long assertions, eg
> "match a hyphen, as long as it's preceded by two letters or a letter
> and a hyphen".

Do you need that fancy logic? Could you break only on whitespace
instead? It won't wrap "tetrabromo-phenolsulfonephthalein" in that case
but since you mentioned it's for a Twitter client, most users probably
won't mind (and those who do mind will probably insist that the
algorithm should be able to split it into tetrabromo-phenolsulfone-
phthalein, if that's where the line end is, as it was here purely by
lucky accident). A regexp for whitespace is pretty simple.
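
To illustrate what I mean, here is a minimal sketch of a wrapper that
breaks only on whitespace. The name wrap_on_whitespace is made up for
this example and is not part of textwrap; it assumes greedy filling and
that a word longer than the width is simply left on a line of its own.

```python
import re

# Hypothetical helper, not part of textwrap: the only break points are
# runs of whitespace, so hyphens and dashes are never split.
_word_re = re.compile(r"\S+")


def wrap_on_whitespace(text, width=70):
    """Greedily wrap text, breaking only between whitespace-separated words."""
    lines = []
    current = ""
    for word in _word_re.findall(text):
        if not current:
            # First word on a line is taken as-is, even if it is too long.
            current = word
        elif len(current) + 1 + len(word) <= width:
            current += " " + word
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


if __name__ == "__main__":
    sample = "It won't wrap tetrabromo-phenolsulfonephthalein in that case"
    for line in wrap_on_whitespace(sample, width=30):
        print(line)
```

If you would rather stay with the standard library, textwrap also
accepts break_on_hyphens=False (and break_long_words=False), which
should give much the same effect without touching wordsep_re.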

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp at hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"