Short, perfect program to read sentences of webpage

Cameron Simpson cs at cskk.id.au
Wed Dec 8 21:58:15 EST 2021


On 08Dec2021 23:17, Stefan Ram <ram at zedat.fu-berlin.de> wrote:
>  Regexps might have their disadvantages, but when I use them,
>  it is clearer for me to do all the matching with regexps
>  instead of mixing them with Python calls like str.isupper.
>  Therefore, it is helpful for me to have a regexp to match
>  upper and lower case characters separately. Some regexp
>  dialects support "\p{Lu}" and "\p{Ll}" for this.

Aye. I went looking for those in the Python re module docs and could not 
find them. So the compromise is to match any word, then test the word 
with isupper() (or whatever is appropriate).
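
A minimal sketch of that compromise, assuming "any word" means a run of 
\w characters and that the test of interest is whether the first 
character is uppercase. The names WORD_RE and capitalised_words are made 
up for illustration, not taken from your code:

```python
import re

# \w+ on a str pattern matches Unicode word characters (letters, digits
# and underscore), so precomposed accented letters are included.
WORD_RE = re.compile(r"\w+")


def capitalised_words(text):
    """Return the words in text whose first character is uppercase."""
    # The regexp only finds the words; str.isupper() does the case test
    # that \p{Lu} would do in other regexp dialects.
    return [word for word in WORD_RE.findall(text) if word[0].isupper()]


sample = "Émile met Åsa in berlin."
print(capitalised_words(sample))
```

Note that \w also matches digits and underscores; those simply fail the 
isupper() test, so they are filtered out along with lowercase words.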

>  I have not yet incorporated (all) your advice into my code,
>  but I came to the conclusion myself that the repetition of
>  long sequences like r"A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝ" and
>  not using f strings to insert other strings was especially
>  ugly.

The tricky bit with f-strings and regexps is that braces are regexp 
syntax too: \w{3,5} means from 3 through 5 "word characters". So if 
you've got those in an f-string you're off to double-the-brackets land 
(\w{{3,5}}), a bit like double-backslash land with non-raw strings.
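
To illustrate the brace doubling, here is a small sketch. UPPER and LOWER 
are hypothetical names for your character ranges (shortened here to 
ranges rather than the full spelled-out sequences), and the {2,4} 
quantifier is just an example:

```python
import re

# Hypothetical character ranges, defined once and reused.
UPPER = r"A-ZÀ-ÖØ-Ý"
LOWER = r"a-zß-öø-ÿ"

# In an f-string, {UPPER} is substituted, while the doubled braces
# {{2,4}} come out as the literal regexp quantifier {2,4}.
pattern = rf"[{UPPER}][{LOWER}]{{2,4}}"
print(pattern)

# One uppercase letter followed by 2 through 4 lowercase letters.
print(bool(re.fullmatch(pattern, "Paris")))
print(bool(re.fullmatch(pattern, "Pa")))
```

The rf prefix keeps the string raw as well, so backslash sequences such 
as \w or \b need no doubling; only the braces do.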

Otherwise, yes, f-strings are a nice way to compose things.

Cheers,
Cameron Simpson <cs at cskk.id.au>

