Grapheme clusters, a.k.a. real characters

Mikhail V mikhailwas at gmail.com
Tue Jul 18 19:43:26 EDT 2017


Marko Rauhamaa wrote:

>What did you think of my concrete examples, then? (Say, finding
>"Alvárez" with the regular expression "Alv[aá]rez".)

I think that should match both "Alvarez" and "Alvárez"...?
But first, I feel like I have to _guess_ what ideas you
are presenting. Unless I open up Vim and apply my imagination,
it is hard even to engage with your ideas.
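A quick check of the quoted pattern with Python's re module, as a sketch. The decomposed spelling below is my own addition, not something from Marko's message. With the precomposed "á" (U+00E1), both spellings match. The same visible "Alvárez" written as "a" plus U+0301 COMBINING ACUTE ACCENT does not match unless it is normalized to NFC first.

```python
import re
import unicodedata

# Character class containing "a" and the precomposed "á" (U+00E1)
pattern = re.compile("Alv[a\u00e1]rez")

print(bool(pattern.search("Alvarez")))         # True
print(bool(pattern.search("Alv\u00e1rez")))    # True

# Same visible word, but "á" spelled as "a" + U+0301 COMBINING ACUTE ACCENT
decomposed = "Alva\u0301rez"
# [a\u00e1] consumes the plain "a", then "r" fails against U+0301
print(bool(pattern.search(decomposed)))        # False

# NFC composes "a" + U+0301 into U+00E1, so the pattern matches again
print(bool(pattern.search(unicodedata.normalize("NFC", decomposed))))  # True
```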
I wonder why it is hard to give a pair
of examples like this:
- task A (a concretely defined task) is currently solved with code C1;
- with the new syntax/method, the same task could be solved with code C2.

Just trying to guess at related tasks:
to automate regex search tasks, I would write a function
that generates the RE pattern first, i.e. define tables of
"variations" for glyphs, e.g. groups={"a": "aá"} or similar.
Then I'd need some micro-syntax for the conversion,
e.g. generate_re("Alv{a}rez", groups)
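A minimal sketch of such a generator, assuming the "{key}" micro-syntax above and Python 3.7 or later. generate_re and the groups table are hypothetical names from this message, not an existing library API. Literal text is escaped with re.escape, and each {key} becomes a character class of its variants.

```python
from __future__ import annotations

import re

def generate_re(template: str, groups: dict[str, str]) -> str:
    """Hypothetical helper: expand "{key}" placeholders into character classes.

    Text outside placeholders is escaped literally; each "{key}" becomes a
    class of the variants in groups[key], e.g. "{a}" -> "[aá]".
    """
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text before placeholder
        variants = groups[m.group(1)]                      # KeyError if key is undefined
        parts.append("[" + "".join(re.escape(c) for c in variants) + "]")
        pos = m.end()
    parts.append(re.escape(template[pos:]))                # trailing literal text
    return "".join(parts)

groups = {"a": "a\u00e1", "e": "e\u00e9"}
pat = generate_re("Alv{a}r{e}z", groups)
print(pat)                                       # Alv[aá]r[eé]z
print(bool(re.fullmatch(pat, "Alvarez")))        # True
print(bool(re.fullmatch(pat, "Alv\u00e1rez")))   # True
```

Keeping the tables as plain dicts means they can be redefined per case without touching the generator.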

Intuitively, I suppose the groupings, and even
the functions, can hardly be standardized in a nice manner,
since I'd need to define and redefine them all the time for various cases.
But there may be some generality; it's hard to say.

What I often need is an "approximate" search function,
which returns a match "similar" to the input string. But I think even
the regex module cannot fully solve this, and I would end up with a function
that goes through each string element and calculates various
similarity criteria.
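Here is a rough sketch of that fallback. It uses the standard library's difflib.SequenceMatcher ratio as the single (swapped-in) similarity criterion, and approximate_find is a hypothetical helper. It slides a window the length of the query over the text and keeps the best-scoring window. For comparing whole strings from a list, difflib.get_close_matches already does something similar.

```python
from __future__ import annotations

import difflib

def approximate_find(query: str, text: str) -> tuple[float, int, str]:
    """Hypothetical helper: return (ratio, start, window) for the substring of
    text, of the same length as query, that is most similar to query.

    Returns (0.0, -1, "") if text is shorter than query or nothing matches.
    """
    n = len(query)
    best = (0.0, -1, "")
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # ratio() = 2*M/T, where M is the number of matched characters
        # and T the total length of both strings; 1.0 means identical
        ratio = difflib.SequenceMatcher(None, query, window).ratio()
        if ratio > best[0]:
            best = (ratio, start, window)
    return best

print(approximate_find("Alvarez", "Contact: Alv\u00e1rez, Madrid"))
# (0.857..., 9, 'Alvárez')
```

A fixed-length window misses matches whose length differs because of insertions or deletions. A fuller version would also try slightly shorter and longer windows, or use a proper edit distance.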


Mikhail


