Extend unicodedata with a name/pattern/regex search for character entity references?

Tue Sep 6 06:06:32 EDT 2016

Rustom Mody wrote:

> On Saturday, September 3, 2016 at 5:25:48 PM UTC+5:30, Veek. M wrote:
>> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
>> 
>> Wanted to know whether the idea in the above link has been implemented,
>> and whether there's a module that accepts a pattern like 'cap' and gives
>> you all the matching Unicode 'CAP' characters:
>>  ⋂ \bigcap
>>  ⊓ \sqcap
>>  ∩ \cap
>>  ♑ \capricornus
>>  ⪸ \succapprox
>>  ⪷ \precapprox
>> 
>> (The above are from TeX.)
>> 
>> I found two useful modules in this regard: unicode_tex and unicodedata.
>> But unicodedata is a standard-library module that does not do globs or
>> regexes, so it's rather limiting.
>> 
>> It would be nice if you could search HTML/XML character entity
>> references as well.
> 
> [Not exactly an answer]
> 
> I use a number of things for this:
> 1. Google
> 2. Xah Lee’s excellent pages, which often fit my brain better than
> Wikipedia:
>    http://xahlee.info/comp/unicode_index.html
> 3. Emacs’ function ucs-insert, recently renamed to insert-char,
>    i.e. [in Emacs] type Alt-x insert-char.
>    After that, some kind of TAB-globbing (case-insensitive) works.
>    I won't try with Cap (because the number of *CAPITAL* matches is in
>    the thousands!). E.g. alphaTAB gives nothing. However, *alphaTAB gives
>    a bunch. Narrow to "greek alpha"TAB and you get a bunch.
> 
> 
> The idea that we should have a series of levels for char-input, from
> most general and unergonomic (Google) to most specific and ergonomic
> (special-purpose keyboard), is something I've tried to describe as
> 7 levels near the end of
> http://blog.languager.org/2015/01/unicode-and-universe.html

Got dengue - I'm dead.
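
For the record, the kind of search asked about in the quoted question can
be sketched with the standard library alone. This is only a minimal
sketch, assuming Python 3; search_names and search_entities are
hypothetical helper names, not part of unicodedata or any existing
module. It loops over every code point and applies a regex to
unicodedata.name(), and does the same over the keys of
html.entities.html5 for the HTML character references. Note that Unicode
names are not TeX names: ∩ is named INTERSECTION, so a 'cap' search on
names will not find it, while the HTML5 entity name for it is 'cap;'.

```python
import re
import sys
import unicodedata
from html.entities import html5


def search_names(pattern, flags=re.IGNORECASE):
    """Yield (char, name) for each code point whose Unicode name matches.

    Hypothetical helper: a brute-force scan over all code points.
    """
    rx = re.compile(pattern, flags)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # Unnamed code points (unassigned, surrogates, controls) give None.
        name = unicodedata.name(ch, None)
        if name is not None and rx.search(name):
            yield ch, name


def search_entities(pattern, flags=re.IGNORECASE):
    """Yield (text, reference) for each HTML5 entity name that matches.

    Hypothetical helper: html5 keys such as 'cap;' carry the trailing
    semicolon, so prefixing '&' gives the full character reference.
    """
    rx = re.compile(pattern, flags)
    for entity, text in sorted(html5.items()):
        if rx.search(entity):
            yield text, "&" + entity


if __name__ == "__main__":
    # Anchor the pattern: a bare 'cap' matches every *CAPITAL* name.
    for ch, name in search_names(r"\bCAP\b"):
        print(ch, name)
    for text, ref in search_entities(r"cap"):
        print(text, ref)
```

The full scan touches over a million code points on every call, so for
repeated queries it would probably be worth building the (char, name)
list once and searching that instead.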