Extend unicodedata with a name/pattern/regex search for character entity references?

Rustom Mody rustompmody at gmail.com
Sun Sep 4 01:30:53 EDT 2016


On Saturday, September 3, 2016 at 5:25:48 PM UTC+5:30, Veek. M wrote:
> https://mail.python.org/pipermail//python-ideas/2014-October/029630.htm
> 
> Wanted to know if the idea in the above link has been implemented, and if 
> there's a module that accepts a pattern like 'cap' and gives you all the 
> instances of Unicode 'CAP' characters.
>  ⋂ \bigcap
>  ⊓ \sqcap
>  ∩ \cap
>  ♑ \capricornus
>  ⪸ \succapprox
>  ⪷ \precapprox
> 
> (above's from tex)
> 
> I found two useful modules in this regard: unicode_tex, unicodedata
> but unicodedata is a standard-library module which does not do globs or 
> regexes, so it's kind of limiting in nature.
> 
> Would be nice if you could search html/xml character entity references 
> as well.

[Not exactly an answer]

I use a number of things for this:
1. Google
2. Xah Lee’s excellent pages, which often fit my brain better than Wikipedia:
   http://xahlee.info/comp/unicode_index.html
3. Emacs’ function ucs-insert, recently renamed to insert-char
   ie [in Emacs] type Alt-x insert-char
   After that, a kind of case-insensitive TAB-globbing on character names works
   I won't try with Cap (because the number of *CAPITAL* matches is in the thousands!)
   eg alphaTAB gives nothing. However *alphaTAB gives a bunch.
   Narrow to "greek alpha"TAB and you get a shorter list
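
The kind of name search that insert-char offers can also be approximated in
Python with only the standard library. What follows is a minimal sketch, not
an existing API: search_names and search_entities are hypothetical helper
names, and unicodedata itself has no pattern search. It assumes Python 3 and
uses html.entities.html5 for the HTML character entity references.

```python
import re
import sys
import unicodedata
from html.entities import html5


def search_names(pattern, flags=re.IGNORECASE):
    """Yield (char, name) for each code point whose Unicode name matches.

    Hypothetical helper: it simply scans every code point, so one call
    walks the whole range up to sys.maxunicode.
    """
    rx = re.compile(pattern, flags)
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        name = unicodedata.name(ch, "")  # "" when the code point has no name
        if name and rx.search(name):
            yield ch, name


def search_entities(pattern, flags=re.IGNORECASE):
    """Yield (entity, text) for each HTML5 entity name matching the pattern.

    Keys of html5 are entity names without the leading "&"; most of them
    carry the trailing ";".
    """
    rx = re.compile(pattern, flags)
    for entity, text in sorted(html5.items()):
        if rx.search(entity):
            yield entity, text


# Example calls (results are iterators of pairs):
#   search_names(r"\bCAP\b")        # whole-word CAP, skips *CAPITAL*
#   search_names(r"greek.*alpha")   # narrowing, as with "greek alpha"TAB
#   search_entities(r"^cap")        # entity names starting with "cap"
```

A bare "cap" pattern runs into the same *CAPITAL* flood mentioned above, so
anchoring with \b or ^ is usually what you want. The names found this way are
the official Unicode names, not the TeX macro names from unicode_tex.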


I've argued that we should have a series of levels for char-input, from the
most general and unergonomic (Google) to the most specific and ergonomic
(a special-purpose keyboard). I describe these as 7 levels near the end of
http://blog.languager.org/2015/01/unicode-and-universe.html


