[issue35549] Add partial_match: bool = False argument to unicodedata.lookup

Steven D'Aprano report at bugs.python.org
Fri Dec 21 19:37:08 EST 2018


Steven D'Aprano <steve+python at pearwood.info> added the comment:

I love the idea, but dislike the proposed interface.

As a general rule of thumb, Guido dislikes "constant bool parameters", where you pass a literal True or False as an argument to change a function's behaviour. Obviously this is not a hard rule; there are functions in the stdlib that do this, but like Guido I think we should avoid them in general.

Instead, I think we should allow the name to include globbing symbols such as * and ?. (I think full-blown re syntax is overkill.) I have an implementation which I use:

lookup(name) -> single character # the current behaviour

lookup(name_with_glob_symbols) -> list of characters

For example lookup('latin * Z') returns:

['LATIN CAPITAL LETTER Z', 'LATIN SMALL LETTER Z', 'LATIN CAPITAL LETTER D WITH SMALL LETTER Z', 'LATIN LETTER SMALL CAPITAL Z', 'LATIN CAPITAL LETTER VISIGOTHIC Z', 'LATIN SMALL LETTER VISIGOTHIC Z']
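
For illustration only, here is a minimal sketch of how such a glob-aware lookup could behave. It is not the implementation mentioned above: the name glob_lookup, the use of fnmatch, and the linear scan over all code points are assumptions made for the example. It returns characters rather than names, following the signature given earlier.

```python
import fnmatch
import sys
import unicodedata

# Symbols that switch the (hypothetical) helper into glob mode.
GLOB_CHARS = frozenset("*?[]")


def glob_lookup(pattern):
    """Hypothetical helper: exact lookup, or glob search if pattern has * ? [ ]."""
    if not GLOB_CHARS.intersection(pattern):
        # No glob symbols: keep the current behaviour (single character,
        # KeyError if the name is unknown).
        return unicodedata.lookup(pattern)

    # Unicode names are upper case, so upper-case the pattern to get the
    # same case-insensitivity that unicodedata.lookup already has.
    pattern = pattern.upper()
    matches = []
    for code_point in range(sys.maxunicode + 1):
        char = chr(code_point)
        # unicodedata.name() has no name for some code points (for example
        # control characters), so fall back to an empty string.
        name = unicodedata.name(char, "")
        if name and fnmatch.fnmatchcase(name, pattern):
            matches.append(char)
    return matches


if __name__ == "__main__":
    print(glob_lookup("latin small letter z"))  # exact name: one character
    for char in glob_lookup("latin * Z"):
        print(char, unicodedata.name(char))
```

The scan visits every code point, so it is slow compared with an exact lookup; a real implementation would presumably search the name table directly.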


A straight substring match takes at worst twelve extra characters:

lookup('*' + name + '*')

and only two if the name is a literal:

lookup('*spam*')

This is shorter than `partial_match=True` (18 characters) and more flexible and powerful. There's no ambiguity between the two styles of call because the globbing symbols * ? and [] are never legal in Unicode names. See section 4.8 of

http://www.unicode.org/versions/Unicode11.0.0/ch04.pdf
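
As a rough sanity check of that claim, the following sketch scans every name known to the running interpreter's unicodedata module and collects any that contain a globbing symbol. The function name is made up for the example, and the check only covers the Unicode version that Python ships with, not the standard itself.

```python
import sys
import unicodedata

# The globbing symbols discussed above.
GLOB_CHARS = set("*?[]")


def names_containing_glob_chars():
    """Return (code point, name) pairs whose name contains a glob symbol."""
    offenders = []
    for code_point in range(sys.maxunicode + 1):
        # Unnamed code points yield an empty string and are skipped.
        name = unicodedata.name(chr(code_point), "")
        if GLOB_CHARS.intersection(name):
            offenders.append((code_point, name))
    return offenders


if __name__ == "__main__":
    # Expected to print an empty list if the claim holds.
    print(names_containing_glob_chars())
```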

----------
nosy: +steven.daprano

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35549>
_______________________________________

