Late-binding of function defaults (was Re: What is a function parameter =[] for?)

Chris Angelico rosuav at gmail.com
Thu Nov 26 18:07:34 EST 2015


On Fri, Nov 27, 2015 at 9:27 AM, BartC <bc at freeuk.com> wrote:
> On 26/11/2015 13:15, Chris Angelico wrote:
>>
>> On Thu, Nov 26, 2015 at 11:53 PM, BartC <bc at freeuk.com> wrote:
>
>
>>> http://pastebin.com/JrVTher6
>
>
>> #14 and #15: Are you assuming that a character is a byte and that
>> diacritical-free English is the only language in the world?
>
>
> I don't think that need be the assumption. Any UTF8 string that fits within
> 8 bytes could also be represented by an integer value.

Okay, so you're making UTF-8 your visible string representation.
That's better than assuming character==byte, but it still has the case
insensitivity problem.
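A minimal Python sketch of the point above. The `pack_utf8` helper is hypothetical and is not the code from the pastebin; it stands in for the "up to 8 UTF-8 bytes as an integer" idea. Packing works, but the integer supports only byte-for-byte equality, and Unicode case mappings change both the bytes and the length of a string:

```python
def pack_utf8(s: str) -> int:
    """Hypothetical helper: pack a string whose UTF-8 form fits in 8 bytes
    into one integer. Strings with embedded NULs would be ambiguous."""
    data = s.encode("utf-8")
    if len(data) > 8:
        raise ValueError(f"{s!r} needs {len(data)} UTF-8 bytes")
    return int.from_bytes(data.ljust(8, b"\0"), "big")


# "teßting" is 7 characters but exactly 8 UTF-8 bytes (ß takes two).
packed = pack_utf8("teßting")

# Upper-casing expands ß to SS, and lower-casing does not bring it back.
upper = "teßting".upper()        # "TESSTING"
round_trip = upper.lower()       # "tessting", not "teßting"

# Python's str methods are locale-independent, so Turkish rules are ignored:
# dotless ı upper-cases to I, which lower-cases to dotted i,
# and İ (U+0130) lower-cases to two code points.
dotless = "ı".upper().lower()    # "i"
dotted = "\u0130".lower()        # "i\u0307"

if __name__ == "__main__":
    print(hex(packed), upper, round_trip, dotless, len(dotted))
    try:
        pack_utf8("Parıldıyor")  # 10 characters, 12 UTF-8 bytes
    except ValueError as exc:
        print(exc)
```

Any case-insensitive comparison therefore has to happen on the decoded text, before packing, and it still has to pick one language's rules.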

>> Case
>> insensitivity is a *pain* when you try to be language-agnostic; for
>> instance, the case-folding rules of English state that U+0069 LATIN
>> SMALL LETTER I and U+0049 LATIN CAPITAL LETTER I are identical, but
>> Turkish would upper-case the first to U+0130 LATIN CAPITAL LETTER I
>> WITH DOT ABOVE and lower-case the second to U+0131 LATIN SMALL LETTER
>> DOTLESS I. German has U+00DF LATIN SMALL LETTER SHARP S (also called
>> eszett), which traditionally upper-cases to "SS", which lower-cases to
>> "ss".
>
>
> I use Windows which is also case insensitive with regard to filenames and
> such. How does it solve those problems? How about web-site names, email
> addresses and Google searches?

Windows: I'm not sure, and frankly, I don't trust it. A quick test
showed a couple of failures:

C:\Users\Rosuav\Desktop>dir /b TE*
teßting
C:\Users\Rosuav\Desktop>dir /b TESST*
File Not Found

C:\Users\Rosuav\Desktop>dir /b ParıldıYOR*
Parıldıyor Parts & Pieces
C:\Users\Rosuav\Desktop>dir /b PARILDIYOR*
File Not Found

It might be case insensitive only for ASCII.
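The results above are consistent with a rule that folds only A-Z. The following Python sketch is a hypothetical model of that rule (`ascii_fold` and `prefix_match` are made-up names), not a description of how Windows actually compares file names. It also shows that Python's `str.casefold()` handles the ß case but still does not equate ı with I, because default (non-Turkic) case folding leaves dotless ı alone:

```python
def ascii_fold(s: str) -> str:
    """Hypothetical ASCII-only folding: map A-Z to a-z, leave everything else."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def prefix_match(name: str, pattern: str, fold) -> bool:
    """Emulate `dir /b PATTERN*` for a single name under a given folding."""
    return fold(name).startswith(fold(pattern))


TESTS = [
    ("teßting", "TE"),
    ("teßting", "TESST"),
    ("Parıldıyor Parts & Pieces", "ParıldıYOR"),
    ("Parıldıyor Parts & Pieces", "PARILDIYOR"),
]

if __name__ == "__main__":
    for name, pattern in TESTS:
        ascii_ok = prefix_match(name, pattern, ascii_fold)
        fold_ok = prefix_match(name, pattern, str.casefold)
        print(f"{pattern}*: ascii={ascii_ok} casefold={fold_ok}")
```

Under `ascii_fold` the four patterns give the same found/not-found pattern as the `dir` runs; under `str.casefold` only the Turkish query still fails.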

(Note: This test was done on Windows 7, because that's the VM I had
handy. Things might be different on newer versions of Windows, but I
can't check.)

Website names: Presumably you mean DNS. It started out as an
ASCII-only protocol, and grew a number of gross hacks to support
"internationalized domain names". I'm not sure where the case
insensitivity is applied, but it doesn't matter too much, because
conflicts can be resolved at registration. Also, you'll generally see
IDNs in country-specific TLDs, so there'll be only one language (or a
small family of languages) used, reducing the likelihood of
collisions.
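For a concrete look at one place where such mapping happens: Python's built-in `idna` codec implements the older IDNA 2003 rules (RFC 3490), whose "nameprep" step case-folds a non-ASCII label before Punycode-encoding it. IDNA 2008 treats some characters differently (notably ß), so take this as an illustration of the 2003 rules only. The `to_ascii` wrapper is a hypothetical name:

```python
def to_ascii(label: str) -> bytes:
    """Encode one domain label with Python's built-in IDNA 2003 codec."""
    return label.encode("idna")


if __name__ == "__main__":
    # Nameprep lower-cases non-ASCII labels, so both spellings collide.
    print(to_ascii("Bücher"), to_ascii("bücher"))
    # IDNA 2003 also maps ß to "ss", so this label becomes plain ASCII.
    print(to_ascii("straße"))
    # Pure-ASCII labels pass through unchanged; DNS itself compares
    # ASCII letters case-insensitively.
    print(to_ascii("Example"))
    # Decoding goes back to the case-folded Unicode form.
    print(b"xn--bcher-kva".decode("idna"))
```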

Google searches are (deliberately) a LOT sloppier than mere case
insensitivity. You can search for something without diacriticals and get
back results with diacriticals; you can transpose letters, omit
letters, have extra letters, and it'll generally figure out what you
want. This is absolutely awesome for a search engine, but equally
horrifying for name lookups in a program.
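To make the contrast concrete, here is a rough Python sketch of search-style "loose" matching. It is a toy built on assumptions, not how Google works: `loose_key` and `fuzzy_lookup` are hypothetical names, diacritics are stripped with `unicodedata`, the result is case-folded, and `difflib.get_close_matches` tolerates typos. The same properties that make it friendly for searching make distinct names collide:

```python
import difflib
import unicodedata


def loose_key(s: str) -> str:
    """Decompose, drop combining marks (diacritics), then case-fold."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def fuzzy_lookup(query: str, names: list[str], cutoff: float = 0.8) -> list[str]:
    """Return every name whose loose key is close to the query's loose key."""
    by_key: dict[str, list[str]] = {}
    for name in names:
        by_key.setdefault(loose_key(name), []).append(name)
    close = difflib.get_close_matches(loose_key(query), list(by_key), n=3, cutoff=cutoff)
    return [name for key in close for name in by_key[key]]


NAMES = ["résumé", "resume", "result"]

if __name__ == "__main__":
    print(fuzzy_lookup("RESUME", NAMES))  # résumé and resume both match
    print(fuzzy_lookup("resme", NAMES))   # a missing letter is forgiven
```

For a search box, returning both `résumé` and `resume` is helpful; for a variable lookup, it means two different names have silently become one.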

None of these is something I'd recommend following.

> Within a program source code (where you have mainly technical users), you
> can just impose some restrictions on keywords and identifiers otherwise
> there are plenty of problems even without case switching, if you want to
> allow Unicode here.

I would strongly support ASCII-only *language keywords*. You don't
have many of them (compared to the number of identifiers in a
program), and everyone has to type them. But for identifiers, Python 3
defines character validity based on Unicode categories, and performs
NFKC normalization on all names. That's pretty straightforward. No
case-folding hassles, no messy non-transitive equalities; it's
easy.
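A short Python sketch of that behaviour (PEP 3131): identifiers are NFKC-normalized by the parser, so a compatibility character such as U+1D431 MATHEMATICAL BOLD SMALL X names the same variable as plain `x`. No case folding happens, so ß, SS, ı and i all remain distinct. The `run` helper is just a hypothetical wrapper around `exec`:

```python
import keyword
import unicodedata


def run(source: str) -> dict:
    """Hypothetical helper: exec source in a fresh namespace, return its names."""
    namespace: dict = {}
    exec(source, namespace)
    namespace.pop("__builtins__", None)  # exec inserts this; not needed here
    return namespace


# Every language keyword is plain ASCII.
all_ascii_keywords = all(k.isascii() for k in keyword.kwlist)

# U+1D431 is NFKC-equivalent to "x", so both spellings bind one variable.
bold_x_normalized = unicodedata.normalize("NFKC", "\U0001D431")
same_name = run("\U0001D431 = 1\nresult = x + 1")

# NFKC does not fold case: these are four different identifiers.
distinct = run("Straße = 1\nSTRASSE = 2\nı = 3\ni = 4")

if __name__ == "__main__":
    print(all_ascii_keywords, bold_x_normalized)
    print(same_name)
    print(sorted(distinct))
```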

ChrisA



More information about the Python-list mailing list