Late-binding of function defaults (was Re: What is a function parameter =[] for?)
Chris Angelico
rosuav at gmail.com
Thu Nov 26 18:07:34 EST 2015
On Fri, Nov 27, 2015 at 9:27 AM, BartC <bc at freeuk.com> wrote:
> On 26/11/2015 13:15, Chris Angelico wrote:
>>
>> On Thu, Nov 26, 2015 at 11:53 PM, BartC <bc at freeuk.com> wrote:
>
>
>>> http://pastebin.com/JrVTher6
>
>
>> #14 and #15: Are you assuming that a character is a byte and that
>> diacritical-free English is the only language in the world?
>
>
> I don't think that need be the assumption. Any UTF8 string that fits within
> 8 bytes could also be represented by an integer value.
Okay, so you're making UTF-8 your visible string representation.
That's better than assuming character==byte, but it still has the case
insensitivity problem.
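To make that concrete, here's a rough Python 3 sketch. Note that pack_utf8 is a made-up helper for illustration (it isn't taken from the pastebin); everything else is the built-in str methods, which use the locale-independent Unicode mappings.

```python
def pack_utf8(text: str) -> int:
    """Hypothetical helper: pack up to 8 bytes of UTF-8 into one integer."""
    data = text.encode("utf-8")
    if len(data) > 8:
        raise ValueError(f"{text!r} needs {len(data)} bytes, more than 8")
    return int.from_bytes(data, "big")


word = "te\u00dfting"            # 'teßting': 7 characters, 8 bytes of UTF-8
packed = pack_utf8(word)

# German: upper-casing sharp s gives two letters, and lower-casing
# the result does not bring the original character back.
round_tripped = word.upper().lower()
assert "\u00df".upper() == "SS"
assert round_tripped == "tessting"
assert pack_utf8(round_tripped) != packed

# Turkish: Python's default mappings follow the non-Turkish rules,
# so the dotted/dotless pairs are not what a Turkish reader expects.
assert "i".upper() == "I"          # Turkish would want U+0130
assert "I".lower() == "i"          # Turkish would want U+0131
assert "\u0131".upper() == "I"     # dotless i upper-cases to plain I
```

So comparing the packed integers only tells you about byte-for-byte equality; any case-insensitive comparison has to decide on folding rules first.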
>> Case
>> insensitivity is a *pain* when you try to be language-agnostic; for
>> instance, the case-folding rules of English state that U+0069 LATIN
>> SMALL LETTER I and U+0049 LATIN CAPITAL LETTER I are identical, but
>> Turkish would upper-case the first to U+0130 LATIN CAPITAL LETTER I
>> WITH DOT ABOVE and lower-case the second to U+0131 LATIN SMALL LETTER
>> DOTLESS I. German has U+00DF LATIN SMALL LETTER SHARP S (also called
>> eszett), which traditionally upper-cases to "SS", which lower-cases to
>> "ss".
>
>
> I use Windows which is also case insensitive with regard to filenames and
> such. How does it solve those problems? How about web-site names, email
> addresses and Google searches?
Windows: I'm not sure, and frankly, I don't trust it. A quick test
showed a couple of failures:
C:\Users\Rosuav\Desktop>dir /b TE*
teßting
C:\Users\Rosuav\Desktop>dir /b TESST*
File Not Found
C:\Users\Rosuav\Desktop>dir /b ParıldıYOR*
Parıldıyor Parts & Pieces
C:\Users\Rosuav\Desktop>dir /b PARILDIYOR*
File Not Found
It might be case insensitive only for ASCII.
(Note: This test was done on Windows 7, because that's the VM I had
handy. Things might be different on newer Windowses, but I can't
check.)
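For comparison, here's a small Python 3 sketch of the same four lookups. Both ascii_fold and prefix_match are hypothetical helpers written for this example; they model an "ASCII-only" fold versus Unicode full case folding (str.casefold), and say nothing about how Windows actually implements its matching.

```python
def ascii_fold(text: str) -> str:
    """Hypothetical fold: lower-case ASCII letters, leave everything else alone."""
    return "".join(c.lower() if c.isascii() else c for c in text)


def prefix_match(prefix: str, filename: str, fold) -> bool:
    """Rough stand-in for 'dir /b PREFIX*' under a given folding function."""
    return fold(filename).startswith(fold(prefix))


names = ["te\u00dfting", "Par\u0131ld\u0131yor Parts & Pieces"]
queries = ["TE", "TESST", "Par\u0131ld\u0131YOR", "PARILDIYOR"]

# results[fold name][query] -> list of matching file names
results = {
    fold.__name__: {
        query: [n for n in names if prefix_match(query, n, fold)]
        for query in queries
    }
    for fold in (ascii_fold, str.casefold)
}

for fold_name, by_query in results.items():
    for query, hits in by_query.items():
        print(f"{fold_name:10} {query:12} {hits}")
```

The ASCII-only fold gives the same hits and misses as the four dir commands above, which is consistent with the guess but doesn't prove it. Full case folding finds TESST*, but still misses PARILDIYOR*, because the default Unicode folding does not equate dotless i with plain i; that needs Turkish-specific rules.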
Web site names: Presumably you mean DNS. It started out as an
ASCII-only protocol, and grew a number of gross hacks to support
"internationalized domain names". I'm not sure where the case
insensitivity is applied; but it doesn't matter too much, because
conflicts can be resolved at registration. Also, you'll generally see
IDNs in country-specific TLDs, so there'll be only one language (or a
small family of languages) used, reducing the likelihood of
collisions.
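If you want to poke at this from Python, the standard library has an "idna" codec, which implements the older IDNA 2003 rules. A minimal sketch follows; the domain names are made-up examples, and this only shows what that one codec does, not what any registry or resolver does.

```python
# Hypothetical names under the reserved "example" TLD.
lower = "b\u00fccher.example"      # 'bücher.example'
upper = "B\u00dcCHER.example"      # 'BÜCHER.example'

# Non-ASCII labels get turned into an ASCII "xn--" form (Punycode).
encoded_lower = lower.encode("idna")
encoded_upper = upper.encode("idna")
print(encoded_lower, encoded_upper)

# With this codec, the case difference is gone by the time the
# label reaches its ASCII form.
same_on_the_wire = encoded_lower == encoded_upper

# The same rules also turn sharp s into "ss", so two visibly
# different names can end up as one.
sharp_s = "stra\u00dfe.example".encode("idna")
plain_ss = "strasse.example".encode("idna")
```

That last pair is exactly the sort of conflict that has to be sorted out at registration time.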
Google searches are (deliberately) a LOT more sloppy than just case
insensitivity. You can search for something without diacriticals and get
back results with diacriticals; you can transpose letters, omit
letters, have extra letters, and it'll generally figure out what you
want. This is absolutely awesome for a search engine, but equally
horrifying for name lookups in a program.
None of these is something I'd recommend following.
> Within a program source code (where you have mainly technical users), you
> can just impose some restrictions on keywords and identifiers otherwise
> there are plenty of problems even without case switching, if you want to
> allow Unicode here.
I would strongly support ASCII-only *language keywords*. You don't
have many of them (compared to the number of identifiers in a
program), and everyone has to type them. But for identifiers, Python 3
defines character validity based on Unicode categories, and performs
NFKC normalization on all names. That's pretty straightforward. No
case insensitivity hassles, no messy non-transitive equalities, it's
easy.
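A quick sketch of what that means in practice, using only unicodedata, str.isidentifier and exec. The variable names are arbitrary, and canonical_name is a hypothetical helper that only approximates the normalization step, not the whole identifier check.

```python
import unicodedata


def canonical_name(name: str) -> str:
    """Hypothetical helper: the NFKC form Python 3 uses for an identifier."""
    return unicodedata.normalize("NFKC", name)


# 'file' spelled with U+FB01 LATIN SMALL LIGATURE FI as its first character.
ligature_name = "\ufb01le"
assert ligature_name.isidentifier()
assert canonical_name(ligature_name) == "file"

# The compiler normalizes the name, so the assignment lands on plain 'file'.
namespace = {}
exec(ligature_name + " = 42", namespace)
assert namespace["file"] == 42

# Normalization is not case folding: these stay two distinct names.
exec("File = 'upper'\nfile = 'lower'", namespace)
assert namespace["File"] != namespace["file"]
```

Two spellings name the same variable exactly when their NFKC forms are equal, and that's an honest equivalence relation, unlike case-insensitive matching.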
ChrisA