[beginner] What's wrong?

Dan Sommers dan at tombstonezero.net
Sun Apr 3 14:26:48 EDT 2016


On Sun, 03 Apr 2016 10:18:45 -0700, Rustom Mody wrote:

> On Sunday, April 3, 2016 at 9:56:24 PM UTC+5:30, Dan Sommers wrote:
>> On Sun, 03 Apr 2016 08:39:02 -0700, Rustom Mody wrote:
>> 
>> > On Sunday, April 3, 2016 at 8:58:59 PM UTC+5:30, Dan Sommers wrote:
>> >> On Sun, 03 Apr 2016 07:30:47 -0700, Rustom Mody wrote:
>> >> 
>> >> > So here are some examples to illustrate what I am saying:
>> >> 
>> >> [A vs a, A vs A, flag vs flag, etc.]
>> > <snip>
>> >> I understand that in some use cases, flag and flag represent the same
>> >> English word, but please don't extend that to identifiers in my
>> >> software.
>> 
>> > I wonder once again if you are getting my point opposite to the one I
>> > am making.  With ASCII there were problems like O vs 0 -- niggling but
>> > small.
>> > 
>> > With Unicode its a gigantic pandora box.  Python by allowing unicode
>> > identifiers without restraint has made grief for unsuspecting
>> > programmers.
>> 
>> What about the A vs a case, which comes up even with ASCII-only
>> characters?  If those are the same, then I, as a reader of Python code,
>> have to understand all the rules about ß (which I think have changed
>> over time), and potentially þ and others.
> 
> Dont get your point.
> If you know German then these rules should be clear enough to you
> If not youve probably got bigger problems reading that code anyway

My point is that case sensitivity is good.  I was disagreeing with your
point about scheme getting A vs a "right" and Python and C and Unix
getting it "wrong."

My larger point, and my experience, is that case sensitivity is easier
for to handle than case insensitivity.  Most of the time, the same
letter's capital and small renditions look different from each other (A
vs a, Q vs q, and even Þ and þ is no worse than O and o), and there are
no context sensitive conversion rules to worry about.

> As illustration, here is Marko's code few posts back:
> 
> for oppilas in luokka:
>         if oppilas.hylätty():
>             oppilas.ilmoita(oppilas.koetulokset) 
> 
> Does it make sense to you?

It makes enough sense to recognize the idiom:  for each item in a
collection that satisfies a predicate, call a method on the item.

My point here is that while the identifiers themselves can be enormously
helpful to someone seeing a block of code for the first time or
maintaining it five years later, it's just as important to recognize
quickly that one identifier is not the same as another one, or that a
particular identifier only appears once or only in certain syntactical
constructs.

If the above code were written a little differently, we'd be having a
completely different discussion:

    for list in object:
        if list.clear():
            list.pop(list.append)

>> Can we take a "we're all adults here" approach?
> 
> Who's the 'we' we are talking about?

The community, who has accepted Python as a case-sensitive language and
knows better than to use identifiers that look too much alike or are
otherwise deliberatly mis-leading.

>> For the same reason
>> that adults don't use identifiers like xl0, x10, xlO, and xl0 anywhere
>> near each other, shouldn't we also not use A and A anywhere near each
>> other?  I certainly don't want the language itself to [try to] reject
>> x10 and xIO because they look too much alike in many fonts.
> 
> When Kernighan and Ritchie wrote C there was no problem with gets.
> Then suddenly, decades later the problem exploded.

When Kernighan and Ritchie wrote C there *was* a problem with gets.

> What happened?

The problem was no longer isolated to taking down one Unix process or a
single machine, or discovering passwords on that one machine.

> Here's an analysis:
> Security means two almost completely unrelated concepts
> - protection against shooting oneself in the foot (remember the 'protected' 
>    keyword of C++ ?)
> - protection against intelligent, capable, motivated criminals
> Lets call them security-s (against stupidity) and security-c (against criminals)
> 
> Security-c didnt figure because computers were anyway physically secured and 
> there was no much internet to speak of.
> gets was provided exactly on your principle of 'consenting-adults' -- if you
> use it you know what you are using.
> 
> Then suddenly computers became net-facing and their servers could be
> written by 'consenting' (to whom?) adults using gets.
> 
> Voila -- Security has just become a lucrative profession!

I can't prevent insecure web servers, or unknowing users.  Allowing or
disallowing A and A and А to coexist in the source code doesn't matter.

> I believe python's situation of laissez-faire unicode is similarly
> trouble-inviting.

I'm not sure I agree, but I didn't timing attacks on cryptographic
algorithms or devices reading passwords from air-gapped computers
coming, either.

I do know that complexity is also a source of bugs and security risks.
Allowing or disallowing certain unicode code points in identifiers, and
declaring that identifiers consisting of the same sequence of code
points are the same, is way less complex than getting something else
(even something as "simple" as case-insensitivity) right for all cases.

> While I personally dont know enough about security to be able to demonstrate a
> full sequence of events, here's a little fun I had with Chris:
> 
> https://mail.python.org/pipermail/python-list/2014-May/672413.html
> 
> Do you not think this could be tailored into something more sinister and
> dangerous?



More information about the Python-list mailing list