case-sensitivity (was Re: True, False, None)

Thu Nov 13 10:17:19 EST 2003

Michele Simionato wrote:

> Alex Martelli <aleax at aleax.it> wrote in message
> news:<yRssb.11182$9_.422629 at news1.tin.it>...
>> So, when I used to have a factory function (as 'int' was), and change
>> it into a type (or class, same thing), I should rename it and break all
   ...
> But these are very rare cases, so probably I could live with an
> enforced capitalization too.

You think it's rare, during refactoring, to change between types and
factory functions?!  I suspect you may not have been maintaining and
enhancing code (in languages allowing interchange of the two things)
for long enough.  Consider that *ALL* types in Python's builtins started
life as factory functions -- that's 100%... "very rare"?!-)

>> remember that module FCNTL has an all-uppercase name, htmllib all-lower,
>> cStringIO weirdly mixed, mimetypes lower, MimeWriter mixed, etc, etc --
>> totally wasted mnemonic effort.
> 
> This is more a problem of inconsistent conventions. It is true that
> it will disappear with case insensitivity enforced, but somehow I

[[ probably mean "case sensitivity enforced" ? ]]

> don't feel it a reason strong enough.

Ruby seems to survive with warnings for capitalization inconsistent
with its favourite conventions, but then it doesn't let you "just call
the class" to instantiate it -- you have to call Theclass.new(23) or
the like.  Ooops, was that TheClass.new(23) ...?  How is the assumed
"enforcement" going to deal with THESE issues?-)

>> E.g., would the letter 'f' in the word 'file' be uppercased or not
>> when it occurs within a composite word?  Take your pick...
>> shelve.DbfilenameShelf, zipfile.BadZipfile, zipfile.ZipFile,
>> mimify.HeaderFile, ...
> 
> You are exaggerating a bit. Yes, you are right in your complaints,
> but in my experience I never had headaches due to case sensitivity.

How am I "exaggerating" in claiming that the SAME module, zipfile,
spells "zipfile" differently in the module name itself, in class
zipfile.ZipFile, and in class zipfile.BadZipfile?  Maybe you have a
photographic memory so that having seen each of these ONCE you are
never going to forget which ones uppercase exactly which letters, but
even back when my memory was outstanding (when I was younger) it was
always more "auditory" than "visual": I could easily recite by heart
long quotes from books I had read once, but never could recall the 
details of punctuation (or capitalization, when non-systematic, as 
it often was e.g. in 17th/18th century English) without painstaking
and painful explicit memorization effort.

Should a language cater mostly to the "crowd" (?) of people with
photographic memories, or shouldn't it rather help the productivity
of people whose memories aren't _quite_ that good and/or visual...?

> You are still exaggerating. 99% of times uppercase constants denote
> numbers or strings. I don't remember having ever seen an all uppercase
> function, even if I am sure you do ;)

Maybe my defect is knowing the Python standard library too well?  It's
got SEVERAL all-uppercase functions, Michele!  Check out modules
difflib (functions IS_LINE_JUNK and IS_CHARACTER_JUNK), gzip 
(functions U32 and LOWU32), stat (all the S_IMODE etc functions),
token (functions ISTERMINAL and ISNONTERMINAL)...!

Maybe the sum total IS 1% or so of the functions in the library, but
that's _STILL_ a silly, arbitrary memorization chore which I shouldn't
have had to undergo in the first place -- and I'm not even sure I
have in fact remembered all of them...
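
For anyone who would rather check than memorize, here is a short sketch that lists the all-uppercase callables a module exposes. The helper name `uppercase_callables` is made up for illustration, and the result depends on the installed Python version (some of the functions named above may no longer exist in a given release), so no output is shown:

```python
import difflib
import stat
import token

def uppercase_callables(module):
    """Return sorted public names in module that are all-uppercase and callable."""
    return sorted(
        name
        for name, obj in vars(module).items()
        if name.isupper() and callable(obj) and not name.startswith("_")
    )

# Print the all-uppercase callables found in a few of the modules mentioned.
for mod in (difflib, stat, token):
    print(mod.__name__, uppercase_callables(mod))
```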

>> I can rebind f.func_defaults, NOT f.func_name -- but they have exactly
>> the same capitalization.  So, no guidance here...
> 
> Yeah, having read-only attributes distinguished by some code convention
> (underscores or capitalization) would be a possibility, but what if in
> a future version of Python the read-only attribute becomes writable? I

Exactly: a silly distinction.  So, let's NOT draw it in the first place.

> am sure you had this argument in the back of your mind. Nevertheless,
> notice that this is an argument against a too strict code convention,
> not against case insensitivity.

You seem to argue that case sensitivity is good because it lets you
draw a distinction that, you've just shown, should NOT be drawn.  It
seems to me that this is rather one of the reasons that make it BAD:-).

>> > In a case insensitive language I am sure I would risk to override
>> > constants with functions.
>> 
>> Why is that any more likely than 'overriding' (e.g.) types (in the old
> 
> oops again, I meant "shadowing" instead of 'overriding' but you understood
> 
>> convention which makes them lowercase) or "variables" (names _meant_
>> to be re-bound, but not necessarily to functions)?  And if you ever use
>> one-letter names, is G a class/type, or is it a constant?
> 
> I often use lowercase for constants, for instance in mathematical
> formulas:
> 
> a=1
> b=2
> y=a*x+b
> 
> Readability counts more than foolish consistency. Please, remember that
> I am not advocating *enforcing* capitalization, even if I would welcome
> a more consistent usage of capitalization in the standard library.
> What I say is that case sensitivity is convenient, it gives me more
> identifiers for free.

It's anything BUT "for free", as I've been arguing: we pay quite a
price for it.  Adding ONE character (a letter, assumed case-insensitive,
digit or underscore) gives you 37 times more identifiers; for identifiers
of length 6, drawing them from an alphabet of 63 rather than 37 chars
gives you just 24.37 times more identifiers -- and you aren't going to
use most of those extra possibilities unless you're TRYING to drive
readers of your code crazy, anyway.
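
The arithmetic above is easy to check. This is a minimal sketch, assuming identifiers are counted simply as strings over a 37-character alphabet (26 case-folded letters, 10 digits, underscore) versus a 63-character one, and ignoring the rule that an identifier may not start with a digit. The function name is made up for illustration:

```python
def identifier_count(alphabet_size, length):
    """Number of strings of the given length over the alphabet.

    Simplifying assumption: the first character is not restricted,
    although real identifiers cannot start with a digit.
    """
    return alphabet_size ** length

CASE_INSENSITIVE = 26 + 10 + 1   # letters (case folded), digits, underscore
CASE_SENSITIVE = 52 + 10 + 1     # upper and lower letters, digits, underscore

# Gain from one extra character when case is ignored: going from 5 to 6 chars.
gain_from_extra_char = (
    identifier_count(CASE_INSENSITIVE, 6) // identifier_count(CASE_INSENSITIVE, 5)
)

# Gain from case sensitivity alone, at a fixed length of 6.
gain_from_case = (
    identifier_count(CASE_SENSITIVE, 6) / identifier_count(CASE_INSENSITIVE, 6)
)

print(gain_from_extra_char)       # 37
print(round(gain_from_case, 2))   # 24.37
```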

> For instance, I can define a matrix type, overload "*" and write the
> multiplication of a matrix "A" times a vector "a" as
> 
> b=A*a
> 
> Much more readable to me, than something like

Please note that here you are suddenly and undocumentedly _switching
conventions on the fly_ regarding capitalization.  One moment ago,
leading cap had to mean "type" and all caps had to mean "numeric
constant" (which in turn made a single-character capital identifier
ambiguous already...) -- now suddenly neither of these conventions
exists any more, since that uppercase A means 'matrix' instead of
'vector' (and a _number_, i.e. even lower dimensionality, would be
indicated *HOW*?  Don't you EVER multiply your matrices by scalars?!
Or is it so crucial to distinguish matrices from vectors but totally
irrelevant to distinguish either from scalars?!).

> b=a_matrix*a
> 
> or even
> 
> b=c*a # c is a matrix
> 
> Mathematical formulas are the first reason why I like case sensitivity.

My opinion is that, while _habit_ in mathematical formulas may surely
make one hanker for such case-sensitivity, the preference just does not
stand up to critical analysis, as above.  You're trying to overload WAY
too many different and conflicting "conventions" onto a meager "one bit
[at most] per character" (about 0.87 bits I believe, in fact) of
"supplementary information" yielded by case-sensitivity.

>> I consider the desire to draw all of these distinctions by lexical
>> conventions quite close to the concepts of "hungarian notation", and
>> exactly as misguided as those.
> 
> <horrified> Hungarian notation is an abomination you cannot put on
> the same footing as case sensitivity! </horrified>

They're both (as generally used) attempts to draw type-distinctions
by means of lexical conventions.  Case sensitivity is worse because
it doesn't wreak its damage only at the START of an identifier -- it
percolates INSIDE the identifier, too, making it (absurdly) an error to
try and catch a "zipfile.BadZipFile" or instantiate a "zipfile.Zipfile".
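
To make the complaint concrete, here is a small sketch of a case-folding lookup layered on top of case-sensitive Python. The helper `names_ignoring_case` is hypothetical, not something the standard library offers, and which spellings a given release of zipfile accepts may vary, so the last result is not shown:

```python
import zipfile

def names_ignoring_case(module, wanted):
    """Return the module's attribute names equal to wanted, ignoring case."""
    folded = wanted.casefold()
    return sorted(name for name in dir(module) if name.casefold() == folded)

# The exact spelling is required: a wrong guess at the inner capital fails.
print(hasattr(zipfile, "ZipFile"))   # True
print(hasattr(zipfile, "Zipfile"))   # False

# A case-folding lookup finds the class whichever way it is spelled.
print(names_ignoring_case(zipfile, "Zipfile"))
```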

> Case preservation is an interesting concept which would solve some
> of my objections against case insensitivity, but not all of them. We would
> need more letters!

Use 6 instead of 5 letters -- quite sufficient to give you PLENTY
more identifiers to choose from.

Most paladins of case sensitivity would probably be horrified to see
that the main point in its "favour" now appears to be that it
encourages you to use shorter (e.g. 1-letter) identifiers (and thus
more cryptic ones) because it gives you more of them to choose from...!!!

> For example, Scheme and Lisp are case insensitive, but they are richer
> than Python in the character set of their identifiers: so they can
> distinguish a class from an instance writing something like "c"
> for the instance and "c*" or "c#" or "c-" etc. for the class. How would
> you do that in Python? "c" for the class and "ac" for the instance? Not

I've tried (e.g. in Dylan) the concept of having punctuation freely
embeddable in identifiers and didn't particularly like it (I guess it
works better with a NON-infix-syntax language -- I don't recall it
feeling like a problem in either Forth or Scheme -- but in Dylan the
inability to write a sum as
    a+b
because that's an identifier, so you have to write
    a + b
instead, _was_ rather uncomfortable to me [maybe I just didn't get
long-enough practice and experience with it]).

> readable at all, IMHO. And think again about mathematical formulae, they
> would really suffer from case insensitivity.

I disagree -- once you have to spell out e.g. pi, capital-sigma, etc,
in Ascii letters anyway, having to make sure you do so in letters that
are unambiguous in terms of capitalization differences is no big loss.
Personally, in terms of formulas, I've never found Fortran any less
readable than C, for example.

And no, I definitely don't want Unicode characters in identifiers --
that would ensure a LOT of new and diverse errors as people use the
wrong "decoration" (accent, circumflex, etc, etc) on letters.  Plain 
ascii's just great...!-)

Alex