Language change and code breaks
Skip Montanaro
skip at pobox.com
Fri Jul 20 13:54:41 EDT 2001
Some observations and data from someone who has yet to participate in this
discussion.
I have seen at least one argument that, for interfaces to external packages,
it is useful to be able to map more easily onto the identifiers in external
systems. Most of those external systems tend to be case-sensitive, because
they tend to be written in C or C++. What effect would case-insensitivity
in the language have on packages like Boost, SWIG or Jython? I imagine that
at the least, they would have to develop some sort of name mapping. One of
the advantages of automatic wrapper generators is that they preserve (most
of) the value of existing documentation. Case-insensitivity would probably
hurt somewhat there, since a wrapped name might no longer match the name
the documentation uses.
I don't think you can add a switch to the language that lets the user make
it case-insensitive on the fly. Users would not be able to reliably use
external packages that had been coded expecting to be used in only one
environment or the other. For the sanity of all involved, I think
the language is either going to have to be strictly case-sensitive or
strictly case-insensitive. A schizophrenic interpreter would probably only
induce psychoses in its users. ;-)
I wrote a simple script using the tokenize module to classify the names in
the Python sources in the current distribution (cvs up'd from the
descr-branch this morning) and spit out potential name clashes if those
names were compared in a case-insensitive fashion. I ran the script from my
build directory under .../dist/src as:
    find .. -name '*.py' | xargs ./python ~/tmp/spittokens.py
The output looks like:
    ../Demo/classes/Rat.py
        rat,Rat
    ...
    ../Lib/distutils/sysconfig.py
        PREFIX,prefix
        EXEC_PREFIX,exec_prefix
        TextFile,text_file
The script considers only names that don't begin with an underscore. For
those names it elides underscores before classifying them. For
example "my_dog", "mydog", "MyDog", and "my_dog__" would all be classified
the same. My assumption was that in the absence of capitalization as a way
to distinguish names, underscores would be used more than they are today and
so "my_dog" and "MyDog" should be classified the same, because the most
likely way to prevent "mydog" and "MyDog" from clashing in a
case-insensitive world would be to rewrite the former as "my_dog". This
obviously errs on the high side when considering what might need to be done
to the Python core libraries. It also completely ignores the core C source
code, the names of the source files themselves, and potential name clashes
across module boundaries (all sources of plenty of potentially clashing
identifiers).
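
The classification described above can be sketched roughly as follows. This
is not the original spittokens.py, only a minimal illustration written
against the Python 3 tokenize API. The function names fold and find_clashes
are hypothetical, and skipping keywords is an assumption that the description
above does not confirm.

```python
import io
import keyword
import tokenize
from collections import defaultdict


def fold(name):
    """Drop underscores and case so my_dog, mydog and MyDog compare equal."""
    return name.replace("_", "").lower()


def find_clashes(source):
    """Return groups of distinct names in source that fold to the same key."""
    groups = defaultdict(set)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only identifiers matter; skipping keywords is an assumption.
        if tok.type != tokenize.NAME or keyword.iskeyword(tok.string):
            continue
        # Names with a leading underscore are not considered.
        if tok.string.startswith("_"):
            continue
        groups[fold(tok.string)].add(tok.string)
    # A clash is any folded key reached by more than one spelling.
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Calling find_clashes on the text of one file gives the groups of spellings
that would need checking in that file. Like the script, this looks only at
names within a single source file, not across module boundaries.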
With those caveats, it identified 504 Python source files (out of a total of
1335 .py files) in the current distribution that might have name conflicts
(or would at least have to be checked) if Python became case-insensitive.
The directories with the most files that would have to be checked are:
    64 ../Lib
    34 ../Lib/test
    32 ../Demo/tkinter/matt
    25 ../Tools/idle
    21 ../Mac/Tools/IDE
    21 ../Demo/tkinter/guido
    21 ../Demo/sgi/video
    15 ../Mac/Lib
    14 ../Lib/distutils
    13 ../Mac/Lib/test
    12 ../Lib/lib-tk
    10 ../Tools/scripts
    10 ../Tools/pynche
    10 ../Mac/scripts
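
A per-directory tally like the one above could be produced from the list of
flagged file paths with something like the following sketch. The function
name tally_by_directory is hypothetical, and this is only one plausible way
to do the counting, not necessarily how the numbers above were generated.

```python
import os
from collections import Counter


def tally_by_directory(flagged_paths):
    """Count flagged files per directory, largest count first."""
    counts = Counter(os.path.dirname(path) for path in flagged_paths)
    return counts.most_common()
```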
The source and the output are available at
http://musi-cal.mojam.com/~skip/spittokens.py
http://musi-cal.mojam.com/~skip/spittokens.out
respectively.
Finally, I ran the script over my personal library of Python source files.
It identified 75 files out of 360 with potential name conflicts. I suspect
if the language is changed I will adapt without much difficulty, but it will
be tedious.
--
Skip Montanaro (skip at pobox.com)
http://www.mojam.com/
http://www.musi-cal.com/