Graduate thesis on Python-related subject

Grant Griffin not.this at seebelow.org
Wed Apr 18 00:32:28 EDT 2001


Jarno J Virtanen wrote:
> 
> I'm planning to do my graduate thesis (pro gradu) for MsC on (yet
> unconfirmed and still vague) subject tightly related to Python.  In
> short, the main goal is to test hypothesis that a program (or some other
> piece of code) written in a very high level language or scripting
> language (specifically Python in my study) is 3-10 times shorter
> measured in lines of code than a program written in traditional system
> programming language (eg.  C/C++/Java, don't know yet).

I don't know what factor you'll end up with, but I bet Python's lack of
braces will help by _at least_ 10%.

> I intend to
> concentrate on program length and analyse it not only by the number of
> lines but rather to specify what _is_ a line of code and so forth
> (remember, this is still just an idea :-).

I think this has a lot of potential--at least in terms of filling up a
thesis <wink>. One could agonize endlessly about whether block
delimiters (in all languages besides Python) count as "lines of code". 
In one sense, they don't: they're pure boiler-plate.  But in another
sense they do: heck, _somebody's_ gotta type 'em!
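
To make the "what is a line of code" question concrete, here is a minimal sketch of a counter that can either include or ignore lines holding nothing but block delimiters. The function name count_lines, the skip_delimiters flag, and the C snippet are all made up for illustration; a real study would need a far more careful definition.

```python
def count_lines(source, skip_delimiters=False):
    """Count non-blank lines; optionally skip lines holding only braces/semicolons."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines never count
        if skip_delimiters and stripped.strip("{};") == "":
            continue  # nothing on this line but block delimiters
        count += 1
    return count


# Hypothetical C fragment, written in a brace-on-its-own-line style.
C_SNIPPET = """\
if (x > 0)
{
    y = 1;
}
else
{
    y = 2;
}
"""

print(count_lines(C_SNIPPET))                        # 8
print(count_lines(C_SNIPPET, skip_delimiters=True))  # 4
```

Under this (debatable) definition, half of the lines in the fragment are pure delimiters, which shows how much the final ratio can depend on the counting rule and on brace style.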

More interesting, maybe, (and much harder to analyze) is the
"productivity" impact of using indents for block delimiting.  Most folks
who like the idea (meaning: those of us here) would probably say that
it's not just the lines of code saved (if any <wink>), but also that
there is additional readability/maintainability due to having less
clutter on the screen, and more good stuff fitting on the screen (due to
less clutter).  Oh, and it leads to more uniform code--that's gotta be
worth something.  Not to mention fewer dead trees when you print it out.

> Important part of the
> forthcoming research would be to study measuring overall (why, what,
> how) and to speculate on scripting languages in general.

IMHO, one of the most powerful things about scripting languages really
isn't something that _non-scripting_ languages can't do: scripting
languages have powerful, application-oriented, partly user-contributed
libraries.  This is where much of the code density comes from: from not
reinventing the wheel.  For example, I can't even _begin_ to tell you
how many times I haven't reinvented Python's "glob" module.
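
As a small illustration of the wheel not being reinvented, the sketch below uses the standard glob module to find files by a shell-style pattern. The helper name list_matching and the file names are invented for the example, and it uses tempfile.TemporaryDirectory from current Python rather than anything available in 2001.

```python
import glob
import os
import tempfile


def list_matching(directory, pattern):
    """Return the sorted base names in `directory` that match a shell-style pattern."""
    paths = glob.glob(os.path.join(directory, pattern))
    return sorted(os.path.basename(p) for p in paths)


# Build a throwaway directory with a few empty files, then glob it.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.py", "b.py", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(list_matching(tmp, "*.py"))  # ['a.py', 'b.py']
```

The pattern matching itself is a single library call; the rest is setup for the demonstration.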

Conceptually, there's no reason that most of Python's libraries couldn't
be written in C++.  But they just aren't.  Sure there are plenty of
libraries for C++, but they suffer from a lack of standardization.  (In
fact, it's a cruel irony that the main problem with the C++ Standard
Template Library is that it is Far From.)

In comparison, Python, Perl, et al. have managed to create and maintain
standard libraries.  This is partly a sociological phenomenon, based in
The Miracle of Open Source Software: since Python is free, open, and
widely ported, there is no incentive to create non-standardness.  And
Python has a gifted and dedicated BDFL to design (or at least shepherd)
its libraries.

So I guess I'm making your job harder if you buy my story that a large
part of what makes scripting language code dense isn't really based in
the fact that they're scripting languages.  Or maybe I've just given you
more grist for the agonizing mill <wink>.

> I know that
> the main subject (measuring Python programs) would be kind of "stating
> the obvious" and I know the work done by Lutz Prechelt ("An Empirical
> Comparison of C, C++, Java, Perl, Python, Rexx, and Tcl for a
> Search/String-Processing Program") which covers also the program length,
> but my idea (besides the fact that my study would be an undergraduate
> research, so no new scientific results is required) is that hopefully
> this kind of work could be ("sort of") referenced and "the obvious"
> would be somewhere stated explicitly.  Also my intention would be to
> study (possible) special features of Python.

One of the most obvious "differences" of scripting languages is dynamic
typing.  But I'm not convinced that the dynamic typing feature of
Python, Perl, and other scripting languages is as big of a thing as it's
made out to be.  But what _is_ a very significant advantage is strings
and dictionaries/hashes as built-in data types.  Again, one could do
that in C++ (and plenty of folks have--non-standardly), but since those
types are not "built-in" language features, they don't get used much.

If you look at all the things you do with dictionaries/hashes in
Python/Perl that you do some other way in C/C++, you'll find a
significant difference in code density.  Still, I've never used them in
C/C++.  And, interestingly, I've never even missed them.  I think it
might be because I do stuff in Python (and previously Perl) which just
wasn't worth the time it would take in C/C++.  Again, here's more to
agonize about: it's not just about lines of code, it's about opening up
the development of whole new kinds of applications by making those
applications simple enough to be feasible.
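
A rough sketch of the kind of thing built-in dictionaries make almost free: counting word frequencies. The function name word_counts and the sample sentence are made up for illustration; doing the same in plain C would mean writing or borrowing a hash table first.

```python
def word_counts(text):
    """Map each lowercase, whitespace-separated word to how often it occurs."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


print(word_counts("the cat and the hat"))
# {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```

The whole data structure is one literal and one method call, which is where much of the line-count difference would come from.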

> Now (after the short
> introduction :-), my question would be:
> 
> Is there "official" (or other) interest in such study?

Nobody's in charge here.  So, speaking for the group, I can only offer
unofficial interest.

> The reason for me to ask this is that in Helsinki University we are
> supposed to write graduate thesis in one's native language, but it is
> possible to write it in english.  Now, I'm still not sure whether I
> should write it in english.  If there would be some real interest for
> this kind of study, I could very well write it in english.  (If I would
> get e-mail from the BDFL, I wouldn't have to hesitate at all ;-).
> Related, one has to do an explicit application to be allowed to write
> the thesis in english and a short motivation from someone other than me
> wouldn't hurt.  :-)
> 
> So, arguments for writing in english:
> 
> 1) The work could possibly be beneficial for the Python community.
>    This would give me extra motivation on improving the quality,
>    if and when my overall motivation decreases at some stage.

Yes, I, for one would be interested.

> 2) I would learn to write better english (ie. "scientific" english).

I'd start by capitalizing the "e" on "English". <wink>  (Sorry--other
than that, you already do very well!)

> Arguments against:
> 
> 1) The overall required effort for me would increase.

We don't mind.

> 2) I would not learn to write better finnish. :-)

Whatever you decide, you'd better finish.

> Waiting for comments, Jarno.

never-short-of-inane-ones-ly y'rs,

=g2
-- 
_____________________________________________________________________

Grant R. Griffin                                       g2 at dspguru.com
Publisher of dspGuru                           http://www.dspguru.com
Iowegian International Corporation            http://www.iowegian.com


