Python development time is faster.

Mon Nov 13 10:20:51 EST 2006

Chris Brat wrote:
> I've seen a few posts, columns and articles which state that one of the
> advantages of Python is that code can be developed x times faster than
> languages such as <<Insert popular language name here>>.
> 
> Does anyone have any comments on that statement from personal
> experience?

I had to work at a laboratory a few years ago which used Java 
exclusively.  I was coming from several years as a graduate student 
using Python almost exclusively for my own work.  (But I used to teach 
introductory Java classes at my previous university, so I had plenty of 
Java experience.)

My own work and the work that I did for the lab were quite similar, 
mainly focused on training machine learning models on natural language 
processing tasks. I estimated that the Java code took me about 5x as 
long to write as the Python. Part of this is the verbosity of Java, e.g. where you have to 
write an anonymous inner class instead of using a function or a class 
object directly. But probably a larger part of this was using the Java 
libraries, which tend to be way over-engineered, and more complicated to 
use than they need to be.
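
To make the verbosity point concrete with something small (the word list and numbers here are made up for illustration): in Python a function or a class object can be handed straight to the code that needs it, where Java at the time would typically call for an anonymous inner class such as a Comparator.

```python
words = ["parser", "tag", "corpus", "ngram"]

# A function (the built-in len) is passed directly as the sort key.
by_length = sorted(words, key=len)

# A class object (int) is passed directly as a callable.
counts = list(map(int, ["3", "1", "2"]))

print(by_length)
print(counts)
```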

A simple example from document indexing.  Using Java Lucene to index 
some documents, you'd write code something like::

     Analyzer analyzer = new StandardAnalyzer();
     IndexWriter writer = new IndexWriter(store_dir, analyzer, true);
     for (Value value: values) {
         Document document = new Document();
         Field title = new Field("title", value.title,
                                 Field.Store.YES,
                                 Field.Index.TOKENIZED);
         Field text = new Field("text", value.text,
                                Field.Store.YES,
                                Field.Index.TOKENIZED);
         document.add(title);
         document.add(text);
         writer.addDocument(document);
     }
     writer.close();

Why is this code so verbose?  Because the Lucene Java APIs don't like 
useful defaults. So for example, even though StandardAnalyzer is 
supposedly *Standard*, there's no IndexWriter constructor that includes 
it automatically. Similarly, if you create a Field with a string name 
and value (as above), you must specify both a Field.Store and a 
Field.Index - there's no way to let them default to something reasonable.

Compare this to Python code. Unfortunately, PyLucene wraps the Lucene 
APIs pretty directly, but I've wrapped PyLucene with my own wrapper that 
adds useful defaults (and takes advantage of things like Python's 
**kwargs).  Here's what the same code looks like with my Python wrapper 
to Lucene::

     writer = IndexWriter(store_dir)
     for value in values:
         document = Document(title=value.title, text=value.text)
         writer.addDocument(document)
     writer.close()

Gee, and I wonder why it took me so much longer to write things in Java. ;-)
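
The wrapper itself isn't shown here. As a rough sketch of the idea only, with stand-in classes in place of the real PyLucene ones (RawField, RawDocument, DEFAULT_STORE and DEFAULT_INDEX are all made up for illustration), a Document that takes its fields from **kwargs and fills in the store and index settings might look something like this:

```python
# Stand-ins for a low-level API that insists on every argument.
# These are NOT the real PyLucene classes.
class RawField:
    def __init__(self, name, value, store, index):
        self.name = name
        self.value = value
        self.store = store
        self.index = index


class RawDocument:
    def __init__(self):
        self.fields = []

    def add(self, field):
        self.fields.append(field)


# Hypothetical defaults chosen by the wrapper.
DEFAULT_STORE = "YES"
DEFAULT_INDEX = "TOKENIZED"


class Document(RawDocument):
    """Wrapper: each keyword argument becomes a field with default settings."""

    def __init__(self, **fields):
        RawDocument.__init__(self)
        for name, value in fields.items():
            self.add(RawField(name, value, DEFAULT_STORE, DEFAULT_INDEX))


doc = Document(title="Moby Dick", text="Call me Ishmael.")
print(sorted(field.name for field in doc.fields))
```

The caller only names the fields it cares about; anything that almost never changes lives in one place inside the wrapper.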

STeVe