source code size metric: Python and modern C++

John Hunter jdhunter at ace.bsd.uchicago.edu
Tue Dec 3 01:19:13 EST 2002


>>>>> "Pavel" == Pavel Vozenilek <pavel_vozenilek at yahoo.co.uk> writes:

    >> I'll try to summarize the code size differences I see (I know
    >> C++ and am learning Python so take it as naive attempt):

I coded almost exclusively in C++ before I discovered python, and
boost is indeed a life saver.  For me, however, the difference in
coding efficiency is so extreme that it is hard to know where to
start.  I think one fruitful approach is to take best-of-practice C++
examples and rewrite them in python for comparison.  And if you post
your experiments here, I bet you'll get some helpful pointers.

The boost example code is a good place to start.  For example, here is
the URL extractor from the boost::regex example code recoded in
python; the original boost code follows at the end of this message:

import re, sys

rgx = re.compile(r'<\s*A\s+[^>]*href\s*=\s*"([^"]*)"',
                 re.IGNORECASE|re.MULTILINE)

for file in sys.argv[1:]:
    s = open(file).read()
    print(rgx.findall(s))
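To make it concrete what findall hands back, here is a minimal sketch
that wraps the same pattern in a function and runs it on an inline
string instead of files.  The function name extract_hrefs and the
sample HTML are illustrative assumptions.  Because the pattern has
exactly one capturing group, findall returns just the captured URL for
each match rather than the whole tag.  re.MULTILINE is left out here
because it only changes how ^ and $ behave, and the pattern uses
neither.

```python
import re

# Same pattern as the script above, as a raw string so "\s" reaches the
# regex engine untouched.
HREF_RE = re.compile(r'<\s*A\s+[^>]*href\s*=\s*"([^"]*)"', re.IGNORECASE)


def extract_hrefs(text):
    """Return the double-quoted href values of <a ...> tags in text.

    With a single capturing group, findall() returns the group's text
    for each match, i.e. just the URL.
    """
    return HREF_RE.findall(text)


if __name__ == "__main__":
    # Hypothetical sample input; note the mixed case and extra attribute.
    sample = ('<p>See <a href="http://www.boost.org">Boost</a> and '
              '<A class="x" HREF = "http://www.python.org">Python</A>.</p>')
    print(extract_hrefs(sample))
```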

Even if you strip the comments and the redundant second method from
the boost example below, you still get substantial savings in python
(even the regex is simpler, since we don't have to escape the
backslashes and the double quotes).  Of course, boost may give you
substantial performance gains, as Andrew Koenig pointed out in
response to your original post, so choose your poison.
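The quoting difference can be checked directly.  In this sketch the
first literal is the pattern from the Python script above written as a
raw string; the second copies the C++ string literal from the boost
example character for character, which Python happens to accept
because both languages use the same \\ and \" escapes.  The variable
names are illustrative.

```python
# Pattern as written in Python: a raw string keeps backslashes as-is,
# and single quotes let the double quotes appear without escaping.
python_pattern = r'<\s*A\s+[^>]*href\s*=\s*"([^"]*)"'

# The same pattern as it appears in the C++ source: every backslash
# doubled and every double quote escaped.
cpp_literal = "<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\""

# Both literals denote the identical regular expression text.
print(python_pattern == cpp_literal)
```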

To me, "coding efficiency" is best quantified not by counting lines,
but by examining the complexity of the code.  I could have shortened
the python code above if line counting were my objective.  If you
strip the fluff from the C++ code, and forgive the load_file function
since it can easily be rolled into a library, you're still looking at
approximately 30 C++ lines versus 6 or so python lines.  But I think
this fivefold savings is modest compared to the savings in code
complexity.
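For anyone who does want a rough line count, here is a minimal sketch
of the kind of "strip the fluff" counting described above.  The helper
count_code_lines is hypothetical, not a standard tool, and deliberately
crude: it skips blank lines, lines that start with a line-comment
marker, and lines inside block comments that open at the start of a
line.  It does not understand string literals or Python docstrings, so
treat its numbers as approximate.

```python
def count_code_lines(source, line_comment, block_comment=None):
    """Count lines that are neither blank nor comment-only (rough sketch).

    line_comment:  prefix such as "#" (Python) or "//" (C++).
    block_comment: optional (open, close) pair such as ("/*", "*/").
    Limitations: markers inside string literals are not recognised, a
    block comment is only detected when a line starts with it, and the
    line that closes a block comment is skipped entirely.
    """
    count = 0
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_block:
            # Still inside a block comment; leave it once the closer appears.
            if block_comment[1] in stripped:
                in_block = False
            continue
        if not stripped or stripped.startswith(line_comment):
            continue
        if block_comment and stripped.startswith(block_comment[0]):
            # Enter block mode unless the comment also closes on this line.
            rest = stripped[len(block_comment[0]):]
            in_block = block_comment[1] not in rest
            continue
        count += 1
    return count


# Small hypothetical inputs to exercise both comment styles.
cpp_src = """/*
 * header comment
 */
#include <iostream>

// print a greeting
int main()
{
   std::cout << "hi" << std::endl;
   return 0;
}
"""

py_src = """import sys

# print each argument
for arg in sys.argv[1:]:
    print(arg)
"""

print(count_code_lines(cpp_src, "//", ("/*", "*/")))
print(count_code_lines(py_src, "#"))
```

As the paragraph above argues, a count like this says little about
complexity; it only makes the "strip the fluff" comparison repeatable.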

It reminds me of one of my favorite quotes (paraphrased): "Be sure to
comment your code.  Someday someone may have to read it, and that
someone may be you!"  I can't remember who said it or where it
originally appeared, so if you know, please enlighten me.

Like many who came before me, I've become convinced that the way to go
is to code as much as possible in python first, then profile and
recode the bottlenecks in C++ or whichever compiled language suits
you.  With boost::python, it's pretty easy to use the two languages
together seamlessly.
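As a sketch of the "profile first" step, the standard library's
cProfile and pstats modules report where the time goes before anything
is rewritten in C++.  The workload below, scanning a synthetic HTML
string built just for this example, is an assumption chosen only to
have something to measure; on a real program you could instead run
"python -m cProfile -s cumulative yourscript.py".

```python
import cProfile
import io
import pstats
import re

HREF_RE = re.compile(r'<\s*A\s+[^>]*href\s*=\s*"([^"]*)"', re.IGNORECASE)


def build_sample(n):
    """Build a synthetic HTML document with n links (illustrative workload)."""
    return "".join('<p><a href="http://example.com/%d">link</a></p>\n' % i
                   for i in range(n))


def extract_all(text):
    return HREF_RE.findall(text)


# Profile only the code of interest.
profiler = cProfile.Profile()
profiler.enable()
urls = extract_all(build_sample(10000))
profiler.disable()

# Summarise the five most expensive entries by cumulative time.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
print(len(urls), "links found")
```

Only the functions that dominate such a report are worth moving into a
compiled extension.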

John Hunter

boost::regex example code

/*
 *
 * Copyright (c) 1998-2002
 * Dr John Maddock
 *
 * Permission to use, copy, modify, distribute and sell this software
 * and its documentation for any purpose is hereby granted without fee,
 * provided that the above copyright notice appear in all copies and
 * that both that copyright notice and this permission notice appear
 * in supporting documentation.  Dr John Maddock makes no representations
 * about the suitability of this software for any purpose.
 * It is provided "as is" without express or implied warranty.
 *
 */

 /*
  *   LOCATION:    see http://www.boost.org for most recent version.
  *   FILE         regex_split_example_2.cpp
  *   VERSION      see <boost/version.hpp>
  *   DESCRIPTION: regex_split example: spit out linked URLs.
  */


#include <list>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <boost/regex.hpp>

boost::regex e("<\\s*A\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"",
               boost::regbase::normal | boost::regbase::icase);

void load_file(std::string& s, std::istream& is)
{
   s.erase();
   //
   // attempt to grow string buffer to match file size,
   // this doesn't always work...
   s.reserve(is.rdbuf()->in_avail());
   char c;
   while(is.get(c))
   {
      // use logarithmic growth strategy, in case
      // in_avail (above) returned zero:
      if(s.capacity() == s.size())
         s.reserve(s.capacity() * 3);
      s.append(1, c);
   }
}

int main(int argc, char** argv)
{
   std::string s;
   std::list<std::string> l;
   int i;
   for(i = 1; i < argc; ++i)
   {
      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
      s.erase();
      std::ifstream is(argv[i]);
      load_file(s, is);
      boost::regex_split(std::back_inserter(l), s, e);
      while(l.size())
      {
         s = *(l.begin());
         l.pop_front();
         std::cout << s << std::endl;
      }
   }
   //
   // alternative method:
   // split one match at a time and output direct to
   // cout via ostream_iterator<std::string>....
   //
   for(i = 1; i < argc; ++i)
   {
      std::cout << "Finding URLs in " << argv[i] << ":" << std::endl;
      s.erase();
      std::ifstream is(argv[i]);
      load_file(s, is);
      while(boost::regex_split(std::ostream_iterator<std::string>(std::cout),
                               s, e, boost::match_default, 1))
         std::cout << std::endl;
   }

   return 0;
}
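The boost example's second loop writes each match out as soon as it is
found instead of collecting a list first.  A rough Python counterpart
of that alternative, assuming the same pattern, is re.finditer, which
yields one match object at a time; match.group(1) is the captured URL.
The generator name iter_hrefs and the main helper are illustrative
choices, not part of either original program.

```python
import re
import sys

HREF_RE = re.compile(r'<\s*A\s+[^>]*href\s*=\s*"([^"]*)"', re.IGNORECASE)


def iter_hrefs(text):
    """Yield captured href values one at a time instead of building a list."""
    for match in HREF_RE.finditer(text):
        yield match.group(1)


def main(paths):
    for path in paths:
        print("Finding URLs in %s:" % path)
        with open(path) as f:
            text = f.read()
        # Print each URL as soon as it is matched, like the
        # ostream_iterator loop in the C++ version.
        for url in iter_hrefs(text):
            print(url)


if __name__ == "__main__":
    main(sys.argv[1:])
```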