Python for large projects?

Darrell news at dorb.com
Sat Jun 26 16:52:44 EDT 1999


Python is great fun, with positive qualities that go on and on.

Large projects come with special problems of their own.
C++ has whole books on the subject:
    Large-Scale C++ Software Design by John Lakos
    Effective C++ by Scott Meyers

IMHO, I can recommend Python for large projects, but watch out for the
following.

1. Memory management. Search this newsgroup for the keywords GC, cycles and
reference counting.
One problem we had was reading a very large file and passing the buffer
down to be ripped apart. We got stuck holding the original buffer for that
file until the program exited. At least I think we did; it's hard to know
sometimes. This was a problem because our memory needs grew very fast as we
ripped the buffer apart. We should have kept track of this from the start;
it was a pain to fix later.

This isn't a reference cycle, but it is still a problem.
Don't:
    buf=open(file).read()
    doTheBigJob(buf)
        # buf won't be freed until this function exits, which might be
        # the entire run of the program.
        # The idea is: don't leave a reference at the top level.
        # Even worse is if at a lower level you want to change buf. Try
        # this and monitor memory usage:
>>> s="        "*999999
>>> b=s
>>> b=b+" "

Do something like this:
    parm={}
    parm['buf']=open(file).read()
    doTheBigJob(parm)
        # Now doTheBigJob can "del parm['buf']" to free the memory.
        # Or pass the buffer within an object of some kind.
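
Here is a minimal sketch of the parm idea in current Python syntax. The name do_the_big_job and the sample text are made up for illustration, and a short string stands in for the very large file.

```python
def do_the_big_job(parm):
    """Count the words in parm['buf'], letting go of the buffer early."""
    # pop() removes the reference held by the caller's dictionary, so the
    # local name below is the only one this code path still owns.
    buf = parm.pop('buf')
    pieces = buf.split()
    # Drop our reference before any long-running work on the pieces, so
    # nothing here keeps the original buffer alive.
    del buf
    return len(pieces)


parm = {'buf': "spam eggs ham"}   # stands in for open(file).read()
word_count = do_the_big_job(parm)
```

The point of the container is that the callee, not the caller, decides when the big buffer is no longer needed.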

Look out for:
    dict1={}
    dict1['dict']=dict1
    # This and variations on this theme become immortal, unless you kill
    # them off carefully.

A useful tool for finding memory leaks, though it won't find the above
example unless dict1 is held in an object:
http://www.stud.ifi.uio.no/~larsga/download/python/plumbo.py
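
A sketch of what "killing it off carefully" can look like for the self-referencing dictionary above. Holder, make_cycle and break_cycle are hypothetical names; Holder exists only because a plain dict cannot be weakly referenced, and the weak reference is just a way to observe when the object goes away. The last part assumes an interpreter that has the gc module's cycle collector (CPython 2.0 and later); without it, breaking the cycle by hand is the only option.

```python
import gc
import weakref


class Holder(dict):
    """A dict subclass; unlike plain dicts, instances can be weakly referenced."""


def make_cycle():
    d = Holder()
    d['dict'] = d          # the dictionary now refers to itself
    return d


def break_cycle(d):
    d.pop('dict', None)    # remove the self-reference by hand


gc.disable()               # keep the collector out of the way for the demo

# Broken by hand: reference counting alone frees it on CPython.
d = make_cycle()
watcher_manual = weakref.ref(d)
break_cycle(d)
del d

# Left intact: reference counting cannot free it, the cycle collector can.
d = make_cycle()
watcher_cycle = weakref.ref(d)
del d
alive_before_collect = watcher_cycle() is not None
gc.collect()
alive_after_collect = watcher_cycle() is not None

gc.enable()
```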

2. To use packages or not.
    We had multiple directories for the different components of our system,
so we had to set up sys.path and/or use packages. I like packages, but get
everyone to accept them up front. We fought about it because no one wanted
to deal with it. Using just the Python path is simple, but sooner or later
you end up with a module that has the same name as another one on your
system. Then a user calls up describing a screenful of traceback that makes
no sense, because you've just grabbed someone else's config.py file.
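
The config.py collision can be reproduced with a small sketch. It writes two throwaway config.py files into temporary directories and shows that whichever directory comes first on sys.path wins. The helper names (write_module, import_first_match) and the OWNER variable are made up for illustration.

```python
import importlib
import os
import sys
import tempfile


def write_module(directory, name, source):
    """Create directory/name.py containing the given source text."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write(source)
    return path


def import_first_match(name, search_dirs):
    """Import `name` with search_dirs placed at the front of sys.path."""
    saved_path = list(sys.path)
    previous = sys.modules.pop(name, None)   # forget any cached copy
    sys.path[:0] = search_dirs
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    finally:
        # Put the interpreter state back the way we found it.
        sys.path[:] = saved_path
        sys.modules.pop(name, None)
        if previous is not None:
            sys.modules[name] = previous


def demo():
    with tempfile.TemporaryDirectory() as ours, \
         tempfile.TemporaryDirectory() as theirs:
        write_module(ours, "config", "OWNER = 'ours'\n")
        write_module(theirs, "config", "OWNER = 'theirs'\n")
        # Their directory is listed first, so their config.py is the one
        # that gets imported.
        mod = import_first_match("config", [theirs, ours])
        return mod.OWNER
```

When a traceback makes no sense, printing the module's __file__ attribute is a quick way to see which copy was actually picked up. Putting each component in its own package makes the imported name unambiguous.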

3. Decide upon an exception handling strategy up front.
Don't mindlessly catch all errors, unless it's your last line of defense:
    try:
        xxx()
    except:
        yyyy
This hides errors of all kinds under a single description or handler.
Check out "The Python Way" by Tim Peters:
http://lwn.net/1999/0610/devel.phtml

Do something like this:
    import sys
    import traceback
    import projectExceptions

    try:
        xxx()
    except projectExceptions.category1 as msg:
        # Our users don't want to see huge tracebacks.
        if debug:
            traceback.print_exc()
            log(sys.exc_info()[0], msg)
        else:
            log(msg)
    except:
        # Unknown error
        if debug:
            log(sys.exc_info()[0])
        else:
            log(projectExceptions.defaultMsg())
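
A self-contained sketch of the same strategy, for anyone who wants to run it. ProjectError, Category1Error, DEFAULT_MSG and run_guarded are hypothetical stand-ins for the projectExceptions module, and log is whatever callable the project uses for logging.

```python
import traceback


class ProjectError(Exception):
    """Hypothetical base class for errors the project raises on purpose."""


class Category1Error(ProjectError):
    """Hypothetical stand-in for projectExceptions.category1."""


DEFAULT_MSG = "Internal error, please contact support."


def run_guarded(job, log, debug=False):
    """Run job() as the last line of defense; report failures through log."""
    try:
        return job()
    except Category1Error as err:
        # A known project error: users get the message, not the traceback.
        if debug:
            log(traceback.format_exc())
        log(str(err))
    except Exception:
        # Unknown error: hide the details unless we are debugging.
        if debug:
            log(traceback.format_exc())
        else:
            log(DEFAULT_MSG)
    return None


messages = []


def failing_job():
    raise Category1Error("could not read the input file")


run_guarded(failing_job, messages.append)
```

Catching Exception rather than using a bare except is a choice made for this sketch: it lets KeyboardInterrupt and SystemExit through, which is usually what you want even at the top level.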

4. How should you handle internationalization?
    I don't know.

5. Python doesn't care about types.
    This is both good and bad. Good: you pass me an object, and all I care
about is that it has a write method. So much nicer than C++.

    Bad: you pass me an object without a write() method, and it's not
detected until runtime.
    I'm not sure, but I think freeze can detect this. But don't write a
boatload of code and then try a tool like freeze on it. You'll probably
have done something it can't handle.
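
One way to soften the "bad" half is to check for write() at the boundary, so the failure shows up where the object is handed over rather than deep inside the job. This is only a sketch; emit_report is a made-up function.

```python
import io


def emit_report(out, lines):
    """Write each line to out, which only needs a callable write() method."""
    if not callable(getattr(out, "write", None)):
        raise TypeError(
            "emit_report needs an object with a write() method, got %r" % (out,)
        )
    for line in lines:
        out.write(line + "\n")


buffer = io.StringIO()          # any object with write() will do
emit_report(buffer, ["first", "second"])
```

The check is still made at runtime, but it fires on every call, including the ones where lines happens to be empty and a missing write() would otherwise go unnoticed.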

6. No const or private.
    You have to trust everyone.

Had enough?
Do you think I have some opinions on this?
--
--Darrell





