NLTK and package structure

Thu Oct 27 21:21:10 EDT 2011

The Natural Language Toolkit (NLTK) is a suite of open source Python
packages for natural language processing, available at
http://nltk.org/, together with an O'Reilly book which is available
online for free.  Development is now hosted at http://github.com/nltk
and the repository can be cloned from git@github.com:nltk/nltk.git

I am seeking advice on how to speed up our import process.  The
contents of several sub-packages are made available at the top level,
for the convenience of programmers and so that the examples published
in the book are more concise.  This is done with many
"from subpackage import *" statements in the top-level __init__.py,
some of which are lazy.  Unfortunately, any import of nltk triggers
cascading imports that pull in most of the library, making the load
time unacceptably slow.

https://github.com/nltk/nltk/blob/master/nltk/__init__.py
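
One way to see how far such a cascade reaches is to compare sys.modules
before and after an import.  The following is a minimal diagnostic
sketch, not part of NLTK: the helper name import_cost is hypothetical,
and the standard-library module "json" is only a stand-in for "nltk".

```python
import importlib
import sys
import time


def import_cost(module_name):
    """Import module_name; return (seconds taken, newly loaded module names)."""
    before = set(sys.modules)
    start = time.perf_counter()
    importlib.import_module(module_name)
    elapsed = time.perf_counter() - start
    # Anything now in sys.modules that was absent before was pulled in
    # by this import, directly or through the cascade.
    pulled_in = sorted(set(sys.modules) - before)
    return elapsed, pulled_in


if __name__ == "__main__":
    # "json" is a stand-in; substitute "nltk" to measure the real cascade.
    seconds, modules = import_cost("json")
    print(f"{seconds:.4f}s, {len(modules)} new modules")
    for name in modules:
        print("  ", name)
```

A module that is already loaded reports no new modules, so the
measurement is only meaningful in a fresh interpreter.  On Python 3.7
and later, running "python -X importtime -c 'import nltk'" prints a
per-module timing breakdown without any helper code.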

I am looking for a solution that meets the following requirements:
1) import nltk is as fast as possible
2) published code examples are not broken
    (or are easily fixed by calling nltk.load_subpackages() before the
    rest of the code)
3) popular names from subpackages are available at the top level
    (e.g. nltk.probability.ConditionalFreqDist as nltk.ConditionalFreqDist)
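
The three requirements above could be approached by resolving
top-level names on first use instead of at import time.  What follows
is a sketch under stated assumptions, not NLTK's implementation: the
class LazyNamespace and the namespace name "toolkit" are hypothetical,
standard-library modules stand in for NLTK subpackages, and
load_subpackages() mirrors the function name proposed in requirement 2.

```python
import importlib
import types


class LazyNamespace(types.ModuleType):
    """Module-like object that imports a defining module on first access."""

    def __init__(self, name, lazy_names):
        super().__init__(name)
        # Maps each exported name to the module that defines it.
        self._lazy_names = dict(lazy_names)

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, i.e. the name is not cached.
        if attr.startswith("_") or attr not in self._lazy_names:
            raise AttributeError(
                f"module {self.__name__!r} has no attribute {attr!r}"
            )
        module = importlib.import_module(self._lazy_names[attr])
        value = getattr(module, attr)
        setattr(self, attr, value)  # cache so later lookups skip __getattr__
        return value

    def load_subpackages(self):
        """Eagerly resolve every lazy name, for code that expects them all."""
        for attr in self._lazy_names:
            getattr(self, attr)


# Stand-ins: in NLTK the mapping might look like
# {"ConditionalFreqDist": "nltk.probability"} (an assumption).
toolkit = LazyNamespace(
    "toolkit",
    {"Fraction": "fractions", "OrderedDict": "collections"},
)

half = toolkit.Fraction(1, 2)  # the defining module is imported here
toolkit.load_subpackages()     # resolves the remaining names
```

Creating the namespace imports nothing, so the initial import stays
cheap; each name costs one import the first time it is touched.  The
trade-off is that the name-to-module table has to be maintained by
hand, which "from subpackage import *" avoids.  On Python 3.7 and
later, a module-level __getattr__ function (PEP 562) gives the same
effect without a ModuleType subclass.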

The existing discussion of this issue amongst our developers is posted here:
http://code.google.com/p/nltk/issues/detail?id=378

Our practice in structuring subpackages is described here:
http://code.google.com/p/nltk/wiki/PackageStructure

Thanks for any advice.

-Steven Bird (NLTK Project coordinator)