Project organization and import

Mon Mar 5 12:15:13 EST 2007

On 5 Mar 2007 08:32:34 -0800, Martin Unsal <martinunsal at gmail.com> wrote:
> Jorge, thanks for your response. I replied earlier but I think my
> response got lost. I'm trying again.
>
> On Mar 4, 5:20 pm, Jorge Godoy <jgo... at gmail.com> wrote:
> > Why?  RCS systems can merge changes.  A RCS system is not a substitute for
> > design or programmers communication.
>
> Text merges are an error-prone process. They can't be eliminated but
> they are best avoided when possible.
>
> When refactoring, it's much better to move small files around than to
> move chunks of code between large files. In the former case your SCM
> system can track integration history, which is a big win.
>
> > Unit tests help being sure that one change doesn't break the project as a
> > whole and for a big project you're surely going to have a lot of those tests.
>
> But unit tests are never an excuse for error prone workflow. "Oh,
> don't worry, we'll catch that with unit tests" is never something you
> want to say or hear.
>

That's exactly the benefit of unit testing, but I don't feel that
you've made a case that this workflow is error prone. Do you often
have multiple developers working on the same parts of the same
module?

> > I don't reload...  When my investigative tests gets bigger I write a script
> > and run it with the interpreter.  It is easy since my text editor can call
> > Python on a buffer (I use Emacs).
>
> That's interesting, is this workflow pretty universal in the Python
> world?
>
> I guess that seems unfortunate to me, one of the big wins for
> interpreted languages is to make the development cycle as short and
> interactive as possible. As I see it, the Python way should be to
> reload a file and reinvoke the class directly, not to restart the
> interpreter, load an entire package and then run a test script to set
> up your test conditions again.

If you don't restart the interpreter and rebuild your test conditions,
you aren't really testing your changes, you're testing your reload()
machinery. You seem to have a lot of views about
what the "Python way" should be and those are at odds with the actual
way people work with Python. I'm not (necessarily) saying you're
wrong, but you seem to be coming at this from a confrontational
standpoint.
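
To make the reload() point concrete, here is a rough sketch of how a
reloaded module can leave stale objects behind. It assumes a modern
Python 3, where reload lives in importlib rather than being a builtin,
and foo_demo is a throwaway module name invented for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Skip .pyc files so this demo always re-reads the edited source.
sys.dont_write_bytecode = True

tmpdir = tempfile.mkdtemp()
sys.path.insert(0, tmpdir)
path = os.path.join(tmpdir, "foo_demo.py")  # hypothetical module


def write_module(version):
    """Write a tiny module whose Foo.version() returns `version`."""
    with open(path, "w") as f:
        f.write(
            "class Foo:\n"
            "    def version(self):\n"
            f"        return {version!r}\n"
        )


write_module("old")
importlib.invalidate_caches()
import foo_demo
from foo_demo import Foo      # a name copied into our own namespace

obj = foo_demo.Foo()          # an instance created before the edit

write_module("new, edited")   # simulate editing the file
importlib.reload(foo_demo)

fresh = foo_demo.Foo().version()   # looked up through the module
stale_instance = obj.version()     # still bound to the old class
stale_name = Foo().version()       # the from-import still names the old class
```

Only the lookup that goes through the module object sees the edited
code. The existing instance and the name bound by "from foo_demo import
Foo" keep pointing at the old class, which is the kind of state a
restart avoids.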

Your claim, for example, that the language shouldn't place constraints
on how you manage your modules is questionable. I think it's more
likely that you've developed a workflow based around the constraints
(and abilities) of other languages and you're now expecting Python to
conform to that instead of its own.

I've copied some of your responses from your earlier post below:

>Yes. I've worked extensively on several projects in several languages
>with multi-million lines of code and they invariably have coding
>styles that recommend one functional unit (such as a class), or at
>most a few closely related functional units per file.

I wonder if you've ever asked yourself why this is the case. I know
from my own experience why it's done in traditional C++/C environments
- it's because compiling is slow and breaking things into as many
files (with as few interdependencies) as possible speeds up the
compilation process. Absent this need (which doesn't exist in Python),
what benefit is there to separating out related functionality into
multiple files? Don't split them up just because you've done so in the
past - know why you did it in the past and if those conditions still
apply. Don't split them up until it makes sense for *this* project,
not the one you did last year or 10 years ago.

>I guess my question boils down to this. Is "from foo import *" really
>deprecated or not? If everyone has to use "from foo import *" despite
>the problems it causes, how do they work around those problems (such
>as reloading)?

"from foo import *" is a bad idea at the top level because it pollutes
your local namespace. In a package __init__, which exists expressly
for the purpose of exposing its interior namespaces as a single flat
one, it makes perfect sense. In some cases you don't want to export
everything, which is when __all__ starts to make sense. Clients of a
package (or a module) shouldn't use "from foo import *" without a good
reason.

Nobody I know uses reload() for anything more than trivial "as you
work" testing in the interpreter. It's not reliable or recommended for
anything other than that. It's not hard to restart a shell, especially
if you use ipython (which can save and re-create a session) or a
script that's set up to create your testing environment. This is still
much faster than compiling any but the most trivial of C/C++ modules.
In fact, on my system the startup time for the interpreter is roughly
the same as the "startup time" of my compiler (that is to say, the
amount of time it spends deciding what it's going to compile, without
actually compiling anything).
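
As a sketch of the package __init__ pattern described above: the script
below builds a small package on disk and imports it. The names
(shapes_demo, Circle, Square, Scratch) are made up for the example, and
it uses the explicit relative-import syntax of current Python.

```python
import os
import sys
import tempfile
import textwrap

root = tempfile.mkdtemp()
pkg = os.path.join(root, "shapes_demo")  # hypothetical package
os.mkdir(pkg)

files = {
    "circle.py": """
        __all__ = ["Circle"]      # star-import exports only Circle

        def _helper():            # internal detail, not exported
            return 3

        class Circle:
            def __init__(self, r):
                self.r = r

            def area(self):
                return _helper() * self.r ** 2
    """,
    "square.py": """
        __all__ = ["Square"]

        class Square:
            def __init__(self, side):
                self.side = side

            def area(self):
                return self.side ** 2

        class Scratch:            # deliberately left out of __all__
            pass
    """,
    # The package __init__ flattens the submodules into one namespace.
    "__init__.py": """
        from .circle import *
        from .square import *
    """,
}
for name, body in files.items():
    with open(os.path.join(pkg, name), "w") as f:
        f.write(textwrap.dedent(body))

sys.path.insert(0, root)
import shapes_demo

square_area = shapes_demo.Square(3).area()
has_circle = hasattr(shapes_demo, "Circle")
has_scratch = hasattr(shapes_demo, "Scratch")
```

Clients then write shapes_demo.Square without caring which file it
lives in, and Scratch stays reachable only as
shapes_demo.square.Scratch because it is not listed in __all__.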

>You're still stuck doing foo.Foo() everywhere in your client code,
>which is ugly and wastes space, or using "from foo import *" which is
>broken.

If you don't like working with explicit namespaces, you've probably
chosen the wrong language. If you have a specific name (or a few
names) which you use all the time from a module, then you can import
just those names into your local namespace to save on typing. You can
also alias deeply nested names to something more shallow.
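
For instance, using standard library modules purely as stand-ins for
your own packages, the two options look something like this:

```python
# Import just the names you use all the time.
from os.path import splitext

# Alias a deeply nested module to something shorter.
import xml.etree.ElementTree as ET

# The sample XML string is invented for this illustration.
root = ET.fromstring("<project><module name='foo'/></project>")
names = [m.get("name") for m in root.findall("module")]

stem, ext = splitext("foo.py")
```

Either way the origin of each name is still visible at the top of the
file, which is what "from foo import *" throws away.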

>For myriad reasons, just one of them being the one I stated -- smaller
>files with one functional unit each are more amenable to source code
>management with multiple developers.

I propose that the technique most amenable to source code management
is for a single file (or RCS level module, if you have a locking RCS)
to have everything that it makes sense to edit or change for a
specific feature. This is an impossible goal in practice (because you
will inevitably and necessarily have intermodule dependencies) but
your developers don't write code based around individual files. They
base it around the systems and the interfaces that compose your
project. It makes no more sense to arbitrarily break them into
multiple files than it does to arbitrarily leave them all in a single
file.

In summary: I think you've bound yourself to a style of source
management that made sense in the past without reanalyzing it to see
if it makes sense now. Trust your judgment and that of your developers
when it comes to modularization. When they end up needing to merge all
the time because they're conflicting with someone else's work, they'll
break things up into modules.

You're also placing far too much emphasis on reload(). Focus on
unit tests and environment scripts instead. These are more reliable
and easier to validate than reload() in a shell.
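
A minimal sketch of what I mean, with a made-up Account class standing
in for your real code: setUp() rebuilds the test conditions from
scratch for every test, so there is no interpreter state to go stale.

```python
import unittest


class Account:
    """Hypothetical class under test, invented for this example."""

    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


class AccountTest(unittest.TestCase):
    def setUp(self):
        # Fresh test conditions for each test method.
        self.account = Account(balance=10)

    def test_deposit_adds_to_balance(self):
        self.account.deposit(5)
        self.assertEqual(self.account.balance, 15)

    def test_deposit_rejects_non_positive_amounts(self):
        with self.assertRaises(ValueError):
            self.account.deposit(0)


def run_tests():
    """Run the suite and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AccountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Running the file in a fresh interpreter after each edit takes about as
long as typing reload() and gives you a repeatable answer.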