Project organization and import

Mon Mar 5 14:06:08 EST 2007

On 5 Mar 2007 10:31:33 -0800, Martin Unsal <martinunsal at gmail.com> wrote:
> On Mar 5, 9:15 am, "Chris Mellon" <arka... at gmail.com> wrote:
> > That's actually the exact benefit of unit testing, but I don't feel
> > that you've actually made a case that this workflow is error prone.
> > You often have multiple developers working on the same parts of the
> > same module?
>
> Protecting your head is the exact benefit of bike helmets, that
> doesn't mean you should bike more recklessly just because you're
> wearing a helmet. :)
>
> Doing text merges is more error prone than not doing them. :)
>
> There are myriad other benefits of breaking up large files into
> functional units. Integration history, refactoring, reuse, as I
> mentioned. Better clarity of design. Easier communication and
> coordination within a team. What's the down side? What's the advantage
> of big files with many functional units?
>

I never advocated big files with many functional units - just files
that are "just big enough". You'll know you've broken them down small
enough when you stop having to do text merges every time you commit.

> > If you don't do this, you aren't really testing your changes, you're
> > testing your reload() machinery.
>
> Only because reload() is hard in Python! ;)
>
> > You seem to have a lot of views about
> > what the "Python way" should be and those are at odds with the actual
> > way people work with Python. I'm not (necessarily) saying you're
> > wrong, but you seem to be coming at this from a confrontational
> > standpoint.
>
> When I refer to "Pythonic" all I'm talking about is what I've read
> here and observed in other people's code. I'm here looking for more
> information about how other people work, to see if there are good
> solutions to the problems I see.
>
> However when I talk about what I think is "wrong" with the Pythonic
> way, obviously that's just my opinion formed by my own experience.
>
> > Your claim, for example, that the language shouldn't place constraints
> > on how you manage your modules is questionable. I think it's more
> > likely that you've developed a workflow based around the constraints
> > (and abilities) of other languages and you're now expecting Python to
> > conform to that instead of its own.
>
> I don't think so; I'm observing things that are common to several
> projects in several languages.
>

... languages with similar runtime semantics and perhaps a common
ancestry? All languages place limitations on how you handle modules,
either because they have infrastructure you need to use or because
they lack it and you're left on your own.

> > I wonder if you've ever asked yourself why this is the case. I know
> > from my own experience why it's done in traditional C++/C environments
> > - it's because compiling is slow and breaking things into as many
> > files (with as few interdependencies) as possible speeds up the
> > compilation process.
>
> I don't think that's actually true. Fewer, bigger compilation units
> actually compile faster in C, at least in my experience.
>

That's true if you're doing whole-project compilation. When you're
working, though, you want incremental compilation (all modern
compilers I know of support this), so you recompile only the files
you've changed (and their dependents) and relink. Support for this is
why we have things like precompiled headers and the shadow headers Qt
uses, and why C++ project management advocates a single-class-per-file
structure. Fewer dependencies between compilation units means a
faster rebuild-test turnaround.

> > Absent this need (which doesn't exist in Python),
>
> Python still takes time to load & "precompile". That time is becoming
> significant for me even in a modest sized project; I imagine it would
> be pretty awful in a multimillion line project.
>
> No matter how fast it is, I'd rather reload one module than exit my
> interpreter and reload the entire world.
>

Sure, but what's your goal here? If you're just testing something as
you work, then this works fine. If you're testing large changes that
affect many modules, then you *need* to reload your world, because you
want to make sure that what you're testing is clean. I think this
might be related to your desire to have everything in lots of little
files. The more modules you load, the harder it is to track your
dependencies and make sure that the reload is correct.
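
One cheap way to "reload your world" without quitting your working
session is to run the check in a brand-new interpreter. Here's a
minimal sketch, assuming Python 3.7+ for subprocess.run's
capture_output/text arguments; run_fresh and the stale_demo module are
hypothetical names used only for illustration.

```python
import subprocess
import sys
import types


def run_fresh(*args: str) -> subprocess.CompletedProcess:
    """Run the current Python executable with `args` in a new process.

    The child imports every module from scratch, so nothing stale from
    this interactive session can leak into what you're testing.
    """
    return subprocess.run(
        [sys.executable, *args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )


# Simulate stale state that exists only in this session
# ("stale_demo" is a made-up module name for the demo).
sys.modules["stale_demo"] = types.ModuleType("stale_demo")

child = run_fresh("-c", "import sys; print('stale_demo' in sys.modules)")
print(child.stdout.strip())  # False: the child starts with a clean sys.modules
```

In practice you'd point it at your test suite instead, e.g.
run_fresh("-m", "unittest", "discover"), and keep the interactive
session for poking at things.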

> This is not a problem for Python as a scripting language. This is a real
> problem for Python as a world-class application development language.
>

Considering that no other "world class application development
language" supports reload even as well as Python does, I'm not sure I
can agree here. A perfect reload might be a nice thing to have, but
lack of it hardly tosses Python (or any language) out of the running.

> > In a package __init__, which exists expressly
> > for the purpose of exposing its interior namespaces as a single flat
> > one, it makes perfect sense.
>
> OK! That's good info, thanks.
>
> > Nobody I know uses reload() for anything more than trivial "as
> > you work" testing in the interpreter. It's not reliable or recommended
> > for anything other than that.
>
> That too... although I think that's unfortunate. If reload() were
> reliable, would you use it? Do you think it's inherently unreliable,
> that is, it couldn't be fixed without fundamentally breaking the
> Python language core?
>

The semantics of exactly what reload should do are tricky. Python's
reload() works in a sensible but limited way. More complicated reloads
are generally considered more trouble than they are worth. I've wanted
different things from reload() at different times, so I'm not even
sure what I would consider a "reliable" reload to be.

Here's a trivial example - if you rename a class in a module and then
reload it, what should happen to instances of the class you renamed?
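
For what it's worth, here's a minimal sketch of what Python actually
does in that case. It assumes Python 3, where the builtin reload() is
importlib.reload(); the module name shapes_demo and the temp-directory
setup are hypothetical scaffolding for the demo. Short version: nothing
happens to existing instances, and the old name isn't removed either.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # no .pyc caching, so reload always re-reads the source

# Hypothetical throwaway module written to a temp dir on sys.path.
tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))
src = tmp / "shapes_demo.py"

src.write_text("class Circle:\n    pass\n")
shapes = importlib.import_module("shapes_demo")
OldCircle = shapes.Circle
old_obj = shapes.Circle()

# "Rename" the class in the source, then reload the module.
src.write_text("class Ellipse:\n    pass\n")
importlib.invalidate_caches()
shapes = importlib.reload(shapes)

print(type(old_obj) is OldCircle)           # True: the instance keeps its original class
print(isinstance(old_obj, shapes.Ellipse))  # False: the new class is unrelated to it
print(shapes.Circle is OldCircle)           # True: reload re-runs the code in the old
                                            # module namespace, so the stale name lingers
```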

> > This is
> > still a much faster way than compiling any but the most trivial of
> > C/C++ modules.
>
> I'm with you there! I love Python and I'd never go back to C/C++. That
> doesn't change my opinion that Python's import mechanism is an
> impediment to developing large projects in the language.
>
> > If you don't like working with explicit namespaces, you've probably
> > chosen the wrong language.
>
> I never said that. I like foo.Bar(), I just don't like typing
> foo.Foo() and bar.Bar(), which is a waste of space; syntax without
> semantics.
>

There's nothing that prevents there being a bar.Foo; the namespace
makes it clear where you're getting the object. This is again a
consequence of treating modules like classes. Some modules only expose
a single class (StringIO/cStringIO in the standard library is a good
example), but it's more common for them to expose a single set of
"functionality".

That said, nothing prevents you from using "from foo import Foo" if
Foo is all you need (or need most - you can combine this with import
foo).
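
A small illustration using the standard library (Python 3 shown):
math and cmath both define sqrt, and the module prefix is what tells
you which one you're getting. A from-import works fine for the name
you use most, alongside a plain import for the other. The helper
names real_root and complex_root are made up for the example.

```python
import cmath
from math import sqrt  # the bare name for the function used most often


def real_root(x: float) -> float:
    # math.sqrt, bound by the from-import; raises ValueError for x < 0
    return sqrt(x)


def complex_root(x: float) -> complex:
    # Same name, different module: the cmath prefix says which one this is
    return cmath.sqrt(x)


print(real_root(9.0))      # 3.0
print(complex_root(-1.0))  # 1j
```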

> > I propose that the technique most amenable to source code management
> > is for a single file (or RCS level module, if you have a locking RCS)
> > to have everything that it makes sense to edit or change for a
> > specific feature.
>
> Oh, I agree completely. I think we're using the exact same criterion.
> A class is a self-contained feature with a well defined interface,
> just what you'd want to put in its own file. (Obviously there are
> trivial classes which don't implement features, and they don't need
> their own files.)
>

Sure, if all your classes are like that. But very few classes exist in
isolation - there are external and internal dependencies, and some
classes are tightly bound. There's no reason for these tightly bound
classes to be in separate files (or a separate namespace), because
when you work on one you'll need to work on them all.

> > You're also placing far too much emphasis on reload. Focus yourself on
> > unit tests and environment scripts instead. These are more reliable
> > and easier to validate than reload() in a shell.
>
> I think this is the crux of my frustration. I think reload() is
> unreliable and hard to validate because Python's package management is
> broken. I appreciate your suggestion of alternatives and I think I
> need to come to terms with the fact that reload() is just broken. That
> doesn't mean it has to be that way or that Python is blameless in this
> problem.
>

I wonder what environments you worked in before that actually had a
reliable and gotcha-free version of reload? I actually don't know of
any - Smalltalk is closest. It's not really "broken" when you
understand what it does. There's just an expectation that it does
something else, and when it doesn't meet that expectation it's assumed
to be broken. Now, that's a fair definition of "broken", but replacing
running instances in a live image is a very hard problem to solve
generally. Limiting reload() to straightforward, reliable behavior is
a reasonable design decision.
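
To make "what it does" concrete: reload() re-executes the module's
source inside the existing module object. Anything that looks names up
through the module sees the new code, but names already copied
elsewhere (for example by "from mod import name") keep pointing at the
old objects. A minimal sketch, assuming Python 3's importlib.reload();
greet_demo is a hypothetical throwaway module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # no .pyc caching, so reload always re-reads the source

tmp = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))
src = tmp / "greet_demo.py"  # hypothetical module for the demo

src.write_text("def greet():\n    return 'v1'\n")
greet_demo = importlib.import_module("greet_demo")
from greet_demo import greet  # copies a reference to the *current* function object

# Change the source and reload.
src.write_text("def greet():\n    return 'v2'\n")
importlib.invalidate_caches()
reloaded = importlib.reload(greet_demo)

print(reloaded is greet_demo)  # True: same module object, updated in place
print(greet_demo.greet())      # v2: lookup through the module sees the new code
print(greet())                 # v1: the from-imported name still holds the old function
```

That's the limited but predictable behavior described above; patching
every stray reference (or live instance) is what would make reload()
"smart", and also what makes it hard.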