Best Practices for Internal Package Structure

Steven D'Aprano steve at pearwood.info
Mon Apr 4 21:43:59 EDT 2016


On Tue, 5 Apr 2016 02:47 am, Josh B. wrote:

> My package, available at https://github.com/jab/bidict, is currently laid
> out like this:
> 
> bidict/
> ├── __init__.py
> ├── _bidict.py
> ├── _common.py
> ├── _frozen.py
> ├── _loose.py
> ├── _named.py
> ├── _ordered.py
> ├── compat.py
> ├── util.py


The purpose of packages isn't enable Java-style "one class per file" coding,
especially since *everything* in the package except the top level "bidict"
module itself is private. bidict.compat and bidict.util aren't flagged as
private, but they should be, since there's nothing in either of them that
the user of a bidict class should care about.

(utils.py does export a couple of functions, but they should be in the main
module, or possibly made into a method of BidirectionalMapping.)

Your package is currently under 500 lines. As it stands now, you could
easily flatten it to a single module:

bidict.py

Unless you are getting some concrete benefit from a package structure, you
shouldn't use a package just for the sake of it. Even if the code doubles
in size, to 1000 lines, that's still *far* below the point at which I
believe a single module becomes unwieldy just from size. At nearly 6500
lines, the decimal.py module is, in my opinion, *almost* at the point where
just size alone suggests splitting the file into submodules. Your module is
nowhere near that point.

It seems to me that you're paying the cost of the increased complexity
needed to handle a package, but not (as far as I can tell) gaining any
benefit from it. Certainly the *users* of your package aren't: all the
public classes are pre-imported by the __init__.py file, so they don't even
get the advantage of only needing to import classes that they actually use. 

So my recommendation would be to collapse the package to a single file.

If you choose to reject that recommendation, or perhaps you are getting some
benefit that I haven't spotted in a cursory look over the package, then my
suggestion is to document that the *only* public interface is the
top-level "import bidict", and that *all* submodules are private. Then drop
the underscores, possibly re-arrange a bit:


bidict/
├── __init__.py  # the only public part of the package
├── bidict.py
├── common.py  # includes compat and util
├── frozen.py
├── loose.py
├── named.py
├── ordered.py


If you're worried about the lack of underscores for private submodules, then
consider this alternate structure:


_bidict/
├── __init__.py
├── bidict.py
├── common.py
├── frozen.py
├── loose.py
├── named.py
├── ordered.py
bidict.py


where "bidict.py" is effectively little more than:

from _bidict import *



> What do you think of the code layout, specifically the use of the _foo
> modules? It seems well-factored to me, but I haven't seen things laid out
> this way very often in other projects, and I'd like to do this as nicely
> as possible.

It's very pretty, and well-organised, but I'm not sure you're actually
gaining any advantage from it.


> It does kind of bug me that you see the _foo modules in the output when
> you do things like this:
> 
>>>> import bidict
>>>> bidict.bidict
> <class 'bidict._bidict.bidict'>
>>>> bidict.KeyExistsError
> <class 'bidict._common.KeyExistsError'>
> 
> 
> In https://github.com/jab/bidict/pull/33#issuecomment-205381351 a reviewer
> agrees:
> 
> """
> Me too, and it confuses people as to where you should be importing things
> from if you want to catch it, inviting code like
> 
> ```
> import bidict._common


That, at least, should be an obvious no-no, as it's including a single
underscore private name. I wouldn't worry about that too much: if your
users are so naive or obnoxious that they're ignoring your documentation
and importing _private modules, there's nothing you can do: they'll find
*some* way to shoot themselves in the foot, whatever you do.

[Aside: my devs at work had reason to look inside a script written by one of
their now long-moved away former colleagues yesterday, as it recently
broke. After reading the script, the *first* thing they did was change the
way it was called from:

scriptname --opts steve
# yes, he really did use my name as the mandatory argument to the script

to

scriptname --opts forfucksakeandrew

where the name "andrew" has been changed to protect the guilty. Using my
name as the script argument is, apparently, the *least* stupid thing the
script does.]


> try:
>     ...
> except bidict._common.KeyExistsError:
>     ...
> ```
> ie. becoming dependent on the package internal structure.
> 
> I would be tempted to monkey-patch .__module__ = __name__ on each imported
> class to get around this. Maybe there are downsides to doing magic of that
> kind, but dependencies on the internals of packages are such a problem for
> me in our very large codebase, that I'd probably do it anyway in order to
> really explicit about what the public API is. """

Does his team not do internal code reviews? I hate code that lies about
where it comes from. I certainly wouldn't do it in a futile attempt to
protect idiots from themselves.



> Curious what folks on this list recommend, or if there are best practices
> about this published somewhere.
> 
> Thanks,
> Josh

-- 
Steven




More information about the Python-list mailing list