Best Practices for Internal Package Structure

Chris Angelico rosuav at gmail.com
Wed Apr 6 21:40:58 EDT 2016


On Thu, Apr 7, 2016 at 5:37 AM, Sven R. Kunze <srkunze at mail.de> wrote:
> On 06.04.2016 01:47, Chris Angelico wrote:
>>
>> Generally, I refactor code not because the files are getting "too
>> large" (for whatever definition of that term you like), but because
>> they're stretching the file's concept. Every file should have a
>> purpose; every piece of code in that file should ideally be supporting
>> exactly that purpose.
>
>
> Well said.
>
> The definitions of purpose and concept are blurry, though. So, what is within
> the boundary of a concept is hard to define.

Oh, I never said it was easy to define :) That's the goal, to be sure,
but there are plenty of times when code goes into file X because it's
called by stuff in file X, even if conceptually it might be closer to
file Y. And then when file Z needs it, does it get moved to file Y, or
left where it is? Hard problem.

> @Steven
> You might not understand the purpose of the guideline. That's what makes
> them so valuable. It's hard to get them right and it's hard to understand
> them if you don't have any experience with them.

I'm not sure what you mean here. Are you saying that the guideline is
valuable precisely because people don't understand its purpose? I can
believe you, in the sense that the guideline becomes a summary of a
whole lot of rules (e.g. "you may not check in your code if it doesn't
comply with PEP 8" rather than arguing the specifics of why particular
style rules are important), but generally, there should be a
straightforward explanation. And since this is a question of style,
there are usually going to be arguments on both sides :)

> An attempt at an explanation (which may in itself not be 100% correct): there
> are two different forces acting on the source code:
>
> 1) make it short and concise (the 2-pages guideline)
> 2) keep conceptually close things together (cf. Chris)
>
> So, there's always a bargaining of what can be put in/removed from a module
> in the first place:
>
> "just these 14 lines, please; we need that feature"
> "but the module already has 310 lines"
> "only this one last one, please; it belongs here"
> "but it's an if-else and another def; nasty nesting and more complexity"
> "hmm, what if we remove those 5 over here? we don't need them anymore"
> "really? then, we can remove 2 superfluous newlines and 2 import lines as
> well"
> "I could even squeeze those 14 lines to 10 using dict comprehensions"
> "that's even more readable; +1 line, that's okay"
>
> Life is full of compromises.

This kind of bargaining sounds like a great way to force compromises
on people. Let's take it up a level: I need you to maintain this
project, but you're not allowed to make it exceed 10K lines of code
across all files. And it's already at 9,994 lines. So if you want to
add code to it, you have to find someplace else to remove code first.
Like all such guidelines, it has some plausible justification (it
prevents software bloat), but this rule can only hurt your project.
What happens when you have fifteen lines of code to add, to cope with
an obscure edge case involving user input? Which of these options do
you pick?

1) Edit your code down to just six lines, cramming everything in as
tightly as possible. To stay within an 80-char limit, you shorten
names and avoid comments. OR...
2) Find fifteen lines of internal API documentation somewhere and
delete it. Everyone should know that stuff by now, and if not, they
can always look back through source control. OR...
3) Find two twenty-line blocks of code that look identical and really
quickly break them out into a function. HA! WIN! And in your haste, you
don't notice that there was a subtle difference between them. OR...
4) Spend three hours working on a file, completely reworking it in a
more compact way, saving 50-100 lines of code, but losing track of
what you were doing, costing you an additional half an hour on your
original job as well.

All of them comply with the arbitrary 10KLOC limit. All of them have
hefty costs in future maintainability. And having a code file size
limit is exactly the same, except that you add a fifth option, "find
something to move to another file", which just moves the problem
around a bit. Arbitrary limits might solve one problem, but they tend
to introduce another.
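
To make option 3 concrete, here's a minimal sketch in Python. Everything
in it (the function names, the width and indent rules) is made up for
illustration and isn't taken from any real project. It shows two
look-alike blocks, a rushed merge that loses the one difference between
them, and a more careful merge that keeps it.

```python
# Hypothetical example: every name and rule here is invented for illustration.

def parse_width(text):
    """Parse a width typed by the user; blank means "use the default"."""
    text = text.strip()
    if not text:
        return 80
    value = int(text)
    if value < 1:
        raise ValueError("width must be at least 1")
    return value


def parse_indent(text):
    """Parse an indent typed by the user; blank means "no indent"."""
    text = text.strip()
    if not text:
        return 0
    value = int(text)
    if value < 0:  # the subtle difference: zero is a valid indent
        raise ValueError("indent must be at least 0")
    return value


def parse_number_hasty(text, default):
    """The rushed merge: it silently applies the width rule to everything."""
    text = text.strip()
    if not text:
        return default
    value = int(text)
    if value < 1:
        raise ValueError("value must be at least 1")
    return value


def parse_number(text, default, minimum):
    """A careful merge: the difference becomes an explicit parameter."""
    text = text.strip()
    if not text:
        return default
    value = int(text)
    if value < minimum:
        raise ValueError(f"value must be at least {minimum}")
    return value
```

With the hasty version, an indent of "0" that used to be accepted now
raises ValueError, and nothing forces anyone to notice until a user
hits it. The careful version costs a parameter and a few minutes of
actually reading both blocks, which is exactly what a line budget
discourages.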

ChrisA


