Organizing modules and their code

Fri Feb 3 17:14:51 EST 2023

On 2023-02-03 at 13:18:46 -0800,
transreductionist <transreductionist at gmail.com> wrote:

> Here is the situation. There is a top-level module (see designs below)
> containing code, that as the name suggests, manages an ETL pipeline. A
> directory is created called etl_helpers that organizes several modules
> responsible for making up the pipeline. The discussion concerns the
> Python language, which supports OOP as well as Structural/Functional
> approaches to programming.

> I am interested in opinions on which design adheres best to standard
> architectural practices and the SOLID principles. I understand that
> this is one of those topics where people may have strong opinions one
> way or the other. I am interested in those opinions.

Okay, I'll start:  unless (a) one of extract, transform, or load already
is, or will certainly become, complicated enough to be its own
architectural module with its own substructure, or (b) you're
constructing specific ETL pipelines for specific ETL jobs at the times
the jobs are defined, I think you're overthinking it.

Note that I say that speaking as a notorious overthinker.  ;-)

Keep It Simple:  Put all four modules at the top level, and run with it
until you falsify it.  Yes, I would give you that same advice no matter
what language you're using.

FWIW, I'm not a big fan of OO, but based on what little I know about
your ETL pipelines, I agree with you that it probably doesn't make a big
difference at this level.  Define solid (in pretty much any/every sense
of the word, capitalized or not) interfaces between your modules, and
write your code against those interfaces, whether OO or any other
paradigm.
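
To make "write your code against those interfaces" concrete, here is a
minimal sketch.  Everything in it is made up for illustration (Record,
Extractor, Transformer, Loader, run_pipeline, and the three toy
implementations); I don't know what your real stages look like.  It
uses typing.Protocol, so the pipeline depends only on the shape of each
stage, not on any base class.

```python
from typing import Any, Dict, Iterable, Iterator, List, Protocol

# Hypothetical record type; a real pipeline would likely use something richer.
Record = Dict[str, Any]


class Extractor(Protocol):
    def extract(self) -> Iterable[Record]: ...


class Transformer(Protocol):
    def transform(self, records: Iterable[Record]) -> Iterable[Record]: ...


class Loader(Protocol):
    def load(self, records: Iterable[Record]) -> int: ...


def run_pipeline(extractor: Extractor, transformer: Transformer,
                 loader: Loader) -> int:
    """Run the three stages in order; return whatever the loader reports."""
    return loader.load(transformer.transform(extractor.extract()))


# Toy implementations.  None of them inherits from the protocols above;
# they satisfy the interfaces simply by having the right methods.
class ListExtractor:
    def __init__(self, rows: Iterable[Record]) -> None:
        self._rows = list(rows)

    def extract(self) -> Iterator[Record]:
        return iter(self._rows)


class UppercaseNames:
    def transform(self, records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            # Copy the record so the input is left unmodified.
            yield {**record, "name": record["name"].upper()}


class ListLoader:
    def __init__(self) -> None:
        self.rows: List[Record] = []

    def load(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before  # number of records loaded


if __name__ == "__main__":
    sink = ListLoader()
    loaded = run_pipeline(
        ListExtractor([{"name": "ada"}, {"name": "grace"}]),
        UppercaseNames(),
        sink,
    )
    print(loaded, sink.rows)
```

The same idea works without classes:  if each stage is a plain
function, the "interface" is just its signature, and run_pipeline takes
three callables instead of three objects.  Either way, swapping one
stage for another doesn't touch the code that wires them together.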