suggestions for functional style (singleton pattern?)

Sat Feb 28 22:45:54 EST 2015

On Sat, 28 Feb 2015 16:12:54 -0700, yves at zioup.com wrote:

>Hi,
>
>For some scripts, I write in a a more functional way, using a lot of small
>functions outside of any class. Although it makes the code clearer for
>specific cases, I have found that it makes debugging and using the repl in
>general difficult, as as I have to re-initialise every single objects every time.
>
>I have now started to use some kind of state pattern to alleviate this, here's
>a simplistic example:
>
>https://github.com/dorfsmay/state_pattern_for_debugging_python/blob/master/dirstats.py
>
>Are there better ways to address this? Any suggestion on this style?

(warning: Didn't bother prove-reading and spell chocking. Sorry...)

Problem 1:
With the exception of get_all_files(), all your functions do is to
call another standard function. This decreases performance
unnecessarily and may mislead the users of your API into thinking
these functions perform some different type of average or min/max
operations.

For a single call to your API, the performance issue isn't a problem.
But if some user wraps a DirStat instance in a loop, they may suffer
an unnecessary hit.

Meanwhile, If I see a calculate_average() or a find_smallest()
function in your API, I immediately deduce it must be doing something
different than a standard average or min/max operation. Otherwise why
would the API code bother about something I can easily do myself? And
when I go look in your code trying to understand why you are offering
me an average function, I'll become dissapointed.

Solution:
Do not create an interface for standard functions, unless you plan to
change how these functions work. Your desire to let your users know
that they can calculate an average from your API object will fall in
deaf ears because they can realize that from themselves. An API should
only expose unique methods or complex operations wrapped in one or two
methods.

If you are coding a method with a single line and that line starts
with return and is followed by a function call, you know you are
creating a redundant interface.

Problem 2:
No one likes long names. And if some user of your API is coding under
an 80 character per line limit, they'll hate you for it. Your
functions contain redundant terms that only increase the function name
size and do not offer any advantage in having a better grasp of what
the function does.

average() is better than calculate_average(), for instance.

Solution:
Always be careful with how you name your public interface elements.
These will define much of the 'personality' of your API.
calculate_average() makes for a tacky API. average() makes for a
pretty standard API.

Problem 3:
Modules are self-contained contextualized units that must contain only
code that fits in the namespace the module defines for itself. If this
module is called dirstats and is chiefed by a DirStats object that
manages a dictionary of files, a public calculate_average() function
or a min/max pair of functions simply don't fit within the module
context.

At the very least they should be moved inside the class so it becomes
clear to anyone caring that these functions really only matter in the
context of the DirStats object.

(But because of problem 1, these functions should not even exist and
instead be replaced by direct calls to the statistics module functions
inside the DirStats class.)

The get_all_files() function is an offender. It should be moved inside
the class and probably even be removed entirely. Since it apparently
has no other objective than to help the class __init__() method, the
function code just just be copied/pasted inside that method.

Solution:
Do not make a module a house of confusion. Each module should contain
only public functions and objects that belong with each other under
the module proposed context.

Also look carefully at what you are putting outside of a class. You
are making it a part of the public interface for that module. The only
reason to do so is if it can have some use outside the class
definition.

Problem 4:
You speak of a singleton. But you haven't implemented one. It is not
clear from your code if this class should be a singleton. I'm guessing
not. Singletons are in fact rare. Well, let me put it another way:
Good reasons to code a singleton are rare.

A good singleton, for instance could be the Map() object in a strategy
game, or the canonical Logging() class in most programs that implement
logging. These are really one-time objects that make sense to ever
always exist as just one instance.

Solution:
But, if you ever wish to implement a singleton, the following pattern
mostly works and is short and to the point:

    class Map():
        _instance = None

        def __new__(cls, *args, **kwargs):
            if Map._instance is None:
                Map._instance = super(Map, cls).__new__(cls)
            return Map._instance

    >>> a = Map()
    >>> b = Map()
    >>> a is b
    True

Just remember, for most singletons the truth is that what you think
you will only need 1 today, you will need 2 tomorrow.

Problem 5:
A pet peeve of mine. If you aren't going to use a variable, make that
explicit in your code by taking advatange of Python wonderful language
features.

In your get_all_files() function, 'dirs' is not used. So make that
explicit to you and anyone else reading your code, by using either the
underscore or double underscore variable naming convention.

Solution:
It goes like this:

    for root, __, files in os.walk(directory):
        for fn in files:
            abs_path = os.path.join(root, fn)
            all_files[abs_path] = os.path.getsize(abs_path)

....

So to finalize. How should your dirstats.py module probably look like?
Something like this:

import os
import inspect
import argparse
import operator
import statistics

class DirStat(object):
    def __init__(self, directory):
        self.all_files = dict()
        for root, dirs, files in os.walk(directory):
            for fn in files:
                abs_path = os.path.join(root, fn)
                self.all_files[abs_path] = os.path.getsize(abs_path)

        self.average = statistics.mean(self.all_files.values())
        self.median = statistics.median(self.all_files.values())
        self.largest = max(self.all_files, self.all_files.get)
        self.smallest = min(self.all_files, self.all_files.get)

    def print(self):
        print("mean: {:.2f}".format(self.average))
        print("median: {:.2f}".format(self.median))
        print("smallest file: {}".format(self.smallest))
        print("largest file: {}".format(self.largest))

    def __repr__(self):
        return self.__dict__.__repr__()

I also wanted to say something about your __repr__, but this post is
already too long.