Portrait of a "real life" metaclass

Sun Nov 11 02:48:49 EST 2007

On Nov 10, 3:34 am, Mark Shroyer <usenet-m... at markshroyer.com> wrote:
> On 2007-11-10, Jonathan Gardner <jgardner.jonathangardner.... at gmail.com> wrote:
> > What would I have done? I wouldn't have had an age matching class. I
> > would have had a function that, given the datetime and a range
> > specification, would return true or false. Then I would've written
> > another function for matching emails. Again, it takes a specification
> > and the email and returns true or false.
>
> There isn't much difference between
>
>   match_calendar_month(2007, 11, message)
>
> and
>
>   m = CalendarMonthMatcher(2007, 11)
>   m.match(message)

Yes, there isn't a world of difference between the two. But there is a
world of difference between those and:

   match(message, before=date(2007, 12, 1), after=date(2007, 11, 1))

And you can add parameters as needed. In the end, you may have a lot
of parameters, but only one match function and only one interface.
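
A rough sketch of what that single function could look like. The message is assumed to be a plain dict with "date" and "attachments" keys, "after" is treated as inclusive and "before" as exclusive, and has_attachments is a hypothetical extra parameter added only to show how the interface grows. None of this comes from the actual application.

```python
from datetime import date


def match(message, before=None, after=None, has_attachments=None):
    """Return True if message satisfies every criterion that was given.

    Criteria left as None are ignored. `message` is assumed to be a dict
    with a 'date' (datetime.date) and an 'attachments' list.
    """
    # Assumption: `before` is exclusive, `after` is inclusive.
    if before is not None and not message["date"] < before:
        return False
    if after is not None and not message["date"] >= after:
        return False
    # Hypothetical extra criterion, showing how parameters get added.
    if has_attachments is not None:
        if bool(message["attachments"]) != has_attachments:
            return False
    return True


message = {"date": date(2007, 11, 10), "attachments": []}
in_november = match(message, before=date(2007, 12, 1), after=date(2007, 11, 1))
```

Callers that only care about one criterion pass only that keyword; the others stay out of their way.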

> <snip> But take for example two of my app's mailbox actions -- these aren't
> their real names, but for clarity let's call them ArchiveByMonth and
> SaveAttachmentsByMonth.  The former moves messages from previous
> months into an archival mbox file ./archives/YYYY/MM.mbox
> corresponding to each message's month, and the latter saves message
> attachments into a directory ./attachments/YYYY/MM/.  Each of these
> actions would work by using either match_calendar_month() or
> CalendarMonthMatcher().match() to perform its action on all messages
> within a given month; then it iterates through previous months and
> repeats until there are no more messages left to be processed.
>
> In my object-oriented implementation, this iteration is performed by
> calling m.previous() on the current matcher, much like the
> simplified example in my write-up.  Without taking the OO approach,
> on the other hand, both types of actions would need to compute the
> previous month themselves; sure that's not an entirely burdensome
> task, but it really seems like the wrong place for that code to
> reside.  (And if you tackle this by writing another method to return
> the requisite (year, month) tuple, and apply that method alongside
> wherever match_calendar_month() is used...  well, at that point
> you're really just doing object-oriented code without the "class"
> keyword.)
>
> Furthermore, suppose I want to save attachments by week instead of
> month: I could then hand the SaveAttachmentsByPeriod action a
> WeekMatcher instead of a MonthMatcher, and the action, using the
> matcher's common interface, does the job just as expected.  (This is
> an actual configuration file option in the application; the nice
> thing about taking an OO approach to this app is that there's a very
> straightforward mapping between the configuration file syntax and
> the actual implementation.)
>
> It could be that I'm still "thinking in Java," as you rather
> accurately put it, but here the object-oriented approach seems
> genuinely superior -- cleaner and, well, with better encapsulated
> functionality, to use the buzzword.
>

Or it could be that you are confusing two things with each other.

Let me try to explain it another way. Think of all the points on a
grid that is 100x100. There are 10,000 points, right? If you wanted to
describe the position of a point, you could name each point. You'd
have 10,000 names. This isn't very good because people would have to
know all 10,000 names to describe a point in your system. But it is
simple, and it is really easy to implement. But hey, we can just
number the points 0 to 9999 and it gets even simpler, right?

OR you could describe the points as an (x,y) pair. Now people only
have to remember 200 different names--100 for the columns, 100 for the
rows. Then if you used traditional numbers, they'd only have to be
able to count to 100.
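
The two naming schemes carry the same information, which a few lines of arithmetic make concrete. This is only an illustration; it assumes the points are numbered row by row, and to_pair and to_number are made-up names.

```python
WIDTH = 100  # the grid is 100 columns wide


def to_pair(n):
    """Convert a point number in 0..9999 to a (row, column) pair."""
    return divmod(n, WIDTH)


def to_number(row, column):
    """Convert a (row, column) pair back to the single point number."""
    return row * WIDTH + column


pair = to_pair(735)
number = to_number(7, 35)
```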

Computer science is full of things like this. When you end up with
complexity, it is probably because you are doing something wrong. My
rule of thumb is if I can't explain it all in about 30 seconds, then
it is going to be a mystery to everyone but myself no matter how much
documentation I write.

How do you avoid complexity? You take a step back, identify patterns,
or pull different things apart from each other (like rows and
columns), and try to find the most basic principles to guide the
entire system.

The very fact that you are talking about months (and thus days and
weeks and years and centuries, etc...) and not generic dates means you
have some more simplifying to do in your design elsewhere as well.

Rewrite the SaveAttachmentsByMonth so that it calls a more generic
SaveAttachmentsByDateRange function. Or better yet, have it
FilterEmailsByDateRange and ExtractAttachment and SaveAttachment. Or
even better, have it FilterEmailsBySpecification(date_from=X,
date_to=Y) and SaveAttachment.

Do you see the point? Your big function SaveAttachmentsByMonth is kind
of like point number 735. It's easier to describe it as the point at
(7,35) than as a single number. It's better to talk about the most
basic functionality -- saving emails and filtering emails -- rather than
talking about big concepts.

I call this concept "orthogonality" after the same concept in linear
algebra. It's just easier when you are dealing in an orthogonal basis--
or a set of functions that do simple things and don't replicate each
other's functionality.
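
As a sketch of what such a set of small functions might look like: the emails here are plain dicts, and the "store" dict stands in for a directory on disk. Both are assumptions made to keep the example self-contained, and the function names are illustrative rather than taken from the real application.

```python
from datetime import date


def filter_emails(emails, date_from=None, date_to=None):
    """Yield emails whose date falls in [date_from, date_to)."""
    for email in emails:
        if date_from is not None and email["date"] < date_from:
            continue
        if date_to is not None and email["date"] >= date_to:
            continue
        yield email


def extract_attachments(email):
    """Return the (filename, payload) pairs attached to one email."""
    return list(email["attachments"])


def save_attachment(attachment, store):
    """Record one attachment in `store`, a dict standing in for a directory."""
    filename, payload = attachment
    store[filename] = payload


def save_attachments_by_month(emails, year, month, store):
    """The 'big' action, expressed as a composition of the small pieces."""
    start = date(year, month, 1)
    # First day of the following month, rolling over after December.
    end = date(year + month // 12, month % 12 + 1, 1)
    for email in filter_emails(emails, date_from=start, date_to=end):
        for attachment in extract_attachments(email):
            save_attachment(attachment, store)


emails = [
    {"date": date(2007, 10, 31), "attachments": [("old.txt", b"x")]},
    {"date": date(2007, 11, 10), "attachments": [("notes.txt", b"y")]},
]
store = {}
save_attachments_by_month(emails, 2007, 11, store)
```

Saving by week or by quarter then needs only a different pair of dates, not a new kind of action.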

Your users will appreciate it as well. While it may be nice to have a
shiny button that saves attachments by months, they'd rather they
could specify the date ranges they'd like to use (Hours? Days? Weeks?
Quarters?) and what they'd like to save (the attachments, the entire
email, etc...) (Better yet, what they'd like to *do*.)

> > If I really wanted to pass around the specifications as objects, I
> > would do what the re module does: have one generic object for all the
> > different kinds of age matching possible, and one generic object for
> > all the email objects possible. These would be called,
> > "AgeMatchSpecification", etc... These are noun-y things. Here,
> > however, they are really a way of keeping your data organized so you
> > can tell that that particular dict over there is an
> > AgeMatchSpecification and that one is an EmailMatchSpecification. And
> > remember, the specifications don't do the matching--they merely tell
> > the match function what it is you wanted matched.
>
> Oddly enough, the re module was sort of my inspiration here:
>
>   my_regex = re.compile("abc")
>   my_regex.match("some string")
>
> (Sure, re.compile() is a factory function that produces SRE_Pattern
> instances rather than the name of an actual class, but it's still
> used in much the same way.)
>

Except we don't have different kinds of re expressions for different
kinds of matching. One spec to handle everything is good enough, and
it's much simpler. If you have the time, try to see what people did
before regex took over the world. In fact, try writing a text parser
that doesn't use one general regex function. You'll quickly discover
why one method with one very general interface is the best way to
handle things.
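
To illustrate with the standard re module: three quite different matching jobs, each expressed as data handed to the same function. The pattern names and the sample text are invented for this example.

```python
import re

# Three different matching jobs, all described as pattern strings and
# handed to the same re.search() call instead of three matcher classes.
patterns = {
    "iso_date": r"\d{4}-\d{2}-\d{2}",
    "word": r"[A-Za-z]+",
    "mbox_path": r"archives/\d{4}/\d{2}\.mbox",
}

text = "Moved 2007-11-10 mail to archives/2007/11.mbox"

# Collect the first match for each pattern.
found = {name: re.search(p, text).group(0) for name, p in patterns.items()}
```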

> > Now, part of the email match specification would probably include bits
> > of the date match specification, because you'd want to match the
> > various dates attached to an email. That's really not rocket science
> > though.
>
> > There wouldn't be any need to integrate the classes anymore if I did
> > it that way. Plus, I wouldn't have to remember a bunch of class names.
> > I'd just have to remember the various parameters to the match
> > specification for age matching and a different set of parameters for
> > the email matching.
>
> You're sort of missing the bigger picture of this application,
> although that's entirely not your fault as I never fully described
> it to begin with.  The essence of this project is that I have a
> family of mailbox actions (delete, copy, archive to mailbox, archive
> by time period, ...) and a family of email matching rules (match
> read messages, match messages with attachments, match messages of a
> certain size, match messages by date, ...) of which matching by date
> is only one subtype -- but there are even many different ways to
> match by date (match by number of days old, match by specific
> calendar month, match by specific calendar month *or older*, match
> by day of the week, ...); not to mention arbitrary Boolean
> combinations of other matching rules (and, or, not).
>
> My goal is to create a highly configurable and extensible app, in
> which the user can mix and match different "action" and "matcher"
> instances to the highest degree possible.  And using class
> definitions really facilitates that, to my Java-poisoned mind.  For
> example, if the user writes in the config file
>
>   actions = (
>     (
>       # Save attachments from read messages at least 10 days old
>       mailbox => (
>         path => '/path/to/maildir',
>         type => 'maildir',
>       ),
>       match => (
>         type => And,
>         p => (
>           type => MarkedRead,
>           state => True,
>         ),
>         q => (
>           type => DaysOld,
>           days => 10,
>         ),
>       ),
>       action => (
>         type => SaveAttachments,
>         destination => '/some/directory/',
>       ),
>     ),
>   )
>
> (can you tell I've been working with Lighttpd lately?)
>
> then my app can easily read in this dictionary and map the
> user-specified actions directly into Matcher and Action instances;
> and this without me having to write a bunch of code to process
> boolean logic, matching types, action parameters, and so on into a
> program flow that has a structure needlessly divergent from the
> configuration file syntax.  It also means that, should a user
> augment the program with his own Matcher or Action implementation,
> as I intend to make it easy to do, then those implementations can be
> used straightaway without even touching the code for the
> configuration file reader.
>

See, you are thinking in general terms, but you are writing a specific
implementation. In other words, you're talking about the problem the
right way, but you're trying to write the code in a different way.
Coming from a C, perl, or Java background, this is to be expected.
Those languages are strait-jackets that impose themselves on your
very thoughts. But in Python, the code should read like pseudo-code.
Python *is* pseudo-code that compiles, after all.

You don't need many classes because the branching logic--the bits of
the program that say, "I'm filtering by days and not months"--can be
contained in one bigger function that calls a very general sub-
function. There's no need to abstract that bit out into a class. No
one is going to use it but the match routine. Just write the code that
does it and be done with it.
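
A minimal sketch of that one bigger function, working directly on a nested dict shaped like the config file quoted above. The type names And, MarkedRead and DaysOld come from that config; the Or and Not branches, the "p" key for Not, the message layout, and the explicit "today" argument are assumptions made for the sake of a runnable example.

```python
from datetime import date


def matches(spec, message, today):
    """Decide whether message satisfies spec, a nested dict of criteria."""
    kind = spec["type"]
    # Boolean combinations recurse into their sub-specifications.
    if kind == "And":
        return (matches(spec["p"], message, today)
                and matches(spec["q"], message, today))
    if kind == "Or":
        return (matches(spec["p"], message, today)
                or matches(spec["q"], message, today))
    if kind == "Not":
        return not matches(spec["p"], message, today)
    # Leaf criteria look at the message itself.
    if kind == "MarkedRead":
        return message["read"] == spec["state"]
    if kind == "DaysOld":
        # "At least N days old", as in the config comment.
        return (today - message["date"]).days >= spec["days"]
    raise ValueError("unknown match type: %r" % (kind,))


spec = {
    "type": "And",
    "p": {"type": "MarkedRead", "state": True},
    "q": {"type": "DaysOld", "days": 10},
}
message = {"read": True, "date": date(2007, 10, 25)}
result = matches(spec, message, today=date(2007, 11, 11))
```

Adding a new kind of criterion means adding one branch, and the config reader still never needs to know about it.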

In fact, by writing classes for all these branches in the program
logic, you are doing yourself a disservice. When you return to this
code 3 weeks from now, you'll find all the class declarations,
metaclasses, and syntactic sugar are getting in the way of seeing what
is really happening. That is always bad and should be avoided, just like
flowery language and useless decorum should be avoided.

> As for the decision to use a metaclass proxy to my AgeSpec classes,
> I'm fully prepared to admit wrongdoing there.  But I still believe
> that an object-oriented design is the best approach to this problem,
> at least considering my design goals.  Or am I *still* missing the
> point?
>

No, you're arguing about the thing I wanted to argue about, so you see
the point I am trying to make.

It's painful to realize that all those years learning OO design and
design patterns just to make Java usable are wasted in the world of
Python. I understand that because I invested years mastering C++
and perl before discovering Python. Your solace comes when you embrace
that and see how much simpler life really is when the language gets
out of your way.
