Portrait of a "real life" __metaclass__

Mark Shroyer usenet-mail at markshroyer.com
Sun Nov 11 08:21:22 EST 2007


On 2007-11-11, Jonathan Gardner <jgardner.jonathangardner.net at gmail.com> wrote:
>> There isn't much difference between
>>
>>   match_calendar_month(2007, 11, message)
>>
>> and
>>
>>   m = CalendarMonthMatcher(2007, 11)
>>   m.match(message)
>
> Yes, there isn't a world of difference between the two. But there
> is a world of difference between those and:
>
>    match(message, before=date(2007, 12, 1), after=date(2007, 11, 1))
>
> And you can add parameters as needed. In the end, you may have a
> lot of parameters, but only one match function and only one
> interface.

No, that would be an absolutely, positively bad decision.  Heck,
suppose I wanted to match messages from "@ufl.edu" that are at least
seven days old, OR all other messages from the previous month or
earlier -- but only if we're at least four days into the current
month.  (This isn't a fanciful invention for the sake of argument;
it's an actual rule I'm using right now.)  Then, at best, we'd have
something like:

    ((match(message, domain="ufl.edu")
          and match(message, before=date(2007, 11, 4)))
     or match(message, before=date(2007, 11, 1),
              butMatchThePreviousMonthInsteadIfDaysIntoCurrentMonthIsLessThan=4))

Or you could go the other way and try to solve it by adding even
more arguments to the function, but then you're even worse off for
complexity.  Either way, you're left with no abstract way to say
"this month" or "previous week" other than implementing that logic
separately from the data.  In my experience, that kind of design
ends up making things a lot more complicated than they need to be,
once the application is put to use and once you (or worse, somebody
else) start wanting to extend it.

Compare that to the actual implementation of this rule in my
program:

  # Raw string, so "\." reaches the regex engine as an escaped dot
  rule = Or(
           And(
             SenderPattern(re.compile(r"@ufl\.edu")),
             DaysOld(7)
           ),
           CalendarMonthOld(match_delay_days=4)
         )
  rule.match(message)

(The current implementation of CalendarMonthOld takes the current
month if not otherwise specified.)
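The post doesn't show how Or, And, SenderPattern, DaysOld, and CalendarMonthOld are written. Below is a minimal sketch of one way such composable matchers could look. It is not the author's code. It makes three assumptions: a hypothetical Message record with sender and date fields, an optional now argument passed through match() so rules can be checked against a fixed clock, and one reading of match_delay_days. Under that reading, until that many days of the current month have elapsed, the cutoff stays at the start of the previous month.

```python
import re
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical message record; a real program would read these fields
# from an email.message.Message.
Message = namedtuple("Message", "sender date")


class Matcher:
    """Base class: each rule bundles its parameters with its logic."""
    def match(self, message, now=None):
        raise NotImplementedError


class And(Matcher):
    def __init__(self, *rules):
        self.rules = rules

    def match(self, message, now=None):
        return all(rule.match(message, now) for rule in self.rules)


class Or(Matcher):
    def __init__(self, *rules):
        self.rules = rules

    def match(self, message, now=None):
        return any(rule.match(message, now) for rule in self.rules)


class SenderPattern(Matcher):
    def __init__(self, pattern):
        self.pattern = pattern  # a compiled regular expression

    def match(self, message, now=None):
        return self.pattern.search(message.sender) is not None


class DaysOld(Matcher):
    def __init__(self, days):
        self.days = days

    def match(self, message, now=None):
        if now is None:
            now = datetime.utcnow()
        return now - message.date >= timedelta(days=self.days)


class CalendarMonthOld(Matcher):
    """Match messages sent before the start of the current month.

    Assumed semantics: until match_delay_days days of the current month
    have elapsed, the cutoff stays at the start of the previous month.
    """
    def __init__(self, match_delay_days=0):
        self.match_delay_days = match_delay_days

    def match(self, message, now=None):
        if now is None:
            now = datetime.utcnow()
        cutoff = datetime(now.year, now.month, 1)
        if now - cutoff < timedelta(days=self.match_delay_days):
            # Too early in the month: fall back to the previous month's first day.
            last_day_of_prev = cutoff - timedelta(days=1)
            cutoff = datetime(last_day_of_prev.year, last_day_of_prev.month, 1)
        return message.date < cutoff


rule = Or(
    And(SenderPattern(re.compile(r"@ufl\.edu")), DaysOld(7)),
    CalendarMonthOld(match_delay_days=4),
)
now = datetime(2007, 11, 11, 8, 21)
print(rule.match(Message("prof@ufl.edu", datetime(2007, 11, 2)), now))    # True
print(rule.match(Message("me@example.com", datetime(2007, 11, 2)), now))  # False
print(rule.match(Message("me@example.com", datetime(2007, 10, 30)), now)) # True
```

In this design each rule owns both its parameters and its date arithmetic. Supporting a new kind of rule therefore means adding a class, not another keyword argument on a shared function.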

> Or it could be that you are confusing two things with each other.
>
> [...]
>
> How do you avoid complexity? You take a step back, identify patterns,
> or pull different things apart from each other (like rows and
> columns), and try to find the most basic principles to guide the
> entire system.

If I could just break everything down into date ranges, I would.
But that doesn't give me the kind of behavior I want.

> The very fact that you are talking about months (and thus days and
> weeks and years and centuries, etc...) and not generic dates means you
> have some more simplifying to do in your design elsewhere as well.

No, it truly does not.  Sometimes I want to match a message that is
N days old; sometimes I want to match a message from the previous
*calendar month* or earlier, which cannot be readily specified as a
number of days; the same goes for calendar year, calendar week, etc.
Some small amount of calculation has to be performed to convert a
given number of calendar weeks into a datetime range.  And in order
for the user -- that is, primarily, me, but hopefully others too
once I get around to polishing up this thing -- to be able to
generically say, "match all the messages from the last quarter",
bundling such behavior with the data it operates on makes the system
easier to implement and *much* easier to extend.
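The conversion itself is small. Here is a minimal sketch that assumes weeks begin on Monday at midnight, which is the ISO 8601 convention; the post doesn't say which day the author's weeks start on. The helper name calendar_week_range is hypothetical and does not come from the author's program.

```python
from datetime import datetime, timedelta


def calendar_week_range(relative_week=0, now=None):
    """Return (start, end) for a calendar week, as a half-open range.

    relative_week=0 is the current week, -1 the previous week, and so on.
    Assumes weeks start on Monday at midnight (ISO 8601).
    """
    if now is None:
        now = datetime.utcnow()
    today = datetime(now.year, now.month, now.day)   # drop the time of day
    start = today - timedelta(days=today.weekday())  # back to Monday
    start += timedelta(weeks=relative_week)
    return start, start + timedelta(weeks=1)


# On Sunday, 11 Nov 2007, "last week" runs from Monday 29 Oct to Monday 5 Nov.
last_week = calendar_week_range(-1, now=datetime(2007, 11, 11, 8, 21))
print(last_week[0].date(), last_week[1].date())  # 2007-10-29 2007-11-05
```

The range is recomputed from now on every call. As a result, a rule like "last week" keeps its meaning as time passes, which a fixed pair of dates cannot do.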

If I used a monolithic match() function, as you suggest, then any
user who wanted to implement a new message handling action or a new
matching rule would need to alter the core application.  With my
approach, all that user needs to do is toss a module containing his
or her custom Matcher and Action implementations into a certain
directory, and he or she is good to go.
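The post doesn't show how that directory gets loaded. One way to do it with the standard library's importlib is sketched below; load_extensions and the "extensions_" module-naming scheme are hypothetical. Each module is registered in sys.modules before it executes, following the recipe in the importlib documentation. That also keeps its classes alive, so any Matcher subclass it defines shows up in Matcher.__subclasses__() without an explicit registration call.

```python
import importlib.util
import pathlib
import sys


def load_extensions(directory):
    """Import every *.py file in `directory`; return the loaded modules."""
    modules = []
    for path in sorted(pathlib.Path(directory).glob("*.py")):
        name = "extensions_" + path.stem  # hypothetical naming scheme
        spec = importlib.util.spec_from_file_location(name, str(path))
        module = importlib.util.module_from_spec(spec)
        # Register before executing, as in the importlib docs' recipe;
        # this also keeps the module (and the classes it defines) alive.
        sys.modules[name] = module
        spec.loader.exec_module(module)
        modules.append(module)
    return modules
```

At startup, the application would call load_extensions on its extensions/ directory before reading the configuration file, so user-supplied classes exist by the time rule types are looked up.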

> Rewrite the SaveAttachmentsByMonth so that it calls a more generic
> SaveAttachmentsByDateRange function. Or better yet, have it
> FilterEmailsByDateRange and ExtractAttachment and SaveAttachment. Or
> even better, have it FilterEmailsBySpecification(date_from=X,
> date_to=Y) and SaveAttachment.

Yeah, actually the class I have defined for this action *is* a more
generic "save attachments by date range" action type, as I used in
an example later in that post when I described passing it a
CalendarWeek date range instead of a CalendarMonth.  The confusion
on this point is my fault, though, as I also referred to this action
as SaveAttachmentsByMonth in a misguided attempt at clarifying my
point.

> Do you see the point? Your big function SaveAttachmentsByMonth is kind
> of like point number 735. It's easier to describe it as the point at
> (7,35) than as a single number. It's better to talk about the most
> basic functionality -- saving emails and filtering emails -- rather than
> talking about big concepts.
>
> [...]
>
> Your users will appreciate it as well. While it may be nice to have a
> shiny button that saves attachments by months, they'd rather they
> could specify the date ranges they'd like to use (hours? Days? Weeks?
> Quarters?) and what they'd like to save (the attachments, the entire
> email, etc...) (Better yet, what they'd like to *do*.)

That's the point precisely!  How does one specify "last quarter" in
a configuration file, in terms of a raw range of dates, such that it
retains the meaning of "last quarter" as we progress from one month
to the next?  He doesn't.  He needs the application to understand
the concept of a "quarter" first.  With my approach, all he'd need
to do to get the app thinking in terms of quarters is to add to the
app's extensions/ subdirectory a module containing the following:

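  from datetime import datetime

  class QuarterAgeSpec(AgeSpec):
    # Quarters are numbered 0-3 internally (January-March is 0);
    # relative_quarter=0 means this quarter, -1 the previous one, etc.
    def __init__(self, relative_quarter=0):
      now = datetime.utcnow()
      year, quarter = now.year, (now.month - 1) // 3
      self.year, self.quarter = \
          self._relative(year, quarter, relative_quarter)

    @staticmethod
    def _relative(year, quarter, delta):
      # divmod floors, so negative deltas roll back into earlier years
      years, quarter = divmod(quarter + delta, 4)
      return (year + years, quarter)

    def match(self, timestamp):
      # True if timestamp falls in [start of quarter, start of next quarter)
      n_year, n_quarter = self._relative(self.year, self.quarter, 1)
      return (datetime(self.year, self.quarter*3 + 1, 1) <= timestamp
              < datetime(n_year, n_quarter*3 + 1, 1))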

Then my program calls Matcher.__subclasses__() and finds the new
implementation, so he can immediately use this new Matcher from his
configuration file as:

  actions = (
    (
      mailbox => (
        ...     
      ),
      match => (
        type => Quarter,
        relative_quarter => -1,
      ),
      action => (
        ...
      ),
    ),
  )

That's all there is to it.  He doesn't have to go about mucking with
the program's internals, he just needs to extend one specific class
with a well-defined interface.  How would you propose to accomplish
this following your monolithic match-function approach, on the other
hand?
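The post doesn't spell out how the configuration's "type => Quarter" is mapped to QuarterAgeSpec. Note that a class's __subclasses__() method returns only its *direct* subclasses. QuarterAgeSpec derives from AgeSpec, so a lookup that starts at Matcher has to walk the hierarchy. The sketch below uses hypothetical helper names and a hypothetical suffix-stripping naming convention. It includes stand-in classes so that it runs on its own.

```python
def all_subclasses(cls):
    """Yield every subclass of cls, however deep in the hierarchy.

    cls.__subclasses__() lists only direct subclasses, so recurse.
    """
    for sub in cls.__subclasses__():
        yield sub
        yield from all_subclasses(sub)


def find_matcher(base, type_name, suffixes=("AgeSpec", "Matcher")):
    """Map a config type name such as "Quarter" to a class such as
    QuarterAgeSpec. The suffix-stripping convention is hypothetical."""
    for sub in all_subclasses(base):
        name = sub.__name__
        if name == type_name or any(
                name.endswith(s) and name[:-len(s)] == type_name
                for s in suffixes):
            return sub
    raise LookupError("no matcher named %r" % type_name)


# Stand-in classes so the sketch runs on its own.
class Matcher:
    pass


class AgeSpec(Matcher):
    pass


class QuarterAgeSpec(AgeSpec):
    def __init__(self, relative_quarter=0):
        self.relative_quarter = relative_quarter


# The parsed "match" section of one configuration entry.
params = {"type": "Quarter", "relative_quarter": -1}
cls = find_matcher(Matcher, params.pop("type"))
matcher = cls(**params)
print(cls.__name__, matcher.relative_quarter)  # QuarterAgeSpec -1
```

The remaining configuration keys are passed straight to the constructor as keyword arguments. Each class's __init__ signature therefore doubles as the schema for its section of the configuration file.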

> Except we don't have different kinds of re expressions for different
> kinds of matching. One spec to handle everything is good enough, and
> it's much simpler. If you have the time, try to see what people did
> before regex took over the world. In fact, try writing a text parser
> that doesn't use one general regex function. You'll quickly discover
> why one method with one very general interface is the best way to
> handle things.

Sure, if you're dealing with a known, fixed set of types of inputs.
But semantically -- in terms of how they're entered into the
configuration file, in terms of the logic needed to match them -- a
CalendarMonth and a DaysOld are *not* the same thing.  Slightly
different code is required to initialize and define each one; and in
my experience, when many differing behaviors and inputs are floating
around together like this, it's often more productive to group them
behind a class hierarchy.

> See, you are thinking in general terms, but you are writing a specific
> implementation. In other words, you're talking about the problem the
> right way, but you're trying to write the code in a different way.
> Coming from a C, perl, or Java background, this is to be expected.
> Those languages are a strait-jacket that impose themselves on your
> very thoughts. But in Python, the code should read like pseudo-code.
> Python *is* pseudo-code that compiles, after all.
>
> You don't need many classes because the branching logic--the bits of
> the program that say, "I'm filtering by days and not months"--can be
> contained in one bigger function that calls a very general sub-
> function. There's no need to abstract that bit out into a class. No
> one is going to use it but the match routine. Just write the code that
> does it and be done with it.
>
> In fact, by writing classes for all these branches in the program
> logic, you are doing yourself a disservice. When you return to this
> code 3 weeks from now, you'll find all the class declarations and
> metaclass and syntactic sugar is getting in your way of seeing what is
> really happening. That is always bad and should be avoided, just like
> flowery language and useless decorum should be avoided.

As for the metaclass -- yeah, quite possibly; that was more of a
"just for fun" experiment than anything else.  Hence the Don Quixote
reference; attacking imaginary enemies and all that ;).  But as for
the rest of this statement, I thoroughly disagree.  This
object-oriented "syntactic sugar" is, in this case, a means of
organizing my application's behavior into meaningful, simply
understood, and easily adapted units.  A monolithic "match" function
with an ever-increasing number of arguments as I cram in more and
more classes of matching logic?  *That* is what's bound to become
incomprehensible.

> No, you're arguing about the thing I wanted to argue about, so you see
> the point I am trying to make.
>
> It's painful to realize that all those years learning OO design and
> design patterns just to make Java usable are wasted in the world of
> Python. I understand that because I've invested years mastering C++
> and perl before discovering Python. Your solace comes when you embrace
> that and see how much simpler life really is when the language gets
> out of your way.

It's just as harmful to ignore the occasional usefulness of
object-oriented patterns as it is to abuse them left and right.  My
approach on this project feels right, it produces legible and easily
extensible code, and it's been a breeze to test and maintain so far.
If implementing different types of matching logic as classes here is
"wrong" by Python convention (and that would come as a rather big
surprise to me), then I don't want to be right.

-- 
Mark Shroyer
http://markshroyer.com/


