[Spambayes] mining dates?

Tim Peters tim.one@comcast.net
Tue, 01 Oct 2002 01:36:34 -0400


[Skip Montanaro, on day-of-week]
> Yeah, I thought about dow.  I'll give it a look-see.  Of course, that
> requires me to actually call time.strptime() and come up with a couple
> plausible format strings.

Stupid almost certainly beats smart here.  Match against

    r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s'

If that succeeds, generate a dow token with the day of the week, else
generate a dow token with a "no day" value.  All cases are then reduced to
8 tokens (the 7 day names plus "no day"), and all goofy patterns you see in
spam are reduced to one.  You could
refine that a little (e.g., to distinguish plain-missing from there-but-
not-followed-by-space), but I expect more than that would be
counterproductive.  Testing is the final judge, of course, but trust me on
this one <wink>:  start stupid, and work your way up until results stop
improving.
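
A minimal sketch of the stupid approach, for concreteness.  The function
name dow_token and the 'dow:' token spelling are made up for illustration,
not anything in the spambayes tokenizer, and it assumes the pattern is
matched at the very start of the Date header value (re.match), which is
one reading of "match against":

```python
import re

# The pattern from the message: a three-letter day name, a comma, then
# one whitespace character.
DOW_RE = re.compile(r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s')

def dow_token(date_header):
    """Return one of 8 tokens: 'dow:Mon' .. 'dow:Sun', or 'dow:none'.

    Hypothetical helper; the 'dow:' prefix is an assumed token format.
    """
    # re.match anchors at the start of the string (an assumption here);
    # a missing header (None) is treated like an empty one.
    m = DOW_RE.match(date_header or '')
    if m:
        return 'dow:' + m.group(1)
    # Missing, misspelled, or oddly punctuated day names all collapse
    # into the single "no day" token.
    return 'dow:none'
```

For example, dow_token('Tue, 01 Oct 2002 01:36:34 -0400') gives 'dow:Tue',
while '01 Oct 2002', 'Tuesday, 1 Oct 2002' and 'Tue,01 Oct 2002' all give
'dow:none'.  No time.strptime() and no format strings needed.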