[Spambayes] mining dates?
Tim Peters
tim.one@comcast.net
Tue, 01 Oct 2002 01:36:34 -0400
[Skip Montanaro, on day-of-week]
> Yeah, I thought about dow. I'll give it a look-see. Of course, that
> requires me to actually call time.strptime() and come up with a couple
> plausible format strings.
Stupid almost certainly beats smart here. Match against
r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s'
If that succeeds, generate a dow token with the day of the week, else
generate a dow token with a "no day" value. Every message then maps to one
of 8 tokens (7 days plus "no day"), and all the goofy patterns you see in
spam collapse into one. You could
refine that a little (e.g., to distinguish plain-missing from there-but-
not-followed-by-space), but I expect more than that would be
counterproductive. Testing is the final judge, of course, but trust me on
this one <wink>: start stupid, and work your way up until results stop
improving.
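A minimal Python sketch of that rule follows. It assumes the Date header value is already available as a string, with any leading whitespace allowed. The function name `dow_token`, the `"dow:"` token prefix, and the optional `refine` flag are hypothetical illustrations, not Spambayes code. The flag shows the one refinement mentioned above: a day name that is present but not followed by a comma and whitespace gets its own token.

```python
import re

# The "stupid" pattern: a three-letter day abbreviation, a comma, then whitespace.
DOW_RE = re.compile(r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s')
# Hypothetical helper for the optional refinement: a day name at the start,
# with no requirement on what follows it.
DAY_ONLY_RE = re.compile(r'(Mon|Tue|Wed|Thu|Fri|Sat|Sun)')

def dow_token(date_header, refine=False):
    """Return 'dow:Mon' ... 'dow:Sun', or 'dow:none' if no day matched.

    With refine=True, a day name that is not followed by ', ' plus
    whitespace yields 'dow:malformed' instead of 'dow:none'.
    """
    value = date_header.lstrip()
    m = DOW_RE.match(value)
    if m:
        return 'dow:' + m.group(1)
    if refine and DAY_ONLY_RE.match(value):
        # Day is there but not followed by comma + space (e.g. "Tue,01 Oct").
        return 'dow:malformed'
    # Missing day, or any other goofy pattern: everything collapses here.
    return 'dow:none'
```

Without `refine`, the output is limited to the 8 tokens described above. The refinement adds a ninth. Whether that ninth token earns its keep is exactly the kind of question testing should decide.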