[Spambayes] It gets funnier all the time....
Neil Schemenauer
nas at python.ca
Wed Feb 12 09:21:29 EST 2003
Tim Stone - Four Stones Expressions wrote:
> I doubt that the tokenizer would generate any meaningful tokens from this
> message. Generating a token would be the right way to do it, any ideas how?
Something like:
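    import string
    import re

    BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
    # The '+' quantifier makes the pattern cover the whole word; without it
    # only a single-character word could ever match.
    valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match

    def tokenize_word(...):
        ...
        elif 60 <= n <= 76 and valid_base64(word):
            yield 'bare base64'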
I don't know if 60 is reasonable as a lower bound. Does someone want to
test Outlook? Maybe it only magically detects base-64 if the line is
exactly 76 characters long.
Neil
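
For anyone who wants to try the idea outside the tokenizer, here is a minimal standalone sketch of the check described above. The function name bare_base64_tokens and the min_len/max_len parameters are made up for illustration and are not part of the Spambayes tokenizer; the 60 and 76 defaults are simply the untested bounds from the message, so treat them as assumptions.

```python
import re
import string

BASE64_CHARSET = string.ascii_letters + string.digits + "+/"

# Match a word made up entirely of base64 alphabet characters.
# '+' and '/' are literal inside a character class, so no escaping is needed.
valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match


def bare_base64_tokens(word, min_len=60, max_len=76):
    """Yield 'bare base64' if word looks like one line of unwrapped base64.

    Hypothetical helper: min_len and max_len are guesses, not measured
    against any mail client's behaviour.
    """
    n = len(word)
    if min_len <= n <= max_len and valid_base64(word):
        yield 'bare base64'


if __name__ == '__main__':
    # A full 76-character line of base64 alphabet characters yields the token.
    print(list(bare_base64_tokens('QUJD' * 19)))
    # A short ordinary word yields nothing.
    print(list(bare_base64_tokens('hello')))
```

Setting min_len=76 would cover the "exactly 76 characters" case if that turns out to be what Outlook keys on.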