[Spambayes] It gets funnier all the time....

Neil Schemenauer nas at python.ca
Wed Feb 12 09:21:29 EST 2003


Tim Stone - Four Stones Expressions wrote:
> I doubt that the tokenizer would generate any meaningful tokens from this 
> message.  Generating a token would be the right way to do it, any ideas how?  

Something like:

    import string
    import re
    BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
    # '+' so the whole word must be base64 characters, not just a
    # single-character word.
    valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match

    def tokenize_word(...):
        ...
        elif 60 <= n <= 76 and valid_base64(word):
            yield 'bare base64'

I don't know if 60 is reasonable as a lower bound.  Does someone want to
test Outlook?  Maybe it only magically detects base-64 if the line is
exactly 76 characters long.
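
To make the bounds easy to experiment with, here is a self-contained sketch of the same check. The function name bare_base64_tokens and the min_len/max_len parameters are made up for illustration and are not part of the Spambayes tokenizer; the 60 default is still just a guess.

```python
import re
import string

BASE64_CHARSET = string.ascii_letters + string.digits + "+/"
# Every character of the word must come from the base64 alphabet.
valid_base64 = re.compile('[%s]+$' % BASE64_CHARSET).match

def bare_base64_tokens(word, min_len=60, max_len=76):
    """Yield 'bare base64' if word looks like a full line of base64 text.

    Hypothetical helper: min_len and max_len are assumptions to be tuned.
    """
    n = len(word)
    if min_len <= n <= max_len and valid_base64(word):
        yield 'bare base64'
```

If Outlook turns out to decode only lines of exactly 76 characters, calling it with min_len=76 would restrict the token to that case.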

  Neil


