[Spambayes] Missing HTML payload
Mark Hammond
mhammond at skippinet.com.au
Tue Mar 4 12:12:35 EST 2003
I wrote:
> I instrumented the "show clues" feature to show *all* message tokens found
> in the body. As you can see at the very end, the entire body was
> stripped.
I finally worked out where my missing "url:" tokens went. However, with
that corrected, the same problem remains: apart from URL tokens, no tokens
are extracted from the HTML body.
> I am guessing that we barf on:
> <td><!--#rotato>
> a comment which is never closed. Outlook actually shows this entire tag
Digging deeper, this seems to be true.
>>> from spambayes import tokenizer
>>> tokenizer.crack_html_comment("hi <!-- wow --> there")
('hi there', [])
>>> tokenizer.crack_html_comment("hi <!-- wow> there")
('hi ', [])
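For illustration only, here is a minimal sketch of this failure mode. It is
not the spambayes code: strip_comments_naive and strip_comments_tolerant are
hypothetical names, and the fallback rule in the second function (treat the
next ">" as the end of an unclosed comment) is just one possible assumption
about how a lenient mail client might read such markup.

```python
def strip_comments_naive(text):
    """Remove <!-- ... --> spans; an unclosed comment swallows the rest."""
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start < 0:
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        end = text.find("-->", start + 4)
        if end < 0:
            # No closing marker: everything after the opener is dropped.
            break
        pos = end + 3
    return "".join(out)


def strip_comments_tolerant(text):
    """Like the naive version, but an unclosed comment ends at the next '>'.

    Hypothetical recovery rule, not taken from spambayes.
    """
    out = []
    pos = 0
    while True:
        start = text.find("<!--", pos)
        if start < 0:
            out.append(text[pos:])
            break
        out.append(text[pos:start])
        end = text.find("-->", start + 4)
        if end >= 0:
            pos = end + 3
            continue
        close = text.find(">", start + 4)
        if close < 0:
            # No '>' at all: keep the remainder rather than lose the body.
            out.append(text[start:])
            break
        pos = close + 1
    return "".join(out)


print(repr(strip_comments_naive("hi <!-- wow> there")))
print(repr(strip_comments_tolerant("hi <!-- wow> there")))
```

With the unclosed comment, the naive version keeps only the text before
"<!--", while the tolerant version also keeps the text after the ">".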
Mark.