[spambayes-bugs] [ spambayes-Bugs-1166146 ] Tokenizer fails on bad URL

SourceForge.net noreply at sourceforge.net
Tue Mar 29 07:29:45 CEST 2005


Bugs item #1166146, was opened at 2005-03-19 07:27
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1166146&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Leonid (leobru)
Assigned to: Tony Meyer (anadelonbrin)
Summary: Tokenizer fails on bad URL

Initial Comment:
The following line in the body of a message being
scored or trained

http://)

causes spambayes to die.  The potentially relevant
spambayesrc settings are

[Tokenizer]
x-fancy_url_recognition=True
x-pick_apart_urls=True

[URLRetriever]
x-slurp_urls=True

The relevant backtrace is

  File "/usr/home/leob/spambayes-2/spambayes/classifier.py", line 374, in _add_msg
    for word in Set(wordstream):
  File "/usr/local/lib/python2.3/sets.py", line 429, in __init__
    self._update(iterable)
  File "/usr/local/lib/python2.3/sets.py", line 383, in _update
    for element in iterable:
  File "/usr/home/leob/spambayes-2/spambayes/classifier.py", line 762, in _add_slurped
    slurped_tokens = self._generate_slurp()
  File "/usr/home/leob/spambayes-2/spambayes/classifier.py", line 556, in _generate_slurp
    tokens = self.slurp(*slurp_wordstream)
  File "/usr/home/leob/spambayes-2/spambayes/classifier.py", line 663, in slurp
    domain = mo.group(1)
AttributeError: 'NoneType' object has no attribute 'group'


----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-03-29 17:29

Message:
Logged In: YES 
user_id=552329

Ah, it turns out that the -o command line option doesn't
work with the slurp options (the option change is applied
too late), so I wasn't actually turning slurping on before,
which is why I couldn't duplicate this.
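
The "applied too late" effect can be illustrated with a generic
sketch. This is not SpamBayes code: Options, Classifier and
build() are hypothetical stand-ins, and the assumption is only
that the setting is read once, when the object is constructed.

```python
class Options:
    """Hypothetical options store; slurping is off by default."""
    def __init__(self):
        self.values = {"slurp_urls": False}


class Classifier:
    """Hypothetical consumer that copies the setting at construction."""
    def __init__(self, options):
        self.slurp_urls = options.values["slurp_urls"]


def build(override_first):
    """Apply the override before or after the classifier is built."""
    options = Options()
    if override_first:
        options.values["slurp_urls"] = True
        return Classifier(options)
    classifier = Classifier(options)
    # Too late: the classifier has already copied the old value.
    options.values["slurp_urls"] = True
    return classifier


late = build(override_first=False).slurp_urls
early = build(override_first=True).slurp_urls
```

In this sketch only the classifier built after the override sees
slurping enabled, which would explain a run that silently skips
the slurp code path.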

I've identified the cause (I haven't checked, but http://:
and various others should also trigger it) and checked in
a fix.

Thanks!
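
For readers unfamiliar with this class of failure: in Python,
re.match() returns None when the pattern does not match, so an
unguarded mo.group(1) raises the AttributeError shown in the
traceback. Below is a minimal sketch of a guard. The pattern and
the extract_domain() helper are assumptions made for
illustration, not the code that was checked in, and the sketch
assumes the text left after "http://" is either empty or starts
with a character the pattern rejects.

```python
import re

# Hypothetical pattern (an assumption, not copied from SpamBayes):
# a host made of anything but ':', '/' or '\', then an optional :port.
DOMAIN_AND_PORT_RE = re.compile(r"([^:/\\]+)(:(\d+))?")


def extract_domain(url_body):
    """Return the host part of the text after 'http://', or None."""
    mo = DOMAIN_AND_PORT_RE.match(url_body)
    if mo is None:
        # Unguarded code would call mo.group(1) here and raise
        # AttributeError: 'NoneType' object has no attribute 'group'.
        return None
    return mo.group(1)


empty_result = extract_domain("")
colon_result = extract_domain(":")
normal_result = extract_domain("example.com:8080/index.html")
```

A caller can then skip the URL when the helper returns None
instead of letting the exception abort scoring or training.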

----------------------------------------------------------------------

Comment By: Leonid (leobru)
Date: 2005-03-29 16:46

Message:
Logged In: YES 
user_id=790676

NB: 

http://(   

with the OPENING parenthesis works ok, it's the CLOSING one
that causes the error.



----------------------------------------------------------------------

Comment By: Leonid (leobru)
Date: 2005-03-29 16:45

Message:
Logged In: YES 
user_id=790676

I downloaded version 1.0.4 and did the following:

% sb_filter.py -d dummy -n
% sb_filter.py -d dummy


http://)

^D

And got:


Traceback (most recent call last):
  File "./sb_filter.py", line 257, in ?
    main()
  File "./sb_filter.py", line 248, in main
    action(msg)
  File "./sb_filter.py", line 180, in filter
    return self.h.filter(msg)
  File "/usr/home/leob/spambayes-1.0.4/scripts/spambayes/hammie.py", line 109, in filter
    prob, clues = self._scoremsg(msg, True)
  File "/usr/home/leob/spambayes-1.0.4/scripts/spambayes/hammie.py", line 38, in _scoremsg
    return self.bayes.spamprob(tokenize(msg), evidence)
  File "/usr/home/leob/spambayes-1.0.4/scripts/spambayes/classifier.py", line 246, in slurping_spamprob
    slurp_tokens = list(self._generate_slurp())
  File "/usr/home/leob/spambayes-1.0.4/scripts/spambayes/classifier.py", line 559, in _generate_slurp
    tokens = self.slurp(*slurp_wordstream)
  File "/usr/home/leob/spambayes-1.0.4/scripts/spambayes/classifier.py", line 689, in slurp
    domain = mo.group(1)
AttributeError: 'NoneType' object has no attribute 'group'


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2005-03-23 17:57

Message:
Logged In: YES 
user_id=552329

Judging from the trace, the problem is with the x-slurp_urls
option, but I can't duplicate it by adding "http://)" to a
message.  Do you have a message that triggers this problem
that you can attach here for me to test against?

----------------------------------------------------------------------



More information about the Spambayes-bugs mailing list