[Spambayes] date for new release to handle image spam?
Mark Hammond
mhammond at skippinet.com.au
Mon Feb 5 01:33:51 CET 2007
> If you run ocrad over some spam text images you can see what it generates.
> If it finds nothing, nothing comes out the back end. If it sees something,
> it's almost certain to be some garbage text peculiar to it, unlikely to turn
> up in normal text. For example, here's a pretty clean image:
>
> http://www.webfast.com/~skip/bogus-5-3.png
>
> Here's what ocrad produces by default:
>
> COULD THl_ BE THE NEXT IBM_
> ALL _|___ _wow IWAl LllL |_ ABO_| lo EXPLODEl
> WAIIW LllL p_ Ll_E A WAW_ _IARll__ WO_DA_ _EPIEWBER lll
>
> IomO_n_ __m_ L |_IL IOWP_IER_ |_I (o_h__ OII LllL p_)
> __o__ __mbol LllL
> F_ld__ Ilo__ O Tl (_o s_/_ On F_ld__ Alon_|)
> _ d__ |__o__ __
> I____n_ R__lnO ___onO B__
> \
> ln _h_ Io____ ot _ W___. LllL W____ ______| ___nnlnO Wo___'
>
> L ln___n__lon_| Anno_n___
>
> On_lo__h(IW) _P_o_P__ TP_hnoloO_ b_
> B_llP_ p_oo_ Da_a _P___|__ Ba_k_O_ and _P__o_P_
> |__ ____ __n____lon p__Aqco_TM_/P__AID CO_TM_
> _|__a Po__ablP wloh _OPPd _olld __a_P D_|_P TP_hnoloO_
> _h_ W___oOoll_. _hP Wo_ld _ _|___ _g laO_oO ComOrfP_
> _Pa___lnO W_ldla _ Q_a_ll TP_hnoloO_
> \
> L ln___n__lon_| _IOn_ _4 _W E__oO__n Dl___lb__lon AO___m_n_
>
> Th_ b_Pmo__ __PO b_wa_d _a__|_al _Pn___P |_ amonO o_hP_ p__|__|_P
> dl___lb__lon aO_PPmPn__ ____Pn_|_ _ndP_ nPOo_la_lon ª_
> _P_P_al addl_lonal
> hlOh O_ofi_ _POlon_ and _PO_P_Pn__ a kP_ ___a_POl_
> Oa__nP__hlO _ha_ _P___P_
> l ln_P_na_lonal ComO__P__ wl_h ___|_ Olobal ma_kP_ _Pa_h
> and O_a_an_PPd
> O_P _alP_ and lo_k_ _hP _omOan_ ln hlOhl_ dP_|_ablP
> p__|__|_P dl___lb__lon
> ma_kP__
>
> READ MORE ONLINE NOWl
>
> OPPORl__||_ DOE_ _ol __OI_ o_ IWE DOOR E_ER_ DA_|
> _o _A_E A Wl__IE IOODD LllL lo _O_R RADAR _ow A_D
> WAIIW II _OARl
FWIW, I am getting *much* better results with gocr than ocrad. gocr running
over that same image results in:
--- 8< ---
_ _ _ _
COULD THIS BE THE NEXT IBM?
ALL SIGNS SHOW THAT LITL IS ABOUT TO EXPLODE!
Company Name:
Stock Symbol:
Friday Close: O.71 (Up 6O_a On Friday Alone!)
S-dayTarget: $3
Current Rating: Strong Buy
\
In the Course of a Week, LITL Makes Several Stunning Moves!
L International Announces:
- OneTouch(TM) Recovery Technology hr
Bullet-Proof Data Security Backups and Restores ,
- Its Next-Generation PuRA_GO(TM)/PuRAID-GO(TM)
UItra-Portable High-Speed Solid State Drive Technology
. - the metropolis, the worldt First l9'' Laptop compWer
Featuring Nvidiat Quad-SLI Technology _
\
L International Signs $4SM European Distribution Agreement
- T_s hremost step hrward tactical venture is, among other exclusive
distribution agreements, currently under negotiation gr several additional
high-pro_t regions and represents a key strategic partnership that secures
L International Computers with truly global market reach and guaranteed
pre-sales, and locks the company in highly desirable exclusive distribution
marke.ts.
--- >8 ----
Indeed, I have never seen an image that ocrad does better on than gocr.
FWIW, I'm currently halfway through modifying spambayes to support either
ocrad or gocr, in the hope that using gocr will cause a noticeable
reduction in image spam. Unfortunately, using gocr I see no reduction at
all so far (which isn't to say there is no small reduction; it just doesn't
"seem" to me to have reduced).
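For anyone who wants to experiment, here is a rough sketch of what "support either ocrad or gocr" could look like. It is not the actual spambayes code: the helper names, the "ocr:" token prefix, the word-length limits, and the command lines are all assumptions for illustration. It assumes both programs are on the PATH and that the image has already been converted to a PNM file.

```python
import subprocess

# Assumed command lines: each engine is handed the image path and writes the
# recognised text to stdout. Check these against your installed versions.
OCR_COMMANDS = {
    "ocrad": ["ocrad"],
    "gocr": ["gocr", "-i"],
}


def build_command(engine, image_path):
    """Return the argv list for the chosen OCR engine (hypothetical helper)."""
    if engine not in OCR_COMMANDS:
        raise ValueError("unknown OCR engine: %r" % (engine,))
    return OCR_COMMANDS[engine] + [image_path]


def run_ocr(engine, image_path, runner=subprocess.run):
    """Run the engine and return its stdout as text, or '' if it fails."""
    try:
        result = runner(
            build_command(engine, image_path),
            capture_output=True,
            text=True,
            errors="replace",  # OCR output may contain undecodable bytes
            check=False,
        )
    except OSError:
        return ""  # engine not installed or not executable
    return result.stdout if result.returncode == 0 else ""


def ocr_tokens(text, min_len=3, max_len=12):
    """Yield lowercase 'ocr:'-prefixed tokens for words of moderate length."""
    for word in text.split():
        word = word.lower()
        if min_len <= len(word) <= max_len:
            yield "ocr:" + word
```

The runner argument exists only so the subprocess call can be swapped out when testing. Switching engines is then just a matter of passing "ocrad" or "gocr" to run_ocr and feeding the resulting tokens to the classifier alongside the usual ones.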
Mark