Bayesian kids content filtering in Python?

John J. Lee jjl at pobox.com
Fri Aug 29 20:26:59 EDT 2003


jjl at pobox.com (John J. Lee) writes:

> "Paul Paterson" <paulpaterson at users.sourceforge.net> writes:
[...]
> censor, and you're not going to block them all.  It may work well most
> of the time, but is that enough?  What's needed here, perhaps, is an
> open effort to train on categories of things that people would like to
> block.  That might be enough, since I suppose *most* things you're
[...]

Thinking a bit more, that might well fail.  It assumes that our
high-level categories of things we want to block line up in a simple
way with the workings of the algorithm, which is very doubtful when
the set of things to filter is no longer highly restricted as in the
email case.  Though some censorship targets probably *are* spammishly
predictable and unimaginative, no doubt lots aren't, too.

I'm reminded of the experience of a military research project using
neural networks to recognise tanks in aerial photographs.  They got
someone to go and take photos of tanks and other large tank-like
objects partially hidden in forested terrain, and trained their
network on a fraction of the photos.  When they tested the network on
the rest of the photographs, they were delighted to discover that it
performed fantastically well, despite the great variability in the
appearance of the objects and terrain, distinguishing tank from
non-tank almost perfectly.  Inexplicably, though, when they fed a very
similar set of photos to the same network, it failed miserably.  It turned
out that what the network had *really* trained on was not at all what
they'd assumed.  To take the photos, they'd gone out and scattered the
real tanks over the landscape, and taken a set of tank photos.  Then
they'd moved the tanks out and put the mock-tanks in, and taken a set
of mock-tank photos.  Of course, that meant all the tank-photos were
taken in bright light, and all the mock-tank photos were in dim light
of late afternoon.  The neural network sensibly picked up that the
easy way to tell the two apart was just to look at how bright they
were!

I made the details of that story up, but who cares ;-)
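
For anyone who wants to see the effect rather than take the story on
trust, here is a minimal sketch.  Everything in it is made up for
illustration: the "photos" are just two numbers (brightness and a weak
"shape" cue), the function names are hypothetical, and a one-feature
threshold rule stands in for the neural network.  The only point is
that a learner which scores well on held-out data from the same shoot
can still have latched onto the lighting.

```python
import random


def make_photos(n, rng, confounded):
    """Return ((brightness, shape), label) pairs; label 1 = tank, 0 = mock tank."""
    photos = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # A weak but genuine cue: real tanks score a little higher on "shape".
        shape = rng.gauss(0.6 if label else 0.4, 0.2)
        if confounded:
            # Tanks photographed in bright light, mock tanks in dim light.
            brightness = rng.gauss(0.8 if label else 0.3, 0.05)
        else:
            # A later shoot: lighting is unrelated to the label.
            brightness = rng.gauss(0.55, 0.25)
        photos.append(((brightness, shape), label))
    return photos


def accuracy(model, data):
    """Fraction of photos where 'feature > threshold' matches the label."""
    feature, threshold = model
    return sum((x[feature] > threshold) == bool(y) for x, y in data) / len(data)


def train_stump(data):
    """Pick the single feature and threshold with the best training accuracy."""
    best_acc, best_model = 0.0, (0, 0.0)
    for feature in range(2):
        values = sorted({x[feature] for x, _ in data})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            model = (feature, (lo + hi) / 2)
            acc = accuracy(model, data)
            if acc > best_acc:
                best_acc, best_model = acc, model
    return best_model


rng = random.Random(0)  # fixed seed so the run is repeatable
shoot = make_photos(400, rng, confounded=True)
train, held_out = shoot[:200], shoot[200:]
new_shoot = make_photos(200, rng, confounded=False)

model = train_stump(train)
print("feature chosen:", ["brightness", "shape"][model[0]])
print("held-out accuracy:", accuracy(model, held_out))
print("new-shoot accuracy:", accuracy(model, new_shoot))
```

The held-out photos come from the same confounded shoot, so they
can't reveal the problem; only the second shoot, where lighting and
label are unrelated, shows the rule is close to guessing.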


John




More information about the Python-list mailing list