Python based unacceptable language filter

Andrew Gwozdziewycz apgwoz at gmail.com
Mon Oct 3 08:23:12 EDT 2005


On Oct 2, 2005, at 9:45 PM, Nigel Rowe wrote:

> David Pratt wrote:
>
>
>> Hi.  Is anyone aware of any python based unacceptable language filter
>> code to scan and detect bad language in text from uploads etc.
>>
>> Many thanks.
>> David
>>
>
> You might be able to adapt languagetool.
> http://www.danielnaber.de/languagetool/features.html
>
> Later versions have been ported to Java, but the old python version of
> languagetool is at http://tkltrans.sourceforge.net/#r03
>
> His thesis paper is at
> http://www.danielnaber.de/languagetool/download/style_and_grammar_checker.pdf
>
> Mind you, given the poor language skills of many native english speakers
> (not to mention those for whom english is a second language) relying on
> automated filters to enforce 'good' language seems a trifle extreme.  This
> post for example would probably not pass.
>
> Cheers,
>         Nigel
>
> PS. For the humour impaired, this g*d d*mm post was a f*cking joke, OK! :-)
>
> Mind you, the links are real.
>
> -- 
>         Nigel Rowe
>         A pox upon the spammers that make me write my address like..
>                 rho (snail) swiftdsl (stop) com (stop) au
>



I think he may be referring to "bad" words and 'filthy' language. At
least that's what I got from the question.
There are many PHP implementations on the web that could be adapted
to Python fairly easily. Most of them are probably not ideal, and they
involve a lot of code like

       for n in badwords:
             # str.replace returns a new string, so keep the result
             texttofilter = texttofilter.replace(n, '<bad word deleted>')

If that's all you need though, maybe it's not so bad.
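
One weakness of plain substring replacement is that it also hits innocent
words that happen to contain a listed word. A rough sketch of a variant that
only matches whole words, ignoring case, might look like the following. The
word list and the names BAD_WORDS, filter_text and contains_bad_language are
made up for illustration; this is one possible approach, not a complete
profanity filter.

```python
import re

# Hypothetical word list, for illustration only.
BAD_WORDS = ["darn", "heck"]

# One case-insensitive pattern for all words. The \b anchors keep "heck"
# from matching inside a longer word such as "checker".
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BAD_WORDS) + r")\b",
    re.IGNORECASE,
)


def filter_text(text, replacement="<bad word deleted>"):
    """Return a copy of text with every listed whole word replaced."""
    # A function is passed to sub() so the replacement is used literally,
    # even if it contains backslashes.
    return _PATTERN.sub(lambda match: replacement, text)


def contains_bad_language(text):
    """Return True if any listed word occurs in text as a whole word."""
    return _PATTERN.search(text) is not None
```

With this, filter_text("What the HECK") replaces the word regardless of
case, while "checker" is left alone. Deliberate misspellings and obfuscated
spellings would still get through.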


---
Andrew Gwozdziewycz
apgwoz at gmail.com
http://ihadagreatview.org
http://plasticandroid.org




