Python based unacceptable language filter

David Pratt fairwinds at eastlink.ca
Mon Oct 3 08:46:08 EDT 2005


Hi. Thank you for the links. I am looking for something that would  
function in a similar way to Yahoo's filter for its message boards.  
Perhaps I should have used the term profanity instead of unacceptable  
language. I am not concerned about correcting sentence structure or  
poor grammar.

I realize screening profanity can be accomplished by simply looping
over regular expressions from a database or dictionary and checking the
text against each one. I thought there might already be something like
this in existence, perhaps in a module, since it is likely (however
absurd) that someone has already developed a dictionary of profane word
expressions. What is perhaps crazier is that one has to consider
including something like this in an application, but you have to
conclude the Internet is what it is.
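
To make the looping idea concrete, here is a minimal sketch of what I
have in mind. The word list and the names BLOCKED_WORDS, PATTERNS and
find_blocked are placeholders made up for illustration, not part of any
existing module; a real list would come from a database or file.

```python
import re

# Hypothetical placeholder list; a real application would load its own
# entries from a database or dictionary file.
BLOCKED_WORDS = ["darn", "heck"]

# One compiled pattern per word. The \b anchors restrict matches to whole
# words, so "heck" is not reported inside "checker".
PATTERNS = [
    re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    for word in BLOCKED_WORDS
]


def find_blocked(text):
    """Return the blocked words that occur in text, in word-list order."""
    return [
        word
        for word, pattern in zip(BLOCKED_WORDS, PATTERNS)
        if pattern.search(text)
    ]
```

Compiling the patterns once up front keeps the per-message work down to
one search per entry, which should be fine for a modest list.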

Regards
David

From Yahoo:
"The Profanity Filter allows you to control how you want to view  
messages with profanity in two ways. You can choose to view the  
messages with the profanity masked with italicized symbols (@$&% ), or  
you can have the messages containing profanity hidden entirely.

You can also choose between a weak setting for exact word matches or a  
strong setting that will filter spelling variations."
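
A rough sketch of how the two viewing modes and the weak/strong settings
described above might look in Python. This is only my guess at the
behaviour, not Yahoo's implementation: the word list, the LOOKALIKES
table, and the function names are all assumptions made up for
illustration.

```python
import re

BLOCKED_WORDS = ["darn", "heck"]  # hypothetical placeholder list
MASK = "@$&%"

# Assumed look-alike characters accepted by the "strong" setting.
LOOKALIKES = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s$5"}


def _variant(word):
    # Each letter may be swapped for a look-alike and repeated,
    # so "d4rn" and "daaarn" both match the entry "darn".
    return "".join(
        "[" + re.escape(LOOKALIKES.get(ch, ch)) + "]+" for ch in word
    )


def build_regex(strong=False):
    """Weak: exact whole-word matches. Strong: also spelling variations."""
    alts = [_variant(w) if strong else re.escape(w) for w in BLOCKED_WORDS]
    # Lookarounds instead of \b, because variants may start with a symbol.
    return re.compile(
        r"(?<!\w)(?:" + "|".join(alts) + r")(?!\w)", re.IGNORECASE
    )


def filter_message(text, mode="mask", strong=False):
    """Return masked text, or None when mode is "hide" and a match exists."""
    regex = build_regex(strong)
    if mode == "hide":
        return None if regex.search(text) else text
    # Single substitution pass, so the mask itself is never re-scanned.
    return regex.sub(lambda match: MASK, text)
```

For example, filter_message("d4rn it", strong=True) masks the first word,
while the weak setting leaves it alone.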

Well I know this thread is a

On Sunday, October 2, 2005, at 10:45 PM, Nigel Rowe wrote:

> David Pratt wrote:
>
>> Hi.  Is anyone aware of any python based unacceptable language filter
>> code to scan and detect bad language in text from uploads etc.
>>
>> Many thanks.
>> David
>
> You might be able to adapt languagetool.
> http://www.danielnaber.de/languagetool/features.html
>
> Later versions have been ported to Java, but the old python version of
> languagetool is at http://tkltrans.sourceforge.net/#r03
>
> His thesis paper is at
> http://www.danielnaber.de/languagetool/download/style_and_grammar_checker.pdf
>
> Mind you, given the poor language skills of many native english speakers
> (not to mention those for whom english is a second language) relying on
> automated filters to enforce 'good' language seems a trifle extreme.
> This post for example would probably not pass.


