Python based unacceptable language filter
David Pratt
fairwinds at eastlink.ca
Mon Oct 3 08:46:08 EDT 2005
Hi. Thank you for the links. I am looking for something that works
much like Yahoo's filter for its message boards. Perhaps I should have
said profanity rather than unacceptable language. I am not concerned
with correcting sentence structure or poor grammar.
I realize profanity can be screened simply by looping over regular
expressions from a database or dictionary and checking the text against
each one. I thought something like this might already exist, perhaps
as a module, since it is likely (however absurd) that someone has
already compiled a dictionary of profane words and expressions. What is
perhaps crazier is that one has to consider including something like
this in an application at all, but you have to conclude the Internet
is what it is.
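
The looping approach described above could be sketched roughly as
follows. This is only a minimal illustration, not an existing module:
the word list, the function names and the whole-word matching rule are
all assumptions, and the placeholder words stand in for whatever a real
dictionary or database would supply.

```python
import re

# Hypothetical placeholder list; a real application would load its
# entries from a database or a dictionary file.
BLOCKED_WORDS = ["darn", "heck", "blast"]


def compile_patterns(words):
    """Compile one case-insensitive, whole-word pattern per entry."""
    return [
        re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in words
    ]


def find_profanity(text, patterns):
    """Loop over the patterns and collect every match found in text."""
    hits = []
    for pattern in patterns:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


PATTERNS = compile_patterns(BLOCKED_WORDS)

if __name__ == "__main__":
    print(find_profanity("Well, DARN it and heck", PATTERNS))
```

The word boundaries (\b) keep an entry from matching inside a longer,
harmless word, which is the usual weakness of plain substring checks.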
Regards
David
From Yahoo:
"The Profanity Filter allows you to control how you want to view
messages with profanity in two ways. You can choose to view the
messages with the profanity masked with italicized symbols (@$&%), or
you can have the messages containing profanity hidden entirely.
You can also choose between a weak setting for exact word matches or a
strong setting that will filter spelling variations."
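
The behaviour Yahoo describes (mask or hide, weak or strong) might be
approximated along these lines. This is a sketch under stated
assumptions, not Yahoo's implementation: the word list, the look-alike
table and the function names are hypothetical, and "strong" is
interpreted here as tolerating repeated letters and a few common
character substitutions.

```python
import re

BLOCKED_WORDS = ["darn", "heck"]  # hypothetical placeholder list
MASK = "@$&%"

# Assumed look-alike characters accepted by the "strong" setting.
LOOKALIKES = {"a": "a@4", "e": "e3", "i": "i1!", "o": "o0", "s": "s$5"}


def build_pattern(word, strong):
    """Weak: exact word match. Strong: also match spelling variations."""
    if strong:
        # Each letter becomes a character class that may repeat,
        # so "daaarn" and "d@rn" both match "darn".
        body = "".join(
            "[" + re.escape(LOOKALIKES.get(ch, ch)) + "]+" for ch in word
        )
    else:
        body = re.escape(word)
    # Lookarounds instead of \b, because a variation may start or end
    # with a symbol rather than a word character.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE)


def filter_message(text, mode="mask", strong=False):
    """Return the masked text, or None when mode is "hide" and a word matches."""
    patterns = [build_pattern(word, strong) for word in BLOCKED_WORDS]
    if mode == "hide":
        found = any(pattern.search(text) for pattern in patterns)
        return None if found else text
    for pattern in patterns:
        text = pattern.sub(lambda match: MASK, text)
    return text


if __name__ == "__main__":
    print(filter_message("Oh d@rn it", strong=True))
```

Returning None for a hidden message leaves it to the caller to decide
how to display (or skip) it.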
Well I know this thread is a
On Sunday, October 2, 2005, at 10:45 PM, Nigel Rowe wrote:
> David Pratt wrote:
>
>> Hi. Is anyone aware of any python based unacceptable language filter
>> code to scan and detect bad language in text from uploads etc.
>>
>> Many thanks.
>> David
>
> You might be able to adapt languagetool.
> http://www.danielnaber.de/languagetool/features.html
>
> Later versions have been ported to Java, but the old python version of
> languagetool is at http://tkltrans.sourceforge.net/#r03
>
> His thesis paper is at
> http://www.danielnaber.de/languagetool/download/style_and_grammar_checker.pdf
>
> Mind you, given the poor language skills of many native english
> speakers (not to mention those for whom english is a second language)
> relying on automated filters to enforce 'good' language seems a trifle
> extreme. This post for example would probably not pass.