Script for finding words of any size that do NOT contain vowels with acute diacritic marks?
wxjmfauth at gmail.com
Wed Oct 17 11:32:52 EDT 2012
On Wednesday, October 17, 2012 17:00:46 UTC+2, Dave Angel wrote:
> On 10/17/2012 10:31 AM, nwaits wrote:
> > I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (UTF-8), over the vowels?
> >
> > Thank you.
>
> If you can construct a list of "illegal" characters, then you can simply
> check each character of the word against the list, and if the check
> passes for all of the characters, it's a winner.
>
> If that's not fast enough, you can build a translation table from the
> list of illegal characters, and use translate on each word. Then it
> becomes a question of checking if the translated word is all zeroes.
> More setup time, but much faster looping for each word.
>
> --
> DaveA
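Here is a minimal Python 3 sketch of the two approaches Dave describes. It assumes the "illegal" characters are the precomposed vowels with acute or grave accents. The names `ILLEGAL`, `is_clean` and `is_clean_translate` are hypothetical. For the translation-table approach, this variant does not check for all zeroes. Instead, it uses `str.maketrans` to delete the illegal characters and then compares lengths.

```python
# Hypothetical "illegal" set: precomposed vowels with an acute or grave
# accent, lower and upper case. Extend it for other languages as needed.
ILLEGAL = set("áéíóúýàèìòùÁÉÍÓÚÝÀÈÌÒÙ")

def is_clean(word):
    """Approach 1: test every character of the word against the set."""
    return all(ch not in ILLEGAL for ch in word)

# Approach 2: a translation table that maps each illegal character to None,
# so str.translate deletes it. If nothing was deleted, the word is clean.
DELETE_ILLEGAL = str.maketrans("", "", "".join(ILLEGAL))

def is_clean_translate(word):
    return len(word.translate(DELETE_ILLEGAL)) == len(word)

words = ["café", "cafe", "là", "la", "naïve"]
print([w for w in words if is_clean(w)])            # ['cafe', 'la', 'naïve']
print([w for w in words if is_clean_translate(w)])  # ['cafe', 'la', 'naïve']
```

Both checks only see precomposed characters. Text that spells 'é' as 'e' followed by U+0301 COMBINING ACUTE ACCENT would slip through unless each word is first normalized with `unicodedata.normalize('NFC', word)`.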
A lazy way, in Python 3.2:
>>> import unicodedata
>>> def HasDiacritics(w):
...     # NFD splits a precomposed letter such as 'é' into its base letter
...     # plus combining marks, so the decomposed string gets longer.
...     w_decomposed = unicodedata.normalize('NFD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
>>> HasDiacritics('éléphant')
'yes'
>>> HasDiacritics('elephant')
'no'
>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
'yes'
>>> HasDiacritics('U')
'no'
>>>
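Comparing lengths is only a heuristic. Some characters change length under normalization without carrying an accent. For example, Hangul syllables decompose into conjoining jamo under both NFD and NFKD, and NFKD also splits ligatures such as 'ﬁ' into 'fi'.

The sketch below looks for combining characters instead. It assumes that a non-zero canonical combining class (`unicodedata.combining`) is a good enough proxy for "diacritic". The name `has_combining_mark` is hypothetical, and the function returns a bool rather than 'yes'/'no'.

```python
import unicodedata

def has_combining_mark(word):
    """Return True if `word` contains a combining mark after NFD."""
    decomposed = unicodedata.normalize("NFD", word)
    # combining() gives the canonical combining class: 0 for base
    # characters, non-zero for marks such as U+0301 COMBINING ACUTE ACCENT.
    return any(unicodedata.combining(ch) for ch in decomposed)

words = ["éléphant", "elephant", "\ufb01ne", "한글"]
print([w for w in words if not has_combining_mark(w)])
# ['elephant', 'ﬁne', '한글']
```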
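HasDiacritics should be OK for characters that decompose into a base letter plus marks from the Combining Diacritical Marks block (U+0300 to U+036F), i.e. the common diacritics. Note that it flags any character whose decomposition is longer than the original, not only vowels with acute or grave accents.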
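The question asks specifically about acute or grave accents over vowels. A narrower check can decompose each word with NFD and look for U+0301 COMBINING ACUTE ACCENT or U+0300 COMBINING GRAVE ACCENT among the marks that follow each vowel. This is a sketch only. The vowel set and the function name `has_accented_vowel` are assumptions to adapt to the target language.

```python
import unicodedata

# Assumption: only these two combining marks are unwanted.
UNWANTED_MARKS = {"\u0301", "\u0300"}  # COMBINING ACUTE ACCENT, COMBINING GRAVE ACCENT
# Hypothetical vowel set; adjust it for the language being processed.
VOWELS = set("aeiouyAEIOUY")

def has_accented_vowel(word):
    """Return True if a vowel in `word` carries an acute or grave accent."""
    decomposed = unicodedata.normalize("NFD", word)
    for i, ch in enumerate(decomposed):
        if ch not in VOWELS:
            continue
        # Scan the run of combining marks that follows this base vowel.
        j = i + 1
        while j < len(decomposed) and unicodedata.combining(decomposed[j]):
            if decomposed[j] in UNWANTED_MARKS:
                return True
            j += 1
    return False

words = ["élan", "naïve", "là", "la", "\u01d8"]  # U+01D8: u with diaeresis and acute
print([w for w in words if not has_accented_vowel(w)])  # ['naïve', 'la']
```

Words with other marks, such as the diaeresis in 'naïve', are kept. By contrast, 'ǘ' is rejected because its decomposition includes an acute accent alongside the diaeresis.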
jmf
More information about the Python-list mailing list