Regular expression for matching IPA characters in Unicode?
Michael Hoffman
m.h.3.9.1.without.dots.at.cam.ac.uk at example.com
Mon Oct 11 09:08:16 EDT 2004
Mickel Grönroos wrote:
> Which is the best way of checking that a given unicode string only
> contains IPA characters, e.g. characters in the range \u0250-\u02AF?
Well, I'll give you an example that only accepts characters in the
range [\u0250, \u02AF], but those are just the IPA *Extensions* block. A
complete IPA check also needs Basic Latin letters and some Greek
characters, which live in other Unicode blocks.
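To show what pulling in "other blocks" might look like, here is a rough Python 3 sketch that builds a broader character set. The inventory is an assumption for illustration, not an authoritative IPA list. It combines lowercase Basic Latin, a few Latin-1 Supplement and Latin Extended-A letters (æ ç ð ø ħ ŋ œ), the IPA Extensions block, the Spacing Modifier Letters block (U+02B0 to U+02FF, e.g. ʰ and ː), the Combining Diacritical Marks block (U+0300 to U+036F) and three Greek letters (β θ χ). The helper `char_range` is hypothetical.

```python
# Hypothetical, non-exhaustive IPA inventory assembled from several Unicode blocks.
def char_range(first, last):
    """Return the set of characters from code point first to last, inclusive."""
    return {chr(cp) for cp in range(first, last + 1)}

IPA_CHARS = frozenset(
    set("abcdefghijklmnopqrstuvwxyz")   # Basic Latin lowercase letters
    | set("\u00e6\u00e7\u00f0\u00f8")   # Latin-1 Supplement: æ ç ð ø
    | set("\u0127\u014b\u0153")         # Latin Extended-A: ħ ŋ œ
    | char_range(0x0250, 0x02AF)        # IPA Extensions block
    | char_range(0x02B0, 0x02FF)        # Spacing Modifier Letters (ʰ, ˈ, ː, ...)
    | char_range(0x0300, 0x036F)        # Combining Diacritical Marks
    | set("\u03b2\u03b8\u03c7")         # Greek: β θ χ
)

def is_ipa(text):
    """True if every character of text is in IPA_CHARS (empty string counts as True)."""
    return all(ch in IPA_CHARS for ch in text)

print(is_ipa("aeiou"))               # True with the broader set
print(is_ipa("\u03b8\u026a\u014b"))  # "θɪŋ": True
print(is_ipa("Hello"))               # False: uppercase H is not included
```

Whether uppercase letters, tone letters or other symbols belong in the set depends on what transcriptions you actually need to accept.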
See: http://www.unicode.org/charts/PDF/U0250.pdf
And why do you want to do this anyway?
This example uses the all() recipe from the itertools documentation,
which tells you whether a predicate is true for every item in an
iterable. The predicate here is whether the item is contained in
IPA_CHARS, which you can expand to cover those other blocks.
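For readers on a newer Python: all() became a built-in in 2.5, and Python 3 drops unichr, xrange and the sets module. A minimal sketch of the same extensions-only check in Python 3, assuming the same U+0250 to U+02AF range, might look like this. It also shows an equivalent subset comparison (the name `is_ipa_subset` is just illustrative).

```python
# Python 3 version of the extensions-only check (U+0250..U+02AF).
IPA_CHARS = frozenset(chr(cp) for cp in range(0x250, 0x2B0))

def is_ipa(text):
    """True if every character of text is in the IPA Extensions block."""
    # Built-in all() stops at the first character that fails the test.
    return all(ch in IPA_CHARS for ch in text)

def is_ipa_subset(text):
    """Same result via a set comparison: the string's characters form a subset."""
    return set(text) <= IPA_CHARS

print(is_ipa("aeiou"))                # False
print(is_ipa("\u0260\u02af"))         # True
print(is_ipa_subset("\u0260\u02af"))  # True
```

The generator version short-circuits on the first bad character; the subset version always builds a set of the whole string but reads more declaratively.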
=====
import itertools
from sets import Set # set() is a built-in in 2.4
IPA_CHARS = Set(map(unichr, xrange(0x250, 0x2b0)))
def all(seq, pred=bool):
    # http://www.python.org/doc/current/lib/itertools-example.html
    "Returns True if pred(x) is True for every element in the iterable"
    return False not in itertools.imap(pred, seq)

def is_ipa(iterable):
    return all(iterable, IPA_CHARS.__contains__)

print is_ipa(u"aeiou") # this is valid IPA, but not in the extensions block
print is_ipa(u"\u0260\u02af") # valid IPA in the extensions block
====output===
False
True
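Since the subject line asks for a regular expression, the same extensions-only test can also be written as a character class. This is a sketch for Python 3 (not the 2.3/2.4 the code above targets); it relies on re accepting \uXXXX escapes in str patterns (3.3+) and on re.fullmatch (3.4+). Widening the check means adding more ranges inside the brackets.

```python
import re

# Character class covering only the IPA Extensions block, U+0250..U+02AF.
IPA_RE = re.compile(r"[\u0250-\u02AF]*")

def is_ipa(text):
    """True if the whole string consists of IPA Extensions characters."""
    # fullmatch anchors the pattern at both ends of the string.
    return IPA_RE.fullmatch(text) is not None

print(is_ipa("aeiou"))         # False
print(is_ipa("\u0260\u02af"))  # True
```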
--
Michael Hoffman