Python regular expressions just ain't PCRE

Wiseman Wiseman1024 at gmail.com
Sat May 5 21:57:45 EDT 2007


On May 5, 10:44 pm, John Machin <sjmac... at lexicon.net> wrote:
> "UTF-8 Unicode" is meaningless. Python has internal unicode string
> objects, with comprehensive support for converting to/from str (8-bit)
> string objects. The re module supports unicode patterns and strings.
> PCRE "supports" patterns and strings which are encoded in UTF-8. This
> is quite different, a kludge, incomparable. Operations which inspect/
> modify UTF-8-encoded data are of interest only to folk who are
> constrained to use a language which has nothing resembling a proper
> unicode datatype.

Sure, I know it's mediocre Unicode support for an application, but
we're not talking about an application here. If I get the PCRE module
done, I'll just PyArg_ParseTuple(args, "et#", "utf-8", &str, &len),
which will be fine for both Python's Unicode support and what PCRE
does. I won't have to deal with this string at all, so I couldn't care
less how it's encoded or whether I have proper Unicode support in C.
(I'm unsure how Pyrex or SWIG would treat this, so I'll just hand-craft
it. It's not like it would be complex; most of the magic will be pure
C, dealing with PCRE's API.)
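
To make the hand-off concrete, here is a rough Python-side sketch of what
the "et#" conversion amounts to: the unicode string becomes a UTF-8 byte
buffer plus its length in bytes. The helper names (to_utf8_buffer,
byte_to_char_offset) are made up for illustration and are not part of any
existing module. The second helper assumes the wrapper would want to turn
the byte offsets PCRE reports in UTF-8 mode back into character offsets.

```python
def to_utf8_buffer(text):
    """Roughly what "et#" with "utf-8" hands to C: encoded bytes and their length."""
    data = text.encode("utf-8")
    return data, len(data)


def byte_to_char_offset(data, byte_offset):
    """Convert a byte offset into UTF-8 data to a character offset.

    Assumes byte_offset falls on a character boundary; otherwise
    decode() raises UnicodeDecodeError.
    """
    return len(data[:byte_offset].decode("utf-8"))


subject = "na\u00efve caf\u00e9"        # 10 characters, two of them non-ASCII
buf, length = to_utf8_buffer(subject)
print(len(subject), length)             # character count vs. byte count

caf_at = buf.index(b"caf")              # a byte offset, as a C library would see it
print(caf_at, byte_to_char_offset(buf, caf_at))
```

The point is only that the C side sees bytes, so any offsets coming back
have to be translated before they are handed to Python code that indexes
the original unicode string.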

> There's also the YAGNI factor; most folk would restrict using regular
> expressions to simple grep-like functionality and data validation --
> e.g. re.match("[A-Z][A-Z]?[0-9]{6}[0-9A]$", idno). The few who want to
> recognise yet another little language tend to reach for parsers, using
> regular expressions only in the lexing phase.
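
For reference, the quoted validation pattern can be tried with the stock
re module as below. This is only a small sketch: the pattern is the one
from the quote, while the function name and the sample ID numbers are
made up for illustration.

```python
import re

# Pattern from the quoted message: one or two capital letters,
# six digits, then a final digit or the letter A.
ID_PATTERN = re.compile(r"[A-Z][A-Z]?[0-9]{6}[0-9A]$")


def is_valid_idno(idno):
    """Return True if idno matches the pattern from its first character."""
    # match() anchors at the start; the trailing $ anchors at the end.
    return ID_PATTERN.match(idno) is not None


for candidate in ["AB1234567", "A123456A", "ab1234567", "AB12345678"]:
    print(candidate, is_valid_idno(candidate))
```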

Well, I find these features very useful. I've used a complex LALR
parser to parse complex grammars, but I've solved many problems with
just the PCRE lib. Either way, seeing that nobody's interested in these
features, I'll see if I can expose PCRE to Python myself. It sounds
like the fairest solution because it doesn't even touch the re
module: you can do whatever you want with it (though I'd rather have
it stay as it is or be enhanced), and I'll still have PCRE. That's if I
find the time to do it, though, even having no life.



