returning regex matches as lists

Jonathan Lukens jonathan.lukens at gmail.com
Sat Feb 16 07:27:56 EST 2008


John,

> (1) raw string for improved legibility
> ru'(?u)\b([á-ñ]{2,}\s+)([<<"][Á-Ñá-ñ]+)(\s*-?[Á-Ñá-ñ]+)*([>>"])'

This actually escaped my notice until after I had posted -- the letters
with diacritics are incorrectly decoded Cyrillic letters. I suppose I
could use the Unicode escape sequences (the sets [á-ñ] and [Á-Ñá-ñ] are
the Cyrillic equivalents of [a-z] and [A-Za-z]), but then the
legibility goes out the window again.
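
For what it's worth, a minimal sketch of what the escaped version might
look like while staying readable. It assumes Python 3 (where the re module
itself understands \uXXXX in str patterns) and assumes the two sets are
meant to be the basic Cyrillic ranges U+0430-U+044F and U+0410-U+042F; the
names LOWER, UPPER and word are only for illustration, and the sample
string is made up.

```python
import re

# Assumption: the mis-decoded sets stand for the basic Cyrillic alphabet.
# Naming the ranges once keeps the escapes out of the pattern itself.
LOWER = r'\u0430-\u044f'   # Cyrillic small letters a..ya
UPPER = r'\u0410-\u042f'   # Cyrillic capital letters A..YA

# Equivalent of [A-Za-z]+ for those ranges.
word = re.compile('[' + UPPER + LOWER + ']+')

# Only the Cyrillic words are matched; "world" is skipped.
matches = word.findall('Привет, world и мир')
print(matches)
```

Note that these ranges leave out letters such as U+0451, which would have
to be added to the sets separately if needed.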

> (3) what appears between [] is a set of characters, so [<<"] is the
> same as [<"] and probably isn't doing what you expect; have you tested
> this regex for correctness?

These were angled quotation marks in the original Unicode.  Sorry
again.  The regex matches everything it is supposed to.  The extra
parentheses were there because I had somehow missed the .group method,
and it had been returning only what was in the one needed set of
parentheses.
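
To illustrate the difference, here is a small sketch, assuming Python 3
and a deliberately simplified ASCII pattern (not the regex from my
script); the names pattern, text, groups_only and whole_matches are
hypothetical. When a pattern contains groups, findall() returns only the
groups, while group(0) on a match object returns the whole match.

```python
import re

# Hypothetical, simplified pattern: a lowercase word, then a quoted word.
pattern = re.compile(r'\b([a-z]{2,}\s+)("[A-Za-z]+")')
text = 'the firm "Acme" and the bank "Orbis"'

# With groups in the pattern, findall() returns tuples of the groups only.
groups_only = pattern.findall(text)
# [('firm ', '"Acme"'), ('bank ', '"Orbis"')]

# group(0) gives the entire match, so no extra outer parentheses are needed.
whole_matches = [m.group(0) for m in pattern.finditer(text)]
# ['firm "Acme"', 'bank "Orbis"']
```

Either way the result is a list, so the choice is between a list of
group tuples and a list of whole matches.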

> I can't imagine how "not a programmer" implies "interested to know if
> there is a more elegant way".

More carefully stated: "I am self-taught, have no real training or
experience as a programmer, and would be interested in seeing how a
programmer with training and experience would go about this."

Thank you,
Jonathan


