How to write this regular expression?

Jeremy Bowers jerf at jerf.org
Wed May 4 14:44:37 EDT 2005


On Wed, 04 May 2005 20:24:51 +0800, could ildg wrote:

> Thank you.
> 
> I just learned how to use re, so I want to find a way to settle it by
> using re. I know that split it into pieces will do it quickly.

I'll say this: you have two problems, splitting out the numbers and
verifying their conformance to some validity rule.

I strongly recommend treating those two problems separately. While I'm not
willing to guarantee that an RE can't be written for something like ("[A
number A]_[A number B]" such that A < B) in the general case, it won't be
anywhere near as clean or as easy to follow as just writing an RE to
extract the numbers, then verifying the constraints in conventional Python.
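
A rough sketch of that split, assuming the input looks like "A_B" and the
rule is A < B. The function name is made up for illustration; the RE only
extracts, and plain Python does the checking:

```python
import re

# The RE has one job: pull out runs of digits.
NUMBER = re.compile(r"\d+")

def is_increasing_pair(text):
    """Return True if text holds exactly two numbers A and B with A < B."""
    numbers = [int(n) for n in NUMBER.findall(text)]
    if len(numbers) != 2:
        return False
    a, b = numbers
    # The validity rule lives in ordinary Python, not in the RE.
    return a < b

print(is_increasing_pair("12_345"))
print(is_increasing_pair("345_12"))
```

Changing the rule later (say, A <= B, or three numbers) means touching one
line of Python rather than rewriting the RE.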

In that case, if you know in advance that the numbers are guaranteed to be
in that format, I'd just use the regular expression r"\d+" and the
"findall" method of the compiled expression:

Python 2.3.5 (#1, Mar  3 2005, 17:32:12) 
[GCC 3.4.3  (Gentoo Linux 3.4.3, ssp-3.4.3-0, pie-8.7.6.6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> m = re.compile(r"\d+")
>>> m.findall("344mmm555m1111")
['344', '555', '1111']
>>>

If you're checking general matching of the parameters you've given, I'd
feel no shame in checking the string against r"^(_\d+){1,3}$" with .match
and then using the above to get the numbers, if you prefer that. (Note
that .match only matches at the start of the string, so the initial ^ is
implied, but I tend to write it anyway as a good habit. Explicit is
better than implicit and all that.)
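
Something like the following is what I have in mind: one RE to check the
overall shape, a second to pull out the numbers. The names SHAPE, NUMBER
and extract_numbers are just placeholders, and I'm assuming you want None
back when the string doesn't fit:

```python
import re

SHAPE = re.compile(r"^(_\d+){1,3}$")   # one to three "_<digits>" groups
NUMBER = re.compile(r"\d+")            # a single run of digits

def extract_numbers(text):
    """Return the numbers in text as ints, or None if the shape is wrong."""
    if SHAPE.match(text) is None:
        return None
    return [int(n) for n in NUMBER.findall(text)]

print(extract_numbers("_344_555_1111"))
print(extract_numbers("_1_2_3_4"))
```

The first call gives back three integers; the second fails the shape check
because there are four groups.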

(I just tried to capture the three numbers by adding a set of parentheses
around the \d+ but it only gives me one of them. I've never tried that
before; is there a way to get it to give me all of them? I don't think so,
so two REs may be required after all.)
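
For reference, a small sketch of that behaviour as I understand it in
Python's re module: a capturing group inside a repetition holds on to a
single value (the last repetition), so findall on the matched text is
still the way to get every number. The sample string is made up:

```python
import re

m = re.match(r"^(_(\d+)){1,3}$", "_344_555_1111")

# Group 2 is the inner (\d+); it was matched three times,
# but only the final repetition is retained.
print(m.group(2))

# A second pass over the validated text recovers all of them.
all_numbers = re.findall(r"\d+", m.group(0))
print(all_numbers)
```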


