regex walkthrough
MRAB
python at mrabarnett.plus.com
Sat Dec 8 13:08:36 EST 2012
On 2012-12-08 17:48, rh wrote:
> Looking through some code, I found this and wondered what it does:
> ^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$
>
> Here's my walk through:
>
> 1) ^ match at start of string
> 2) ?P<salsipuedes> if a match is found it will be accessible in a variable
> salsipuedes
> 3) [0-9A-Za-z-_.//] this is the one that looks wrong to me, see below
> 4) + one or more from the preceding char class
> 5) () the grouping we want returned (see #2)
> 6) $ end of the string to match against but before any newline
>
>
> more on #3
> the z-_ part looks wrong and seems that the - should be at the start
> of the char set otherwise we get another range z-_ or does the a-z
> preceding the z-_ negate the z-_ from becoming a range? The "."
> might be ok inside a char set. The two slashes look wrong but maybe
> it has some special meaning in some case? I think only one slash is
> needed.
>
> I've looked at pydoc re, but it's cursory.
>
Python itself will help you:
>>> import re
>>> re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$", flags=re.DEBUG)
at at_beginning
subpattern 1
  max_repeat 1 65535
    in
      range (48, 57)
      range (65, 90)
      range (97, 122)
      literal 45
      literal 95
      literal 46
      literal 47
      literal 47
at at_end
Inside the character set: "0-9", "A-Z" and "a-z" are ranges; "-", "_",
"." and "/" are literals. Doubling the "/" is unnecessary (it has no
special meaning). "-" is a literal because it immediately follows a
range, so it can't be defining another range. (If it immediately
followed a literal and wasn't immediately followed by an unescaped "]",
it would define a range; that is why r"[a-]" is the same as r"[a\-]".)
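The points above can be checked directly. This is only a minimal sketch: the sample strings and the variable names (original, single_slash, trailing, escaped, in_range) are made up for illustration and are not from the code the pattern came from.

```python
import re

# The character class from the pattern under discussion.
original = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$")
# The same class with the redundant second "/" dropped.
single_slash = re.compile(r"^(?P<salsipuedes>[0-9A-Za-z-_./]+)$")

# Illustrative inputs: both patterns accept or reject each one identically.
samples = ["a-b_c.d/e", "abc", "a b", "a,b", ""]
for s in samples:
    assert bool(original.match(s)) == bool(single_slash.match(s))

# A "-" immediately before the closing "]" is a literal,
# so these two classes find the same characters.
trailing = re.findall(r"[a-]", "a-b")
escaped = re.findall(r"[a\-]", "a-b")

# A "-" between two literals defines a range: "+" (43) through "/" (47).
# The "*" (42) lies outside that range and is not matched.
in_range = re.findall(r"[+-/]", "+,-./*")
```

Here "-" after "a-z" behaves as a plain hyphen, whereas in r"[+-/]" it sits between two literals and so pulls in "," and "." as well.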
As for "(?P<salsipuedes>...)", it won't be accessible in a variable
"salsipuedes", but will be accessible as a named group in the match
object:
>>> m = re.match(r"(?P<foo>[a-z]+)", "xyz")
>>> m.group("foo")
'xyz'
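The same applies to the original pattern. The following is a small sketch with an invented input string ("path/to/file-1.txt" is only an example); it shows the named group being read from the match object by name, by position, and through groupdict(), and that no variable called "salsipuedes" appears.

```python
import re

# Hypothetical input chosen to exercise "/", "-" and "." in the class.
m = re.match(r"^(?P<salsipuedes>[0-9A-Za-z-_.//]+)$", "path/to/file-1.txt")

by_name = m.group("salsipuedes")   # access by group name
by_index = m.group(1)              # the same group by position
as_dict = m.groupdict()            # all named groups as a dict

# Matching does not create a variable named after the group.
name_defined = "salsipuedes" in globals()
```

If the string does not match, re.match returns None, so real code would normally test m before calling m.group().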