[Tutor] Regex confusion
Jeffrey N. Shelton
shelton@custommachine.com
Fri, 17 Dec 1999 10:52:29 -0500
Michael P. Reilly wrote:
> > I'm looking at some Python code (from the DT_HTML.py module of Zope,
> > actually) that uses a regex expression that I can't figure out. Perhaps
> > you could point me in the right direction?
> >
> > name_match=regex.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
> > end_match=regex.compile('[\0- ]*\(/\|end\)',regex.casefold).match
> > start_search=regex.compile('[<&]').search
> >
> > The phrase "[\0- ]" in the first two lines confuses me. Is this a group
> > reference? And if so, what is group 0? I thought the numbering for match
> > groups started at 1. Logically, it would seem that this phrase is a means
> > of grabbing up leading and trailing dashes and whitespace. But it's beyond
> > me to figure out how "\0" figures into this.
> >
> > Also, the ampersand (&) in the third line is a problem for me. I've looked
> > through my copy of "Mastering Regular Expressions" and can't find any
> > reference to a "&" metacharacter. What am I overlooking?
>
> Hi Jeffrey,
>
> The regular expression [...] is commonly called a character class; it
> matches any single character that appears inside the brackets.
> [<&] - one of the two characters "<" or "&"
> [\0- ] - any character with ASCII value between 0 ('\0') and 32 (' ')
>
> If you include a caret (^) immediately after the left bracket ([), then
> matching is against characters not in the class.
>
> -Arcege
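For anyone reading this in the archive, here is a minimal sketch of the
character classes described above. It uses Python's current `re` module
rather than the old `regex` module from the quoted code; that swap is an
assumption on my part, as are the `(/|end)` spelling of the Emacs-style
`\(/\|end\)` group and the use of `re.IGNORECASE` in place of
`regex.casefold`.

```python
import re

# Assumption: `re` stands in for the old `regex` module, so the group is
# written (/|end) and re.IGNORECASE replaces regex.casefold.
name_match = re.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
end_match = re.compile('[\0- ]*(/|end)', re.IGNORECASE).match
start_search = re.compile('[<&]').search

# '\0' is the Python string escape for NUL, so [\0- ] is a range that runs
# from character code 0 up to the space character (code 32), inclusive.
low_chars = re.compile('[\0- ]')
in_class = [code for code in range(128) if low_chars.fullmatch(chr(code))]

# A caret right after '[' negates the class: this finds the first
# character that is NOT a control character or a space.
not_low = re.compile('[^\0- ]')
first_visible = not_low.search('\t\n  name')

print(in_class[0], in_class[-1], len(in_class))
print(first_visible.start(), repr(first_visible.group(0)))
```

Nothing here is a group reference: the backslash is consumed by the Python
string literal before the regex engine ever sees the pattern.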
Thanks for the help!
The "&" usage is now obvious in the light of day. (Doh!) As is the "\0- "
phrase, although I think I would have caught on a lot faster if it had been
"\0-\40" or, even better, "\000-\040" (the escapes are octal, so 040 is 32).
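
One caveat worth checking when spelling the range out explicitly: numeric
backslash escapes in Python string literals are octal, so the space
character (decimal 32) is '\040' or '\x20', while '\032' is character 26.
A small sketch, again assuming the current `re` module rather than `regex`:

```python
import re

# Numeric backslash escapes in Python string literals are octal.
space_octal = '\040'   # octal 40 == decimal 32, the space character
space_hex = '\x20'     # the same character written in hex
ctrl_z = '\032'        # octal 32 == decimal 26, NOT a space

# Both spellings produce exactly the same pattern string, because the
# escapes are resolved by the string literal, not by the regex engine.
implicit = re.compile('[\0- ]')
explicit = re.compile('[\000-\040]')

print(space_octal == ' ', space_hex == ' ', ord(ctrl_z))
print(implicit.pattern == explicit.pattern)
```

The explicit form is easier to read at a glance, which is the whole point;
the two compiled patterns behave identically.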
Anyhow, I'm back on my feet again. Thanks!
Jeff