[Tutor] Regex confusion

Jeffrey N. Shelton shelton@custommachine.com
Fri, 17 Dec 1999 10:52:29 -0500


Michael P. Reilly wrote:

> > I'm looking at some Python code (from the DT_HTML.py module of Zope,
> > actually) that uses a regex expression that I can't figure out. Perhaps you
> > could point me in the right direction?
> >
> >      name_match=regex.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
> >      end_match=regex.compile('[\0- ]*\(/\|end\)',regex.casefold).match
> >      start_search=regex.compile('[<&]').search
> >
> > The phrase "[\0- ]" in the first two lines confuses me. Is this a group
> > reference? And if so, what is group 0? I thought the numbering for match
> > groups started at 1. Logically, it would seem that this phrase is a means
> > for grabbing up leading and trailing dashes and whitespace. But it's beyond
> > me to figure out how "\0" figures into this.
> >
> > Also, the ampersand (&) in the third line is a problem for me. I've looked
> > through my copy of "Mastering Regular Expressions" and can't find any
> > reference to a "&" metacharacter. What am I overlooking?
>
> Hi Jeffrey,
>
> The regular expression [...] is commonly called a character class; it
> matches any one character that appears inside the brackets.
>   [<&]    - one of the two characters "<" or "&"
>   [\0- ]  - any character with ASCII value between 0 ('\0') and 32 (' ')
>
> If you include a caret (^) immediately after the left bracket ([), then
> matching is against characters not in the class.
>
>   -Arcege

Thanks for the help!

The "&" usage is now obvious in the light of day. (Doh!)  As is the "\0- "
phrase, although I think I would have caught on a lot faster if it had been
"\0-\40" or, even better, "\000-\040" (the escapes are octal, so 32 is \040).

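For anyone else who trips over this, here is a minimal sketch using the
newer re module rather than regex. The translation of the Zope patterns into
re syntax is my own assumption, as is re.IGNORECASE standing in for
regex.casefold. It checks which characters the class matches and shows how
the octal escapes work out.

```python
import re

# In a normal (non-raw) string literal, '\0' is the NUL character, chr(0),
# so '[\0- ]' is a class holding the range chr(0) through chr(32) (space).
ws_class = re.compile('[\0- ]')
matched = [chr(i) for i in range(128) if ws_class.match(chr(i))]
print(len(matched), repr(matched[0]), repr(matched[-1]))

# Python escapes are octal: a space is '\040', while '\32' is chr(26).
print('\0- ' == '\000-\040', '\32' == chr(26))

# A caret right after '[' negates the class.
not_ws = re.compile('[^\0- ]')
print(bool(not_ws.match('a')), bool(not_ws.match(' ')))

# Assumed re-syntax equivalents of two of the quoted Zope patterns.
# The old regex module writes groups as \( \| \); re uses ( | ).
name_match = re.compile('[\0- ]*[a-zA-Z]+[\0- ]*').match
end_match = re.compile('[\0- ]*(/|end)', re.IGNORECASE).match
print(repr(name_match('  foo  bar').group()))
print(bool(end_match('\tEND')))
```
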
Anyhow, I'm back on my feet again. Thanks!

Jeff