how to match whole word

John S jstrickler at gmail.com
Wed Jul 16 16:59:51 EDT 2008


On Jul 16, 9:38 am, Peng Yu <PengYu... at gmail.com> wrote:
> On Jul 15, 10:29 pm, Gary Herron <gher... at islandtraining.com> wrote:
>
> > Peng Yu wrote:
> > > Hi,
>
> > > The following code snippet is from /usr/bin/rpl. I would like it
> > > to match a word, for example, "abc" in ":abc:". But the current one
> > > does not match "abc" in ":abc:". I tried to modify it myself. Would
> > > you please let me know the correct way to do it?
>
> > > Thanks,
> > > Peng
>
> > >    if opts.whole_words:
> > >            regex = re.compile(r"(?:(?<=\s)|^)" + re.escape(old_str) + r"(?=\s|$)",
> > >                               opts.ignore_case and re.I or 0)
>
> > The regular expression "\w+" will match (what might be your definition
> > of) a word, and in particular will match abc in :abc:.   Regular
> > expressions have lots of other special \-sequences that might be worth
> > your while to read about:  http://docs.python.org/lib/re-syntax.html
>
> > Gary Herron
>
> I didn't read the docs and tried the following code.
>
> regex = re.compile(r"\A" + re.escape(old_str) + r"\Z",
>                    opts.ignore_case and re.I or 0)
>
> But I'm not sure why it is not working.
>
> Thanks,
> Peng

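Not sure why you picked \A and \Z. They match only at the very start
and end of the whole string, so your pattern matches only when the
entire string is exactly old_str. (They behave differently from ^ and $
only when the re.M flag is used.)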
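
Here is a quick sketch of what those anchors do, with old_str hard-coded
to "abc" instead of coming from rpl's command line. It also shows ^ and $
under re.M, which is where the two kinds of anchor part ways:

```python
import re

old_str = "abc"  # stand-in for rpl's command-line argument

# \A and \Z anchor to the start and end of the whole string,
# so this only matches when the string is exactly "abc".
whole_string = re.compile(r"\A" + re.escape(old_str) + r"\Z")

print(bool(whole_string.search("abc")))        # True
print(bool(whole_string.search(":abc:")))      # False

# With re.M, ^ and $ match at every line start/end; \A and \Z do not change.
per_line = re.compile(r"^" + re.escape(old_str) + r"$", re.M)
print(bool(per_line.search("x\nabc\ny")))      # True
print(bool(whole_string.search("x\nabc\ny")))  # False
```
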
What you want is \b, which matches at a word boundary. Put it on either
side of your word:

regex = re.compile(r"\b" + re.escape(old_str) + r"\b",
                   opts.ignore_case and re.I or 0)
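
A minimal, self-contained sketch of the whole-word replace. The function
name replace_whole_word is made up for illustration; rpl itself gets
old_str, the replacement and the ignore-case setting from its options.

```python
import re

def replace_whole_word(old_str, new_str, text, ignore_case=False):
    """Replace old_str in text only where it appears as a whole word.

    Hypothetical helper; rpl reads these values from opts instead.
    """
    flags = re.I if ignore_case else 0
    # \b on each side requires a word/non-word transition (or the string edge).
    regex = re.compile(r"\b" + re.escape(old_str) + r"\b", flags)
    # Passing a function as the replacement inserts new_str literally,
    # so backslashes in new_str are not treated as group references.
    return regex.sub(lambda m: new_str, text)

print(replace_whole_word("abc", "XYZ", ":abc: abcd abc"))             # :XYZ: abcd XYZ
print(replace_whole_word("abc", "XYZ", ":ABC:", ignore_case=True))  # :XYZ:
```

Note that "abcd" is left alone: there is no word boundary between "c" and "d".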

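re.I is the same as re.IGNORECASE. More than one flag may be OR'ed
together with |, e.g. re.I | re.M. (In "opts.ignore_case and re.I or 0"
the last item is the digit zero, meaning "no flags".) There's no such
thing as "re.O" in Python. I can understand where the idea might come
from, as Perl has an 'o' modifier for REs.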
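
A small sketch of combining flags, plus the "and/or" idiom from rpl. The
ignore_case variable here is a stand-in for opts.ignore_case:

```python
import re

# re.I is just a short alias for re.IGNORECASE.
print(re.I == re.IGNORECASE)  # True

# Combine flags with the bitwise OR operator |.
pattern = re.compile(r"^abc$", re.I | re.M)
print(pattern.findall("ABC\nabc\nabcd"))  # ['ABC', 'abc']

# The "x and re.I or 0" idiom: when x is false the result is 0, i.e. no flags.
ignore_case = False  # stand-in for opts.ignore_case
flags = ignore_case and re.I or 0
print(flags == 0)  # True
```

In current Python the conditional expression "re.I if ignore_case else 0"
says the same thing more directly.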

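To summarize, \A and \Z match only at the beginning and end of the whole
STRING, while \b matches at the beginning or end of a WORD, i.e. at any
position between a word character (\w) and a non-word character or the
edge of the string.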
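
One catch with \b: it only works as expected when old_str starts and ends
with word characters. A sketch using "c++" as an example, comparing \b
with negative lookarounds (a swapped-in alternative, not what rpl does):

```python
import re

text = "use c++ here, not c++11"
old_str = "c++"

# After the final "+" (a non-word char) there is a boundary only if a
# word char follows, so \b skips the standalone "c++" and hits "c++11".
with_b = re.compile(r"\b" + re.escape(old_str) + r"\b")
print([m.start() for m in with_b.finditer(text)])  # [18]

# Negative lookarounds mean "no word character right before/after",
# which works whether or not old_str begins or ends with a word char.
with_lookaround = re.compile(r"(?<!\w)" + re.escape(old_str) + r"(?!\w)")
print([m.start() for m in with_lookaround.finditer(text)])  # [4]
```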

-- john



More information about the Python-list mailing list