re

Wed Jun 4 17:39:38 EDT 2008

In article <mailman.79.1212598994.1044.python-list at python.org>,
 "Russell Blau" <russblau at hotmail.com> wrote:

> "Diez B. Roggisch" <deets at nospam.web.de> wrote in message 
> news:6anvi4F38ei08U1 at mid.uni-berlin.de...
> > David C. Ullrich schrieb:
> >> Say I want to replace 'disc' with 'disk', but only
> >> when 'disc' is a complete word (don't want to change
> >> 'discuss' to 'diskuss'.) The following seems almost
> >> right:
> >>
> > >>   [^a-zA-Z]disc[^a-zA-Z]
> >>
> >> The problem is that that doesn't match if 'disc' is at
> >> the start or end of the string. Of course I could just
> >> combine a few re's with |, but it seems like there should
> >> (or might?) be a way to simply append a \A to the first
> >> [^a-zA-Z] and a \Z to the second.
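For what it's worth, \A and \Z can't go inside a character class, but they can be alternated with one inside a group. Below is a minimal sketch that assumes Python's `re` module; the sample strings are made up for illustration. The leading context is captured and put back with \1. The trailing context is a lookahead, so it isn't consumed. If the trailing class consumed its character, the space in "disc disc" would be used up and only the first word would be replaced.

```python
import re

# Leading context: start of string OR one non-letter (captured as group 1).
# Trailing context: lookahead for one non-letter OR end of string, so the
# character after 'disc' is not consumed and can start the next match.
pattern = r"(\A|[^a-zA-Z])disc(?=[^a-zA-Z]|\Z)"

text = "disc, not discuss; disc"
result = re.sub(pattern, r"\1disk", text)
print(result)  # disk, not discuss; disk
```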
> >
> > Why not
> >
> > ($|[\w])disc(^|[^\w])
> >
> > I hope \w is really the literal for whitespace - might be something 
> > different, see the docs.
> 
> No, \s is the literal for whitespace. 
> http://www.python.org/doc/current/lib/re-syntax.html
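Here is a quick sketch that checks the two classes on made-up strings. \s picks out whitespace, while \w picks out letters, digits and the underscore.

```python
import re

# \s: whitespace characters (space, tab, newline, ...)
spaces = re.findall(r"\s", "a b\tc")

# \w: word characters (letters, digits and underscore)
word_chars = re.findall(r"\w", "a_1 -!")

print(spaces)      # [' ', '\t']
print(word_chars)  # ['a', '_', '1']
```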
> 
> But how about:
> 
> text = re.sub(r"\bdisc\b", "disk", text_to_be_changed)
> 
> \b is the zero-width "word boundary" assertion,

Lovely - that's exactly right, thanks. I swear I looked at the
docs... I'm just blind or stupid. No wait, I'm blind _and_
stupid. No, blind and stupid and slow...

It doesn't precisely fit the _spec_, because \w also counts digits and
underscores as word characters, but it solves my actual problem exactly.
Thanks.

> it matches at the beginning or end of any
> "word" (where a word is any sequence of \w characters, and \w is any
> alphanumeric character or _).
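Here is a small comparison sketch on a made-up string. The \bdisc\b version treats digits and underscores as part of a word, so it leaves "disc2" and "disc_x" alone. If the original letters-only spec matters, lookbehind and lookahead assertions on [a-zA-Z] fit it exactly. They also succeed at the start and end of the string.

```python
import re

text = "disc2 disc_x discuss disc"

# Word-boundary version: digits and '_' count as word characters.
with_b = re.sub(r"\bdisc\b", "disk", text)

# Letters-only version: no letter immediately before or after 'disc'.
# The assertions are zero-width, so nothing around 'disc' is consumed.
letters_only = re.sub(r"(?<![a-zA-Z])disc(?![a-zA-Z])", "disk", text)

print(with_b)        # disc2 disc_x discuss disk
print(letters_only)  # disk2 disk_x discuss disk
```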
> 
> Note that this solution still doesn't catch "Disc" if it is capitalized.

Thanks. I didn't mention I wanted to catch both cases because I
already knew how to take care of that:

r"\b[dD]isc\b"
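If the replacement should keep the original capitalization, one way is to capture the first letter and reuse it. This is a sketch only, not necessarily what was used here, and the sample text is made up. By contrast, re.IGNORECASE with a fixed replacement string would find "Disc" but turn it into lowercase "disk".

```python
import re

text = "Disc and disc, not discuss"

# Capture the d/D and put it back in front of 'isk'.
kept_case = re.sub(r"\b([dD])isc\b", r"\1isk", text)

# Case-insensitive match with a fixed replacement loses the capital.
ignore_case = re.sub(r"\bdisc\b", "disk", text, flags=re.IGNORECASE)

print(kept_case)    # Disk and disk, not discuss
print(ignore_case)  # disk and disk, not discuss
```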

> Russ

-- 
David C. Ullrich