re

David C. Ullrich dullrich at sprynet.com
Wed Jun 4 13:24:33 EDT 2008


In article <6anvi4F38ei08U1 at mid.uni-berlin.de>,
 "Diez B. Roggisch" <deets at nospam.web.de> wrote:

> David C. Ullrich schrieb:
> > Actually using regular expressions for the first
> > time. Is there something that allows you to take the
> > union of two character sets, or append a character to
> > a character set?
> > 
> > Say I want to replace 'disc' with 'disk', but only
> > when 'disc' is a complete word (don't want to change
> > 'discuss' to 'diskuss'.) The following seems almost
> > right:
> > 
> >   [^a-zA-Z]disc[^a-zA-Z]
> > 
> > The problem is that that doesn't match if 'disc' is at
> > the start or end of the string. Of course I could just
> > combine a few re's with |, but it seems like there should
> > (or might?) be a way to simply append a \A to the first
> > [^a-zA-Z] and a \Z to the second.
> 
> Why not
> 
> ($|[\w])disc(^|[^\w])
> 
> I hope \w is really the literal for whitespace - might be something 
> different, see the docs.

Thanks, but I don't follow that at all.

Whitespace is actually \s. But [\s]disc[whatever]
doesn't do the job - then it won't match "(disc)",
which counts as "disc" appearing as a full word.
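
A quick way to check that point, as a sketch only (the sample string
is made up for illustration, and the variable names are arbitrary):

```python
import re

# Requires literal whitespace on both sides of 'disc'.
ws = re.compile(r'\sdisc\s')

# Only requires a non-letter on both sides of 'disc'.
nonletter = re.compile(r'[^a-zA-Z]disc[^a-zA-Z]')

in_parens = 'a (disc) here'

ws_hit = ws.search(in_parens)          # None: '(' and ')' are not whitespace
nl_hit = nonletter.search(in_parens)   # matches the text '(disc)'
```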

Also I think you have ^ and $ backwards, and there's
a ^ I don't understand. I _think_ that a correct version
of what you're suggesting would be

(^|[^a-zA-Z])disc($|[^a-zA-Z])

But as far as I can see that simply doesn't work.
I haven't been able to use | that way, combining
_parts_ of a re. That was the first thing I tried.
The original works right except for not matching
at the start or end of a string, the thing with
the | doesn't work at all:

>>> from re import compile
>>> test = compile(r'(^|[^a-zA-Z])disc($|[^a-zA-Z])')
>>> test.findall('')
[]
>>> test.findall('disc')
[('', '')]
>>> test.findall(' disc ')
[(' ', ' ')]
>>> disc = compile(r'[^a-zA-Z]disc[^a-zA-Z]')
>>> disc.findall(' disc disc disc')
[' disc ']
>>> disc.findall(' disc  disc disc')
[' disc ', ' disc ']
>>> test.findall(' disc  disc disc')
[(' ', ' '), (' ', ' ')]
>>> disc.findall(' disc  disc  disc')
[' disc ', ' disc ']
>>> disc.findall(' disc  disc  disc ')
[' disc ', ' disc ', ' disc ']
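
One possible reading of the output above: when a pattern contains
capturing groups, findall returns the groups rather than the whole
match, so [('', '')] for 'disc' would mean the alternation version did
match at the start and end of the string. The sketch below assumes that
reading. The non-capturing groups and the \b word-boundary assertion are
variants not taken from the messages above, and \b treats digits and
underscore as word characters, which is not quite the same test as
[^a-zA-Z].

```python
import re

# The alternation version from the session above. Its two groups capture
# only the characters around 'disc', which is what findall reports.
test = re.compile(r'(^|[^a-zA-Z])disc($|[^a-zA-Z])')
whole = [m.group(0) for m in test.finditer('disc')]  # full match text

# Non-capturing groups make findall return the whole match instead.
nocap = re.compile(r'(?:^|[^a-zA-Z])disc(?:$|[^a-zA-Z])')

# \b is zero-width, so it does not consume the separator and words
# separated by a single space are all found.
word = re.compile(r'\bdisc\b')

fixed = word.sub('disk', 'disc (disc) discuss disc disc')
```

Because \b consumes nothing, the substitution leaves the surrounding
punctuation and spaces alone, which the [^a-zA-Z] versions would
otherwise have to put back by hand.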


> Diez

-- 
David C. Ullrich


