simple (?) re question

Andrew Kuchling akuchlin at mems-exchange.org
Tue Jul 11 13:59:34 EDT 2000


Frederic Laurent <laurent8 at sxb.bsf.alcatel.fr> writes:
> Hi, I am trying to write a regular expression that doesn't match if
> a filename contains "SCCS", so I have
> >>> r=re.compile(".*(?!SCCS.*$).*[.]java$")
> >>> r.match("/foo/bar/SCCS/s.file.java")
> <re.MatchObject instance at 1f9898>
> But that's wrong for me! I expected this file not to match.

You're using the wrong tool; regexes are very clumsy for this sort of
thing.  Try looking at your regex using Demo/tkinter/guido/redemo.py
to see how it matches: the first .* matches '/foo/bar/SCCS/s.file',
the (?!...) assertion succeeds without consuming anything, because the
text at that point is '.java' rather than 'SCCS...', and the second .*
matches nothing.
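
If redemo.py isn't handy, the same thing can be seen by wrapping each .*
in a capturing group.  This is only a sketch: the groups are my addition
for inspection and are not in the original pattern; everything else is
unchanged.

```python
import re

# Pattern from the question, with capturing groups added around each .*
# (the groups are an illustrative addition, not part of the original).
r = re.compile(r"(.*)(?!SCCS.*$)(.*)[.]java$")
m = r.match("/foo/bar/SCCS/s.file.java")

# The greedy first .* backtracks only as far as it must, so it still
# swallows the SCCS component; the lookahead is then tested at '.java'.
print(repr(m.group(1)))  # '/foo/bar/SCCS/s.file'
print(repr(m.group(2)))  # ''
```

The lookahead is only ever checked at the position where the first .*
stops, which is why 'SCCS' earlier in the string goes unnoticed.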

> I've got the same problem with an expression which mustn't match
> something that ends with a .java extension
> >>> r=re.compile(".*[.](?!java$).*$")

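Similar problem; the regex engine finds an alternative match.  It
backtracks to any earlier '.' that isn't followed by 'java' (the first
one in 's.file.java', for instance) and matches from there.  Why not
just write: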

import os

path, filename = os.path.split(filename)
if 'SCCS' in path.split(os.sep):  # os.sep ('/'), not os.pathsep (':')
    pass  # SCCS in path

for the first case, and:

root, extension = os.path.splitext(filename)
if extension == '.java':
    pass  # Filename ends in .java

for the second?  Much clearer than a complicated regex which has to be
analyzed to be understood, and it makes it more apparent that you're
making a decision based on the path.
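
Put together, the two checks might look like the sketch below.  The
function names (in_sccs_dir, is_java_source, wanted) are hypothetical,
and wanted() reflects my guess at what the original poster is after:
.java files that don't live under an SCCS directory.

```python
import os


def in_sccs_dir(filename):
    """Return True if a directory component of filename is exactly 'SCCS'."""
    path, _name = os.path.split(filename)
    return 'SCCS' in path.split(os.sep)


def is_java_source(filename):
    """Return True if filename has a '.java' extension."""
    _root, extension = os.path.splitext(filename)
    return extension == '.java'


def wanted(filename):
    """Assumed goal: a .java file that is not under an SCCS directory."""
    return is_java_source(filename) and not in_sccs_dir(filename)
```

Note that in_sccs_dir() compares whole path components, so a file that is
merely named SCCS.java is not treated as being inside an SCCS directory.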

--amk


