Regular Expression Question

Kragen Sitaker kragen at dnaco.net
Tue Apr 3 20:03:25 EDT 2001


In article <3aca5b94_2 at news.nwlink.com>,
Wesley Witt <wesw at wittfamily.com> wrote:
>This is probably a simple question, but I can't seem to find the answer
>anywhere.
>
>I want a regular expression that will match ALL lines that do NOT contain
>the string "skip".  I have some backup logs that I need to filter the noise
>out of.

Don't do this.  Your successor maintainers will curse you, your boss
will fire you, and your dog will pee on you.

Just say:

for line in file.readlines():
    if line.find('skip') == -1:
        outfile.write(line)

But if you're curious:

You can match a line not containing 's' simply: re.compile("^[^s]*$").

You can match a line not containing 'sk' with more difficulty:
re.compile("^([^s]|s+[^sk])*$")

'ski' is a little harder; I think there's an easier way to do this, but
I don't know what it is:
re.compile("^([^s]|(s(ks)*)+([^sk]|k[^is]))*$")

(I think there's an easier way because the above RE is not strictly
deterministic: when it sees 'sk', it has to push two states, one for
the 'k' followed by 's' and one for the 'k' followed by [^is].)
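
A brute-force comparison against a plain substring test is one way to
check REs like the three above.  This is only a sketch and not part of
the original post: the helper name `mismatches`, the test alphabet
"skix", and the length limit are arbitrary choices, and the test strings
carry no trailing newline.

```python
import itertools
import re

# The three REs quoted in the post, keyed by the substring each one
# is meant to exclude.
PATTERNS = {
    "s": re.compile(r"^[^s]*$"),
    "sk": re.compile(r"^([^s]|s+[^sk])*$"),
    "ski": re.compile(r"^([^s]|(s(ks)*)+([^sk]|k[^is]))*$"),
}

def mismatches(needle, pattern, alphabet="skix", max_len=5):
    """Return the strings where the RE disagrees with 'needle not in text'."""
    bad = []
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            text = "".join(chars)
            matched = pattern.match(text) is not None
            if matched != (needle not in text):
                bad.append(text)
    return bad

for needle, pattern in PATTERNS.items():
    bad = mismatches(needle, pattern)
    # Show how many strings disagree, plus a few samples.
    print(repr(needle), len(bad), bad[:5])
```

An empty list means the RE agrees with the substring test on every
string tried; anything else is a counterexample worth looking at.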

The 'sk' and 'ski' REs have a bug: if a proper prefix of the evil
sequence occurs at the very end of the string (a line with its newline
stripped, say), they fail to match.  I'm not sure how to fix that, and I
don't want to extend it to 'skip'.
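
One possible patch for the end-of-line problem, offered as a sketch and
not as anything from the original post, is to allow an optional trailing
prefix of the forbidden sequence just before the final $.  For 'skip'
itself, a negative lookahead (a different technique from the hand-built
REs above) avoids the construction entirely.  The names NO_SK, NO_SKI,
NO_SKIP and keep, and the sample log lines, are made up for illustration.

```python
import re

# Hypothetical patched REs: the bodies from the post, followed by an
# optional trailing prefix of the forbidden string before the final $.
NO_SK = re.compile(r"^([^s]|s+[^sk])*s*$")
NO_SKI = re.compile(r"^([^s]|(s(ks)*)+([^sk]|k[^is]))*((s(ks)*)+k?)?$")

# Negative lookahead: match only if 'skip' appears nowhere in the line.
NO_SKIP = re.compile(r"^(?!.*skip).*$")

def keep(line, pattern=NO_SKIP):
    """True if the line, minus its trailing newline, matches the pattern."""
    return pattern.match(line.rstrip("\n")) is not None

# Made-up log lines; the last one has no trailing newline.
log = ["backup ok\n", "skipping /tmp\n", "copied disks\n", "task"]
for line in log:
    if keep(line):
        print(line.rstrip("\n"))
```

The plain substring test is still the more readable filter; the REs
are mainly useful when a tool accepts nothing but a pattern.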
-- 
<kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
Perilous to all of us are the devices of an art deeper than we possess
ourselves.
       -- Gandalf the White [J.R.R. Tolkien, "The Two Towers", Bk 3, Ch. XI]



