Regular Expression - newbie question

Eddie Corns eddie at holyrood.ed.ac.uk
Wed Aug 28 13:53:08 EDT 2002


skunix at hotmail.com (SK) writes:

>I want to search for a pattern across multiple lines:-
>  
>Pattern: "python output" followed by anything without "python output"
>again and then "Hello World"

>Input File (c.txt)
>----------
>This is my file
>This is one python output
>This is my file
>This is two python output
>This is my file
>Hello World

>The following code snippet matches only 
>  This is one python output
>  Hello World

>But I am interested only in the following:-
>   This is two python output
>   Hello World


>Code Snippet
>============
>import re
>data = open("c.txt","rb").read()
>regexp = "(^.*python output.*$)[\000-\377]*?(^.*Hello World.*$)"
>r = re.compile(regexp,re.M)
>match = re.findall(r, data)
>print match

>Desired Output Match
>====================
>This is two python output
>Hello World

>Any good pointers/books for regular expressions in Python appreciated.

I'm fairly sure you can't do this with a pure regular expression.  If I understand
the theory correctly, regex syntax only lets you apply NOT to a limited class of
nodes, such as negated character classes like [^abc] and lookahead assertions like
(?!...) (and even those are ad-hoc extensions added for usefulness).
Of course, the easiest way is just to iterate over all the lines, like so:

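import re
import sys

last_op = None    # most recent line containing 'python output'
for line in open('c.txt'):
    if re.search(r'python output', line):
        last_op = line
    if re.search(r'Hello World', line) and last_op:
        # each line still ends in its newline, so write it out unchanged
        sys.stdout.write(last_op)
        sys.stdout.write(line)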

but you knew that :)
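
For what it's worth, the lookahead extension in Python's re module might get you
there with a single pattern after all.  What follows is only a minimal sketch,
assuming Python 3 and the sample input from the question; the names PATTERN,
find_pairs and SAMPLE are made up for illustration, and [\s\S] is swapped in for
the original [\000-\377] as the "any character, newlines included" class.  The
idea is to let the middle part advance one character at a time, but only at
positions where another "python output" does not begin:

```python
import re

# Assumed pattern, not the original poster's: the middle part may consume any
# character, but never one at which a new 'python output' starts.
PATTERN = re.compile(
    r"(^.*python output.*$)"           # a whole line containing 'python output'
    r"(?:(?!python output)[\s\S])*?"   # anything, unless 'python output' begins here
    r"(^.*Hello World.*$)",            # a whole line containing 'Hello World'
    re.M,
)


def find_pairs(text):
    """Return (python-output line, Hello World line) tuples found in text."""
    return PATTERN.findall(text)


# The contents of c.txt from the question, inlined so the sketch is self-contained.
SAMPLE = (
    "This is my file\n"
    "This is one python output\n"
    "This is my file\n"
    "This is two python output\n"
    "This is my file\n"
    "Hello World\n"
)

if __name__ == "__main__":
    print(find_pairs(SAMPLE))
```

On the sample this should give [('This is two python output', 'Hello World')]:
the attempt starting at the "one" line is abandoned because the middle part
cannot step over the second "python output".  The line-by-line loop is still
easier to read and to debug, so treat the pattern as an alternative rather than
a recommendation.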

Anyway, there was a thread discussing the 2nd edition of 'Mastering Regular
Expressions' on this group recently - IIRC the response was quite positive.

Eddie


