Regular Expression - newbie question
Eddie Corns
eddie at holyrood.ed.ac.uk
Wed Aug 28 13:53:08 EDT 2002
skunix at hotmail.com (SK) writes:
>I want to search for a pattern across multiple lines:-
>
>Pattern: "python output" followed by anything without "python output"
>again and then "Hello World"
>Input File (c.txt)
>----------
>This is my file
>This is one python output
>This is my file
>This is two python output
>This is my file
>Hello World
>The following code snippet matches only
> This is one python output
> Hello World
>But I am interested only in the following:-
> This is two python output
> Hello World
>Code Snippet
>============
>import re
>data = open("c.txt","rb").read()
>regexp = "(^.*python output.*$)[\000-\377]*?(^.*Hello World.*$)"
>r = re.compile(regexp,re.M)
>match = re.findall(r, data)
>print match
>Desired Output Match
>====================
>This is two python output
>Hello World
>Any good pointers/books for regular expressions in Python appreciated.
I'm fairly sure you can't do this with a regular expression. If I understand
the theory correctly, you can only apply NOT to a limited class of nodes in the
expression (and even those are ad-hoc extensions added for usefulness).
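That said, one of those extensions, the negative lookahead assertion (?!...) in
Python's re module, looks like it may be enough for this particular case. What
follows is only a sketch under assumptions: it uses Python 3 syntax, the sample
text is inlined instead of being read from c.txt, and the names PATTERN and
find_pairs are made up for illustration. It has only been thought through
against the sample input from the question.

```python
import re

# Sample input from the question, inlined here instead of reading c.txt.
SAMPLE = (
    "This is my file\n"
    "This is one python output\n"
    "This is my file\n"
    "This is two python output\n"
    "This is my file\n"
    "Hello World\n"
)

# Group 1: a whole line containing "python output".
# Middle:  zero or more whole lines, each checked by a negative lookahead
#          so that none of them contains "python output" again.
# Group 2: a whole line containing "Hello World".
PATTERN = re.compile(
    r"^(.*python output.*)$\n"
    r"(?:(?!.*python output).*\n)*?"
    r"^(.*Hello World.*)$",
    re.M,
)

def find_pairs(text):
    """Return (python-output line, Hello World line) tuples found in text."""
    return PATTERN.findall(text)

for output_line, hello_line in find_pairs(SAMPLE):
    print(output_line)
    print(hello_line)
```

The lookahead is applied once per skipped line, so a match starting at "This is
one python output" cannot run past the second "python output" line and is
abandoned.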
Of course the easiest way is just to iterate over all the lines like so:
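import re

last_op = None
for line in open('c.txt').readlines():
    # Remember the most recent line that mentions "python output".
    if re.search(r'python output', line):
        last_op = line
    # On reaching "Hello World", print it with the last such line seen.
    if re.search(r'Hello World', line) and last_op:
        print last_op,
        print line,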
but you knew that :)
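For anyone who wants to try the same loop without a c.txt on disk, here is a
rough Python 3 rendering of it as a generator. This is a sketch, not a drop-in
replacement: the function name pairs_from_lines and the inlined sample list are
assumptions made for illustration, and it strips the trailing newlines so the
pairs are easy to compare.

```python
import re

def pairs_from_lines(lines):
    """Yield (last 'python output' line, 'Hello World' line) pairs."""
    last_op = None
    for line in lines:
        # Keep only the most recent "python output" line.
        if re.search(r'python output', line):
            last_op = line
        # Report a pair once "Hello World" shows up after such a line.
        if re.search(r'Hello World', line) and last_op is not None:
            yield last_op.rstrip('\n'), line.rstrip('\n')

# Sample input from the question, one list item per line of c.txt.
sample = [
    "This is my file\n",
    "This is one python output\n",
    "This is my file\n",
    "This is two python output\n",
    "This is my file\n",
    "Hello World\n",
]

for op, hello in pairs_from_lines(sample):
    print(op)
    print(hello)
```

Passing an open file object instead of the list should work the same way,
since iterating over a file also yields one line at a time.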
Anyway, there was a thread discussing the 2nd edition of 'Mastering Regular
Expressions' on this group recently - IIRC the response was quite positive.
Eddie
More information about the Python-list
mailing list