Regular Expression - newbie question
Eddie Corns
eddie at holyrood.ed.ac.uk
Wed Aug 21 14:03:01 EDT 2002
skunix at hotmail.com (SK) writes:
>I want to parse a file for the following:-
>Line containing "python output" followed by anything and then "Hello
>World"
>The following code snippet works fine but How do I know that two such
>patterns exist in the file.
>Is it possible to mask the *anything* to be output on match.group()
>i.e. The output should NOT have "This is my file"
>I want something like this:-
>Desired Output:-
>==============
>Pattern 1
>This is one python output
>Hello World
>Pattern 2
>This is two python output
>Hello World
>Total matches found is 2.
>PS: Also, my regular expression is determined dynamically by the user.
>Code Snippet
>============
>import re
>data = open("c.txt","rb").read()
>regexp = "^.*python output[\000-\377]*^.*Hello World[\000-\377].*$"
>r = re.compile(regexp,re.M)
>match = re.search(r, data)
>print match.group()
>c.txt
>=====
>This is my file
>This is one python output
>This is my file
>This is my file
>Hello World
>This is my file
>This is two python output
>This is my file
>Hello World
>Output
>======
>This is one python output
>This is my file
>This is my file
>Hello World
>This is my file
>This is two python output
>This is my file
>Hello World
>Thanks in Advance
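import re
data = open("c.txt","rb").read()
# Capture the two lines of interest; the lazy [\000-\377]*? in between
# matches any characters (newlines included) without capturing them.
regexp = "(^.*python output.*$)[\000-\377]*?(^.*Hello World.*$)"
r = re.compile(regexp,re.M)
# findall returns one (first line, last line) tuple per match,
# so len(match) is the number of matches found.
match = re.findall(r, data)
print match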
That should cover most of it, I think. I tried to get rid of that awful
[\000-\377] construction, but the resulting expression was even uglier.
Eddie
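
For readers on Python 3, here is a minimal sketch of the same findall
approach that also prints the output format asked for in the question. It
makes some assumptions: the sample text is inlined instead of being read
from c.txt, the names pattern and matches are only illustrative, and
[\000-\377]*? is swapped for (?:.*\n)*?, which lazily skips whole lines.
That substitution is one possible alternative, not part of the reply above.

```python
import re

# Sample text copied from the c.txt shown in the question.
data = """This is my file
This is one python output
This is my file
This is my file
Hello World
This is my file
This is two python output
This is my file
Hello World
"""

# Group 1: the whole line containing "python output".
# (?:.*\n)*? lazily skips complete intervening lines without capturing them.
# Group 2: the whole line containing "Hello World".
pattern = re.compile(
    r"^(.*python output.*)\n(?:.*\n)*?(.*Hello World.*)$",
    re.M,
)

# One (first line, last line) tuple per non-overlapping match.
matches = pattern.findall(data)

for number, (start_line, end_line) in enumerate(matches, 1):
    print("Pattern %d" % number)
    print(start_line)
    print(end_line)
print("Total matches found is %d." % len(matches))
```

On the sample above this prints the two "Pattern N" blocks followed by
"Total matches found is 2." The "This is my file" lines never appear
because only the two captured groups are printed, not the whole match.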