Regular Expression - newbie question

Eddie Corns eddie at holyrood.ed.ac.uk
Wed Aug 21 14:03:01 EDT 2002


skunix at hotmail.com (SK) writes:

>I want to parse a file for the following:-

>Line containing "python output" followed by anything and then "Hello
>World"
>The following code snippet works fine, but how do I know that two such
>patterns exist in the file?

>Is it possible to mask the *anything* so it is left out of match.group(),
>i.e. the output should NOT have "This is my file"?

>I want something like this:- 

>Desired Output:-
>==============

>Pattern 1

>This is one python output
>Hello World

>Pattern 2

>This is two python output
>Hello World

>Total matches found is 2.

>PS: Also, my regular expression is determined dynamically by the user.


>Code Snippet
>============

>import re
>data = open("c.txt","rb").read()
>regexp = "^.*python output[\000-\377]*^.*Hello World[\000-\377].*$"
>r = re.compile(regexp,re.M)
>match = re.search(r, data)
>print match.group()

>c.txt
>=====

>This is my file
>This is one python output
>This is my file
>This is my file
>Hello World
>This is my file
>This is two python output
>This is my file
>Hello World


>Output
>======
>This is one python output
>This is my file
>This is my file
>Hello World
>This is my file
>This is two python output
>This is my file
>Hello World

>Thanks in Advance

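import re
data = open("c.txt","rb").read()
# Capture only the first and last line of each block; the non-greedy
# [\000-\377]*? skips the lines in between without capturing them.
regexp = "(^.*python output.*$)[\000-\377]*?(^.*Hello World.*$)"
r = re.compile(regexp,re.M)
# findall returns a list of (first line, last line) tuples, one per match.
match = r.findall(data)
print match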

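That should be most of it, I think: the two groups keep only the lines you
want, and len(match) gives the total number of matches. I tried to get rid
of that awful [\000-\377] construction, but the resulting expression was
even uglier.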
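
For anyone reading this with a newer Python, here is a rough sketch of the
same idea that also prints something close to the desired output. It is not
the code from the post above: it assumes Python 3, uses an inline string in
place of c.txt, swaps [\000-\377] for the common [\s\S] idiom, and the
helper name format_matches is made up for illustration.

```python
import re

# Sample text standing in for c.txt from the original question (assumption:
# the real script would read the file in text mode instead).
data = """This is my file
This is one python output
This is my file
This is my file
Hello World
This is my file
This is two python output
This is my file
Hello World
"""

# [\s\S]*? is a non-greedy "any character, newlines included" filler, used
# here in place of [\000-\377]*?. Only the two groups are captured.
pattern = re.compile(r"(^.*python output.*$)[\s\S]*?(^.*Hello World.*$)", re.M)

# One (first line, last line) tuple per match.
matches = pattern.findall(data)


def format_matches(matches):
    """Hypothetical helper: lay the matches out roughly as requested."""
    lines = []
    for number, (first, last) in enumerate(matches, 1):
        lines.append("Pattern %d" % number)
        lines.append("")
        lines.append(first)
        lines.append(last)
        lines.append("")
    lines.append("Total matches found is %d." % len(matches))
    return "\n".join(lines)


print(format_matches(matches))
```

Since the pattern comes from the user, the number of groups may differ; with
no groups findall returns whole-match strings rather than tuples, so the
unpacking in the loop would need adjusting.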

Eddie


