Using regular expressions to extract substrings from files

Timothy Hume t.hume at bom.gov.au
Thu Sep 9 22:38:10 EDT 2004


Hi,

I am new to Python and was wondering whether it is possible to apply 
regular expressions directly to files.

What I mean is this:
- It is easy to search for a substring of a string using regular 
  expressions
- Can I also search for a substring inside a file using regular 
  expressions? The substring may span several lines (i.e. it may contain 
  embedded newline and carriage return characters).

So far, the only way I know how to do this is to read the entire file into 
a string and then search the resulting string with regular expressions. 
This is OK for small files (in fact it is probably quite efficient, 
because the disc I/O is done all at once). However, once the files get 
large, there is a risk that I will run out of memory. The closest UNIX 
tool I can think of for this sort of job is grep, but that doesn't have 
the power and flexibility of Python.
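
To make the question concrete, here is a minimal sketch of the approach 
described above, alongside one possible alternative. The BEGIN/END pattern 
and both function names are hypothetical, chosen only for illustration. 
The second function assumes the standard mmap module is acceptable: it 
lets the re module search the file's bytes without first copying them into 
a Python string. Whether that is enough for very large files depends on 
the platform and address space, and mmap cannot map an empty file.

```python
import mmap
import re

# Hypothetical pattern: whatever lies between BEGIN and END.
# re.DOTALL lets "." match newline characters, so a match may span lines.
# The pattern is bytes because both functions search raw file bytes.
PATTERN = re.compile(rb"BEGIN(.*?)END", re.DOTALL)


def find_in_whole_file(path):
    """Read the entire file into memory, then search it."""
    with open(path, "rb") as f:
        data = f.read()
    m = PATTERN.search(data)
    return m.group(1) if m else None


def find_with_mmap(path):
    """Memory-map the file and search the mapping directly."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; this raises ValueError if it is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            m = PATTERN.search(mm)
            # Copy the matched bytes out before the mapping is closed.
            return m.group(1) if m else None
```

Both functions return the first captured substring as bytes, or None if 
nothing matches. Opening the file in binary mode keeps any carriage 
return characters intact.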

Any ideas would be appreciated.

Tim Hume
Bureau of Meteorology Research Centre
Melbourne
Australia



