Using regular expressions to extract substrings from files

Brian Szmyd szmyd at colostate.edu
Thu Sep 9 23:08:49 EDT 2004


Timothy Hume wrote:

> Hi,
> 
> I am new to Python, and was wondering if it is possible to operate on
> files using regular expressions.
> 
> What I mean is this:
> - It is easy to search for a substring of a string using regular
>   expressions
> - Can I also search for a substring inside a file using regular
>   expressions? The substring may span several lines (i.e. there may be
>   embedded newline and carriage return characters).
> 
> So far, the only way I know how to do this is to read the entire file into
> a string, and then parse the resulting string with regular expressions.
> This is OK for small files (in fact it is probably quite efficient,
> because the disc I/O is done all at once). However, once the files get
> large, there is the risk I will run out of memory. The closest UNIX tool I
> can think of to do this sort of job is grep, but that doesn't have the
> power and flexibility of Python.
> 
> Any ideas would be appreciated.
> 
> Tim Hume
> Bureau of Meteorology Research Centre
> Melbourne
> Australia

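You could always call grep from Python if that will work for you. Otherwise
you'll probably have to read the file in through a buffer and check the
buffer each time. The problem is: what if a match spans two buffers? You
would have to carry the tail of one buffer over into the next so that a
match straddling the boundary isn't missed.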
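
A rough sketch of that buffered approach, in case it helps. The function
name search_stream and the parameters chunk_size and max_match_len are made
up for illustration, and it leans on some assumptions: no single match is
longer than max_match_len characters, the pattern never matches the empty
string, and it doesn't rely on anchors or lookarounds (those can behave
differently at a buffer edge). Only matches that start far enough from the
end of the buffer are reported; the rest of the buffer is kept and rescanned
once more data has arrived.

```python
import io
import re


def search_stream(pattern, stream, chunk_size=65536, max_match_len=4096):
    """Yield (offset, text) for each match of pattern in a text stream.

    Assumption: no match is longer than max_match_len characters.
    """
    regex = re.compile(pattern)  # also accepts an already compiled pattern
    buffer = ""
    base = 0  # offset in the stream of buffer[0]
    while True:
        chunk = stream.read(chunk_size)
        at_eof = chunk == ""
        buffer += chunk
        # A match starting in the last max_match_len characters might be
        # cut short by the end of the buffer, so hold it back until more
        # data has been read (or the end of the stream is reached).
        limit = len(buffer) if at_eof else max(len(buffer) - max_match_len, 0)
        pos = 0
        while pos <= len(buffer):
            m = regex.search(buffer, pos)
            if m is None or m.start() >= limit:
                break
            yield base + m.start(), m.group()
            # Step past the match; the +1 guards against looping forever
            # on an empty match.
            pos = m.end() if m.end() > m.start() else m.end() + 1
        if at_eof:
            return
        # Drop what has been fully searched, keep the unsearched tail.
        keep_from = max(pos, limit)
        buffer = buffer[keep_from:]
        base += keep_from


if __name__ == "__main__":
    sample = io.StringIO("xx BEGIN one\ntwo END yy BEGIN three\r\nfour END zz")
    block = re.compile(r"BEGIN.*?END", re.DOTALL)
    # A tiny chunk_size forces matches to straddle chunk boundaries.
    for offset, text in search_stream(block, sample, chunk_size=5, max_match_len=32):
        print(offset, repr(text))
```

For a real file you would pass in something like open(path, newline="")
instead of the StringIO object; newline="" keeps carriage returns as they
are in the file. Memory use is bounded by roughly chunk_size plus
max_match_len plus the length of any match still being held back.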

As for spanning lines, newline and carriage return characters fall under
the category of "whitespace" (\s), so allowing whitespace in your regular
expression would be appropriate.
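
A small illustration of the whitespace point, using a made-up sample string.
\s matches newlines and carriage returns as well as spaces, whereas "." does
not match a newline unless the re.DOTALL flag is given.

```python
import re

# Made-up sample: the value sits on the line after its label.
text = "temperature:\r\n   23.5\nunits: C"

# \s+ absorbs the carriage return, the newline and the leading spaces.
m = re.search(r"temperature:\s+(\d+\.\d+)", text)
value = m.group(1)

# By default "." stops at a newline, so this finds nothing ...
dot_default = re.search(r"temperature:.*units", text)
# ... but with re.DOTALL it runs across the line breaks.
dot_all = re.search(r"temperature:.*units", text, re.DOTALL)

print(value, dot_default, dot_all is not None)
```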

-regards
brian szmyd


