Regular expressions in Python

Andreas Jung andreas at andreas-jung.com
Sun Sep 3 09:54:11 EDT 2000


johnvert at my-deja.com wrote:
>
> Hello,
>
> I have a few questions regarding the usage of regular expressions in
> Python.
>
> 1)  In Perl, I can do something like
>
>     if (/START(.+?)END/) {
>       use $1 here (value caught in (.+?))
>     }
>
>     what is the equivalent of Perl's $1, $2, ... in Python.

In general, take a look at the documentation of the module "re".
In Python you could try something like:

import re
# "string" holds the text to search; findall returns every (.*?) capture
lst = re.findall('START(.*?)END', string)
for l in lst:
    print(l)
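
If you want something closer to Perl's $1, the match object returned by re.search() is probably what you are after. The following is only a minimal sketch: the sample text and the variable names (text, first, all_captures) are made up for illustration and are not from the original question.

```python
import re

# Hypothetical sample input, made up for this illustration.
text = "junk START first END more START second END"

# re.search returns a match object, or None when nothing matches,
# which roughly corresponds to Perl's  if (/START(.+?)END/) { ... }
m = re.search(r'START(.+?)END', text)
if m:
    first = m.group(1)   # plays the role of Perl's $1; group(2) would be $2
    print(first)

# re.findall returns the captured group of every match, not only the first.
all_captures = re.findall(r'START(.+?)END', text)
print(all_captures)
```

With a single group in the pattern, findall gives a list of strings; with several groups it gives a list of tuples.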



>
> 2)  This question is not directly related to regular expressions, but
>     more to parsing text in Python in general:
>
>     I want to capture stuff between START and END, like in the above
>     regular expression, but START, the stuff in the middle, and END are
>     not necessarily on the same line.  The only way I can think of is to
>     read the whole file into memory as a string, and operate on that
>     string, or read it with readlines() and join() those to a string.
>     Both of these approaches would be slow because the file would be
>     read in one slurp.  Is there a way to handle this `multiple line'
>     parsing in a way that I can read the file line by line, as in:
>
>     while 1:
>       line = file.readline()
>       # parse

Why would it be too slow? We read large files (up to 50MB) either
line by line with readline() or in just one read() call. This
is not necessarily slower than a program in C.
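
Both ways of reading can handle a START ... END block that spans several lines. Here is a rough sketch of each; the function names and the sample data are hypothetical, and the line-by-line version assumes that the words START and END themselves are never split across two lines.

```python
import io
import re

def captures_from_string(text):
    # re.DOTALL lets '.' match newlines too, so one block may span lines.
    return re.findall(r'START(.*?)END', text, re.DOTALL)

def captures_line_by_line(fileobj):
    """Collect the text between START and END, reading one line at a time."""
    results = []
    buffer = None  # None means we are currently outside a START...END block
    for line in fileobj:
        while line:
            if buffer is None:
                pos = line.find('START')
                if pos < 0:
                    break                      # no START on the rest of the line
                buffer = []
                line = line[pos + len('START'):]
            else:
                pos = line.find('END')
                if pos < 0:
                    buffer.append(line)        # block continues on the next line
                    break
                buffer.append(line[:pos])
                results.append(''.join(buffer))
                buffer = None
                line = line[pos + len('END'):]
    return results

# Hypothetical sample data standing in for a real file.
sample = "a START one\ntwo END b\nSTART three END\n"
print(captures_from_string(sample))
print(captures_line_by_line(io.StringIO(sample)))
```

The first function needs the whole file in memory (for example via read()); the second keeps only the current line and the block being collected, which may matter for very large files.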

Andreas


