Regular expressions in Python
Andreas Jung
andreas at andreas-jung.com
Sun Sep 3 09:54:11 EDT 2000
johnvert at my-deja.com wrote:
>
> Hello,
>
> I have a few questions regarding the usage of regular expressions in
> Python.
>
> 1) In Perl, I can do something like
>
>      if (/START(.+?)END/) {
>          # use $1 here (value caught in (.+?))
>      }
>
>    What is the equivalent of Perl's $1, $2, ... in Python?
In general, take a look at the documentation of the module "re".
In Python you could try something like:
import re
text = "START one END and START two END"   # the string to search
for item in re.findall(r'START(.*?)END', text):
    print(item)
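re.findall returns only the captured strings. The closest analogue of Perl's `if (/.../) { ... $1 ... }` is re.search, which returns a match object (or None), and the match object's group() method: group(1) corresponds to $1, group(2) to $2, and so on. A minimal sketch; the sample text and variable names are only for illustration:

```python
import re

text = "header START alpha END middle START beta END footer"
pattern = re.compile(r"START(.+?)END")

# re.search returns a match object (or None), much like Perl's if (/.../)
m = pattern.search(text)
first = None
if m:
    # m.group(1) corresponds to Perl's $1; m.group(2) would be $2, and so on
    first = m.group(1).strip()
print(first)  # prints: alpha

# re.finditer yields a match object for every non-overlapping match
all_values = [m.group(1).strip() for m in pattern.finditer(text)]
print(all_values)  # prints: ['alpha', 'beta']
```

m.groups() returns all captured groups at once as a tuple, which is handy when the pattern has several of them.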
>
> 2) This question is not directly related to regular expressions,
> but more to parsing text in Python in general:
>
>    I want to capture stuff between START and END, like in the
> above regular expression, but START, the stuff in the middle,
> and END are not necessarily on the same line. The only way I can
> think of is to read the whole file into memory as a string, and
> operate on that string, or read it with readlines() and join()
> those to a string. Both of these approaches would be slow because
> the file would be read in one slurp. Is there a way to handle
> this `multiple line' parsing in a way that I can read the file
> line by line, as in:
>
>      while 1:
>          line = file.readline()
>          # parse
Why would it be too slow? We read large files (up to 50 MB) either
line by line with readline() or in a single read() call. This
is not necessarily slower than a program written in C.
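Both options from the question can be sketched as follows, assuming the markers are the literal strings START and END and that the input is any file-like object. The names SAMPLE, blocks_whole_file and blocks_line_by_line are hypothetical. The first function reads everything at once and uses the re.DOTALL flag so that "." also matches newlines, letting a block span several lines. The second reads one line at a time with a small state machine and keeps only the current block in memory.

```python
import io
import re

# Hypothetical sample input; START/END are the literal markers from the question
SAMPLE = """intro
START first
block END
noise
START second END
"""


def blocks_whole_file(f):
    # Read everything at once; re.DOTALL lets "." match newlines,
    # so a START...END block may span several lines.
    return re.findall(r"START(.*?)END", f.read(), re.DOTALL)


def blocks_line_by_line(f):
    # Read one line at a time and keep only the current block in memory.
    # `inside` records whether we have seen START but not yet END.
    inside = False
    buf = []
    for line in f:
        while line:
            if not inside:
                pos = line.find("START")
                if pos < 0:
                    break  # no block starts on the rest of this line
                inside = True
                line = line[pos + len("START"):]
            else:
                pos = line.find("END")
                if pos < 0:
                    buf.append(line)  # block continues on the next line
                    break
                buf.append(line[:pos])
                yield "".join(buf)
                buf = []
                inside = False
                line = line[pos + len("END"):]


whole = blocks_whole_file(io.StringIO(SAMPLE))
streamed = list(blocks_line_by_line(io.StringIO(SAMPLE)))
print(whole == streamed)  # prints: True
```

For files that fit comfortably in memory, the one-call version is shorter and usually fast enough; the line-by-line generator only pays off when a file is too large to hold at once.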
Andreas