speed of string chunks file parsing
bearophileHUGS at lycos.com
Mon Apr 6 11:20:52 EDT 2009
Hyunchul Kim:
> Following script do exactly what I want but I want to improve the speed.
This may be a bit faster, especially if sequences are long (code
untested):
import re
from collections import deque

def scanner1(deque=deque):
    # inputfile is the file name used in the original script
    result_seq = deque()
    cp_regular_expression = re.compile("^a complex regular expression here$")
    for line in open(inputfile):
        # A matching line starts a new chunk, so hand over the finished one
        if cp_regular_expression.match(line) and result_seq:
            yield result_seq
            result_seq = deque()
        result_seq.append(line)
    yield result_seq
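If each sequence is processed on the fly, you don't need to create a
new deque for every chunk; you can clear and reuse the old one, which
may be a bit faster (the caller must not keep a reference to a chunk
after asking for the next one):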
def scanner2(deque=deque):
    # inputfile is the file name used in the original script
    result_seq = deque()
    cp_regular_expression = re.compile("^a complex regular expression here$")
    for line in open(inputfile):
        if cp_regular_expression.match(line) and result_seq:
            yield result_seq
            # Reuse the same deque instead of allocating a new one
            result_seq.clear()
        result_seq.append(line)
    yield result_seq
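
Because scanner2 yields the same deque object every time, here is a
small sketch of what "processed on the fly" means in practice. The
function chunks_reusing_buffer, the is_start callback and the sample
lines are made up for illustration; they stand in for the file and the
regular expression above.

```python
from collections import deque

def chunks_reusing_buffer(lines, is_start):
    # Same idea as scanner2: one deque, cleared and refilled for each chunk
    buf = deque()
    for line in lines:
        if is_start(line) and buf:
            yield buf
            buf.clear()
        buf.append(line)
    yield buf

def starts(line):
    # Hypothetical stand-in for the regular expression test
    return line.startswith("#")

lines = ["#a", "1", "2", "#b", "3"]

# Fine: each chunk is fully used before the generator resumes
sizes = [len(chunk) for chunk in chunks_reusing_buffer(lines, starts)]

# Not fine: every stored item is the same deque, holding only the last chunk
kept = list(chunks_reusing_buffer(lines, starts))

# Fine again: copy each chunk if it has to outlive the loop step
copies = [list(chunk) for chunk in chunks_reusing_buffer(lines, starts)]
```

If the chunks need to be kept, the copying gives back most of what
clear() saved, so scanner1 is probably the better choice in that case.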
Note that most of the time may be spent in the regular expression.
Often you can speed this up with string methods, even if they only
serve as a fast, approximate first test before the full match.
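
A rough sketch of the string-method idea: a cheap str.startswith test
rejects most lines before the regular expression runs. The pattern, the
">" prefix and the names HEADER_RE, is_header and scanner3 are all
assumptions for illustration, since the real expression is not shown in
this thread. The quick test must accept every line the full expression
would accept, otherwise chunk boundaries get lost.

```python
import re

# Hypothetical header pattern standing in for the complex regular expression
HEADER_RE = re.compile(r"^>\w+\s+\d+$")

def is_header(line):
    # Quick approximate test first; the regex only runs on lines that pass it
    return line.startswith(">") and HEADER_RE.match(line) is not None

def scanner3(lines):
    # Same chunking logic as scanner1, using a plain list for brevity
    chunk = []
    for line in lines:
        if is_header(line) and chunk:
            yield chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield chunk

sample = [">seq1 10\n", "ACGT\n", ">not a header\n", ">seq2 20\n", "TTGA\n"]
chunks = list(scanner3(sample))
```

Whether this helps depends on how many lines fail the quick test, so it
is worth timing on real data.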
Bye,
bearophile