streaming a file object through re.finditer
Christos TZOTZIOY Georgiou
tzot at sil-tec.gr
Thu Feb 3 08:43:26 EST 2005
On Wed, 2 Feb 2005 22:22:27 -0500, rumours say that Daniel Bickett
<dbickett at gmail.com> might have written:
>Erick wrote:
>> True, but it doesn't work with multiline regular expressions :(
>If your intent is for the expression to traverse multiple lines (and
>possibly match *across* multiple lines,) then, as far as I know, you
>have no choice but to load the whole file into memory.
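Another route, different from the one this post takes: memory-map the file and pass the mmap object straight to re. The Python `mmap` documentation notes that re can search a memory-mapped file, because re accepts bytes-like buffers. The whole file is mapped into the address space, but the OS pages it in on demand instead of copying it into one big string. The sketch below assumes Python 3 and a bytes pattern. `finditer_file` is a hypothetical helper name.

```python
import mmap
import os
import re
import tempfile


def finditer_file(path, pattern):
    """Yield (start, end, matched_bytes) for each match of a bytes regex in a file.

    The file is memory-mapped, so re scans the mapped pages directly instead of
    a str built by f.read(). Only the matched bytes are copied out.
    """
    with open(path, "rb") as f:
        # mmap refuses to map a zero-length file, and such a file has no matches.
        if os.fstat(f.fileno()).st_size == 0:
            return
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for m in re.finditer(pattern, mm):
                yield m.start(), m.end(), m.group()


if __name__ == "__main__":
    # Same data as the 7-line test file in the code attached to this post.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"\n".join(str(i).encode() for i in range(7)))
    try:
        # A two-line match is found once, at its real offset in the file.
        print(list(finditer_file(path, rb"3\n4")))  # prints [(6, 9, b'3\n4')]
    finally:
        os.remove(path)
```

The offsets are byte positions in the file, and each match is reported exactly once. No window size has to be guessed in advance.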
*If* the OP knows that their multiline re won't span more than, say, 4 lines at
a time, the code attached at the end of this post could be useful. Usage:
for group_of_lines in line_groups(<file>, line_count=4):
    # search group_of_lines with your re
The OP should take care to ignore duplicate matches as the n-line window slides
through the input file, because a match shorter than n lines shows up in every
window that contains it. E.g. if your re searches for '3\n4', it will match 3
times in the first example of my code.
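One way to weed out those duplicates is to track where each window starts in the stream. A match is then reported only the first time its absolute (start, end) span turns up. This is a Python 3 sketch, not the code attached below. `offset_windows` and `finditer_windows` are hypothetical names, and offsets count decoded characters, not bytes. It assumes duplicates have identical spans. A greedy pattern whose match grows as more lines enter the window can still be reported once per distinct span.

```python
import collections
import io
import re


def offset_windows(fileobj, line_count=2):
    """Yield (offset, text) for each window of line_count consecutive lines.

    offset is the character position of the window's first line in the stream,
    so a match at m.start() inside the window sits at offset + m.start() overall.
    """
    window = collections.deque()
    offset = 0
    for line in fileobj:
        window.append(line)
        if len(window) > line_count:
            # Slide forward: the dropped line's length moves the window start.
            offset += len(window.popleft())
        if len(window) == line_count:
            yield offset, "".join(window)
    if 0 < len(window) < line_count:
        # Short input: the only window is whatever lines exist.
        yield offset, "".join(window)


def finditer_windows(fileobj, pattern, line_count=2):
    """Yield (start, end, text) once per distinct match span across all windows."""
    regex = re.compile(pattern)
    seen = set()
    for offset, text in offset_windows(fileobj, line_count):
        # Spans starting before this window cannot appear again; forget them.
        seen = {span for span in seen if span[0] >= offset}
        for m in regex.finditer(text):
            span = (offset + m.start(), offset + m.end())
            if span not in seen:
                seen.add(span)
                yield span[0], span[1], m.group()


if __name__ == "__main__":
    data = io.StringIO("\n".join(map(str, range(7))))
    # '3\n4' sits in three of the four 4-line windows but is reported once.
    print(list(finditer_windows(data, r"3\n4", line_count=4)))  # prints [(6, 9, '3\n4')]
```

Pruning `seen` at each window keeps its size bounded by the matches in one window. Without pruning, it would grow with the length of the file.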
|import collections
|
|def line_groups(fileobj, line_count=2):
|    """Yield every run of line_count consecutive lines, joined into one string.
|
|    Consecutive groups overlap by line_count - 1 lines. If the input has
|    fewer than line_count lines, a single group holding all of them is yielded.
|    """
|    iterator = iter(fileobj)
|    group = collections.deque()
|    joiner = ''.join
|
|    try:
|        while len(group) < line_count:
|            group.append(next(iterator))
|    except StopIteration:
|        # fewer than line_count lines: yield what we have and stop
|        yield joiner(group)
|        return
|
|    # the first full window must be yielded too, or matches in it are lost
|    yield joiner(group)
|
|    # slide the window forward one line at a time
|    for line in iterator:
|        group.append(line)
|        group.popleft()
|        yield joiner(group)
|
|if __name__ == "__main__":
|    import os, tempfile
|
|    # create two temp files for 4-line groups
|
|    # first file: line_count + 3 = 7 lines
|    testname1 = tempfile.mktemp() # deprecated & insecure but ok for this test
|    with open(testname1, "w") as testfile:
|        testfile.write('\n'.join(map(str, range(7))))
|
|    # second file: line_count - 2 = 2 lines
|    testname2 = tempfile.mktemp()
|    with open(testname2, "w") as testfile:
|        testfile.write('\n'.join(map(str, range(2))))
|
|    # now iterate over four-line groups
|
|    with open(testname1) as testfile:
|        for bunch_o_lines in line_groups(testfile, line_count=4):
|            print(repr(bunch_o_lines), end=' ')
|    print()
|
|    with open(testname2) as testfile:
|        for bunch_o_lines in line_groups(testfile, line_count=4):
|            print(repr(bunch_o_lines), end=' ')
|    print()
|
|    os.remove(testname1); os.remove(testname2)
--
TZOTZIOY, I speak England very best.
"Be strict when sending and tolerant when receiving." (from RFC1958)
I really should keep that in mind when talking with people, actually...