An Odd Little Script
Michael Hoffman
cam.ac.uk at mh391.invalid
Wed Mar 9 17:44:28 EST 2005
Greg Lindstrom wrote:
> I have a file with varying length records. All
> but the first record, that is; it's always 107 bytes long. What I would
> like to do is strip out all linefeeds from the file, read the character
> in position 107 (the end of segment delimiter) and then replace all of
> the end of segment characters with linefeeds, making a file where each
> segment is on its own line.
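Read literally, the request has three steps: remove every linefeed, take the byte at offset 107 as the delimiter, and turn each delimiter into a linefeed. Below is a minimal Python 3 sketch of that. It assumes the whole file fits in memory and that "position 107" means the 0-based offset 107 once linefeeds are gone (it would be 106 if counted from 1). It writes to a separate output file so the original survives. The names `segments_to_lines` and `convert_file` are illustrative only, not from the thread.

```python
DELIMITER_OFFSET = 107  # assumption: 0-based offset of the delimiter once linefeeds are removed


def segments_to_lines(raw: bytes, offset: int = DELIMITER_OFFSET) -> bytes:
    """Remove all linefeeds, read the delimiter at `offset`, then end each segment with a linefeed."""
    flat = raw.replace(b"\n", b"")
    delimiter = flat[offset:offset + 1]  # slicing gives a 1-byte bytes object (indexing gives an int)
    if not delimiter:
        raise ValueError("data is shorter than the delimiter offset")
    return flat.replace(delimiter, b"\n")


def convert_file(in_path: str, out_path: str, offset: int = DELIMITER_OFFSET) -> None:
    """Read in_path, convert it, and write the result to a separate out_path."""
    with open(in_path, "rb") as src:
        raw = src.read()
    with open(out_path, "wb") as dst:
        dst.write(segments_to_lines(raw, offset))
```

For example, convert_file("input.dat", "output.txt") (hypothetical file names). Working on a copy also sidesteps the fact that stripping linefeeds changes the file's length.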
Hmmmm... here's one way of doing it:
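import mmap
import sys

DELIMITER_OFFSET = 107  # 0-based; use 106 if "position 107" counts from 1

# Binary read/write mode; an mmap length of 0 maps the whole file
with open(sys.argv[1], "r+b") as data_file:
    data = mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE)
    delimiter = data[DELIMITER_OFFSET]  # a byte value (int) in Python 3
    # Overwrite every delimiter byte with a linefeed, in place
    for index in range(len(data)):
        if data[index] == delimiter:
            data[index] = ord("\n")
    data.flush()
    data.close()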
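There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every byte, but that's an exercise for
the reader. Note that this doesn't strip the existing linefeeds first,
as requested; since that changes the file's length, it's easiest done
by writing a new file. And personally I would make extra copies
ANYWAY; not doing so is asking for trouble.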
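The mmap.find() approach mentioned above might look roughly like this: a Python 3 sketch that jumps from one delimiter to the next instead of testing every byte. The function name `replace_delimiters_in_place` is made up for illustration, and the 0-based offset 107 is the same assumption as in the script. Like the script, it edits the file in place and leaves existing linefeeds alone, so run it on a copy.

```python
import mmap

DELIMITER_OFFSET = 107  # assumption: 0-based offset of the delimiter, as in the script


def replace_delimiters_in_place(path: str, offset: int = DELIMITER_OFFSET) -> int:
    """Overwrite every delimiter byte in the file with a linefeed; return how many were replaced.

    Uses mmap.find() to jump between delimiters. The file length never changes,
    so existing linefeeds are left alone.
    """
    count = 0
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (an empty file raises ValueError).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            delimiter = data[offset:offset + 1]  # a 1-byte bytes object
            if not delimiter:
                raise ValueError("file is shorter than the delimiter offset")
            pos = data.find(delimiter)
            while pos != -1:
                data[pos:pos + 1] = b"\n"  # same-length slice assignment
                count += 1
                pos = data.find(delimiter, pos + 1)
            data.flush()
    return count
```

For example, replace_delimiters_in_place("copy_of_input.dat") (a hypothetical path) returns the number of delimiters it replaced.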
--
Michael Hoffman