An Odd Little Script

Michael Hoffman cam.ac.uk at mh391.invalid
Wed Mar 9 17:44:28 EST 2005


Greg Lindstrom wrote:

> I have a file with varying length records.  All 
> but the first record, that is; it's always 107 bytes long.  What I would 
> like to do is strip out all linefeeds from the file, read the character 
> in position 107 (the end of segment delimiter) and then replace all of 
> the end of segment characters with linefeeds, making a file where each 
> segment is on its own line.

Hmmmm... here's one way of doing it:

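import mmap
import sys

# Offset of the end-of-segment delimiter, as given in the request.
# If "position 107" is meant one-based, use 106 instead.
DELIMITER_OFFSET = 107

# Binary read/write mode is needed to map the file for in-place edits.
with open(sys.argv[1], "r+b") as data_file:
    # A length of 0 maps the whole file.
    with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
        delimiter = data[DELIMITER_OFFSET]  # indexing yields an int byte value
        linefeed = ord("\n")

        # Overwrite every delimiter byte with a linefeed, in place.
        for index in range(len(data)):
            if data[index] == delimiter:
                data[index] = linefeed

        data.flush()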
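
Note that the script above only swaps delimiters for linefeeds; it leaves any existing linefeeds where they are, while the request also asks to strip them first. Here is a rough sketch of the whole transformation, assuming the file fits comfortably in memory and that "position 107" means zero-based offset 107. The names segments_to_lines and convert_file are made up for illustration, and the result goes to a separate file rather than over the original:

```python
from pathlib import Path


def segments_to_lines(raw: bytes, delimiter_offset: int = 107) -> bytes:
    """Strip linefeeds, then turn every end-of-segment delimiter into a linefeed."""
    # Remove existing linefeeds first so the offset refers to the unbroken data.
    stripped = raw.replace(b"\n", b"")
    # Slicing keeps the delimiter as a one-byte bytes object (empty if out of range).
    delimiter = stripped[delimiter_offset:delimiter_offset + 1]
    if not delimiter:
        raise ValueError("data is shorter than the delimiter offset")
    return stripped.replace(delimiter, b"\n")


def convert_file(source: str, target: str, delimiter_offset: int = 107) -> None:
    """Read source, convert it, and write the result to target (hypothetical helper)."""
    converted = segments_to_lines(Path(source).read_bytes(), delimiter_offset)
    Path(target).write_bytes(converted)
```

Something like convert_file("segments.dat", "segments.out") would then leave the input untouched; both file names are placeholders.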

There are doubtless more efficient ways, like using mmap.mmap.find()
instead of iterating over every character, but that's an exercise for
the reader. And personally I would make extra copies ANYWAY; not doing
so is asking for trouble.
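
For what it's worth, the find() idea might look something like the sketch below. It copies the file first and edits only the copy, in line with the advice about extra copies. The function name replace_delimiters_in_copy is invented for this example, the offset is again assumed to be zero-based, and like the script above it does not strip existing linefeeds:

```python
import mmap
import shutil


def replace_delimiters_in_copy(source, target, delimiter_offset=107):
    """Copy source to target, then replace delimiter bytes in the copy in place.

    Returns the number of bytes replaced.
    """
    shutil.copyfile(source, target)  # work on a copy; the original stays untouched
    count = 0
    with open(target, "r+b") as data_file:
        # A length of 0 maps the whole file.
        with mmap.mmap(data_file.fileno(), 0, access=mmap.ACCESS_WRITE) as data:
            # Slicing yields a one-byte bytes object, which is what find() expects.
            delimiter = data[delimiter_offset:delimiter_offset + 1]
            if not delimiter:
                raise ValueError("file is shorter than the delimiter offset")
            # Jump from one delimiter to the next instead of testing every byte.
            index = data.find(delimiter)
            while index != -1:
                data[index] = 0x0A  # linefeed
                count += 1
                index = data.find(delimiter, index + 1)
            data.flush()
    return count
```

The search itself then happens inside find() rather than in a Python-level loop over every byte, which should help most when delimiters are sparse.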
-- 
Michael Hoffman


