how to remove n bytes in a file?

Tim Chase python.list at tim.thechases.com
Sat Sep 2 10:34:46 EDT 2006


> Suppose we have a very large file, and want to remove 'n' bytes in the
> middle of the file. My thought is:
> 1, read() until we reach the bytes that should be removed, and mark the
> position as 'pos'.
> 2, seek(tell() + n) bytes
> 3, read() until we reach the end of the file, into a variable, say 'a'
> 4, seek(pos) back to 'pos'
> 5, write(a)
> 6, truncate()
> 
> If the file is really large, the performance may be a problem.
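
For reference, the six quoted steps might be sketched like this, assuming Python 3 and a seekable binary file object. The name remove_bytes_naive is made up for illustration; note that step 3 holds the entire tail of the file in memory at once.

```python
import io

def remove_bytes_naive(f, pos, n):
    """Remove n bytes at position pos, following the quoted steps."""
    f.seek(pos + n)   # steps 1-2: move just past the bytes to remove
    a = f.read()      # step 3: read the whole rest of the file into memory
    f.seek(pos)       # step 4: go back to 'pos'
    f.write(a)        # step 5: write the tail over the removed bytes
    f.truncate()      # step 6: cut the file at the current position

# Small in-memory demonstration.
f = io.BytesIO(b"0123456789")
remove_bytes_naive(f, 3, 4)
result = f.getvalue()
```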

The biggest problem I see is step #3, which reads everything 
after the removed bytes into memory in one go.  If you're 
dealing with a multi-gigabyte file, and you want to delete 5 
bytes beginning at 20 bytes into the file, step #3 involves 
reading file_size-(20+5) bytes into memory, and then spewing 
them all back out.  A better way might involve reading a 
fixed-size chunk each time and then writing that chunk back to 
its proper offset.

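def shift(f, offset, size, buffer_size=1024*1024):
	"""deletes a portion of size "size" from file "f", starting at
	offset, and shifting the remainder of the file to fill.

	The buffer_size can be tweaked for performance preferences,
	defaulting to 1 megabyte.
	"""
	read_pos = offset + size  # where the next chunk is read from
	while True:
		f.seek(read_pos)
		buffer = f.read(buffer_size)
		if not buffer: break
		f.seek(offset)
		f.write(buffer)
		# advance by what was actually read; the last chunk may be short
		read_pos += len(buffer)
		offset += len(buffer)
	# offset now marks the new end of the data; a bare truncate()
	# here would cut at the old end of file and remove nothing
	f.truncate(offset)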

if __name__ == '__main__':
	offset = ord('p')  # 112
	size = 5
	buffer_size = 30

	# in-memory stand-in for a binary file holding bytes 0..255
	from io import BytesIO
	f = BytesIO(bytes(range(256)))
	print(repr(f.read()))
	print('=' * 50)
	f.seek(0)
	shift(f, offset, size, buffer_size)
	f.seek(0)
	print(repr(f.read()))

> Is there a clever way to finish? Could mmap() help? Thx

No idea regarding mmap().
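
For anyone who wants to experiment, one possible mmap-based sketch, assuming Python 3: map the file, use mmap.move() to slide the tail down over the removed bytes, then close the mapping and truncate the file. The name remove_bytes_mmap is made up for illustration, and whether this is any faster than the chunked loop is not something this sketch establishes. Mapping a multi-gigabyte file also needs enough address space, which can fail on 32-bit builds.

```python
import mmap
import os
import tempfile

def remove_bytes_mmap(path, offset, size):
    """Remove `size` bytes at `offset` by moving the tail within a memory map."""
    with open(path, "r+b") as f:
        total = os.fstat(f.fileno()).st_size
        if offset < 0 or size < 0 or offset + size > total:
            raise ValueError("range to remove lies outside the file")
        tail = total - (offset + size)  # bytes that follow the removed range
        if tail:
            # length 0 maps the whole file; an empty file cannot be mapped,
            # but in that case tail is 0 and this branch is skipped
            with mmap.mmap(f.fileno(), 0) as m:
                m.move(offset, offset + size, tail)  # move(dest, src, count)
                m.flush()
        # the mapping is closed by now, so the file can be shortened
        f.truncate(total - size)

# Small demonstration on a throwaway temporary file.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(bytes(range(256)))
    remove_bytes_mmap(path, offset=112, size=5)
    with open(path, "rb") as check:
        mapped_result = check.read()
finally:
    os.remove(path)
```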

-tkc









