What is heating the memory here? hashlib?

Paulo da Silva p_s_d_a_s_i_l_v_a_ns at netcabo.pt
Sat Feb 13 14:29:35 EST 2016


Hello all.

I'm running into a very strange (for me at least) problem with the following method:

	def getHash(self):
		bfsz=File.blksz
		h=hashlib.sha256()
		hu=h.update
		with open(self.getPath(),'rb') as f:
			f.seek(File.hdrsz)	# Skip header
			b=f.read(bfsz)
			while len(b)>0:
				hu(b)
				b=f.read(bfsz)
		fhash=h.digest()
		return fhash

hdrsz is always 4KB here, and all files are larger than 4KB.

If I use a 40MB bfsz, this takes up all my memory very quickly. After a few
hundred files it begins to swap, and the program ends up being
killed (BTW, I'm using Linux, Kubuntu 14.04).

If I reduce bfsz to 1MB, it successfully completes my full test (~100000
files), reaching about 6GB of memory.

If I reduce bfsz further to 16KB, there is no noticeable memory usage!!
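
For anyone who wants to reproduce these measurements, here is a minimal standalone sketch of the same read loop together with a way to log peak memory. The names hash_file and peak_rss_kb are only illustrative and are not part of my program; the default hdrsz of 4096 is an assumption matching the 4KB header mentioned above. On Linux, resource.getrusage reports ru_maxrss in kilobytes.

```python
import hashlib
import resource


def hash_file(path, blksz, hdrsz=4096):
    """Hash a file after skipping hdrsz bytes, reading blksz bytes at a time.

    Illustrative stand-in for getHash(); not the original class method.
    """
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        f.seek(hdrsz)  # skip header
        # iter() with a sentinel stops when read() returns b'' at EOF
        for block in iter(lambda: f.read(blksz), b''):
            h.update(block)
    return h.digest()


def peak_rss_kb():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Calling peak_rss_kb() after every few hundred calls to hash_file(), once per block size, should show whether the growth follows the block size as described above.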

I have tried the following code, which feeds each block to the hash in 8KB slices, but it didn't fix the problem:

	def getHash(self):
		bfsz=File.blksz
		h=hashlib.sha256()
		hu=h.update
		with open(self.getPath(),'rb') as f:
			husz=8192
			f.seek(File.hdrsz)	# Skip header
			b=f.read(bfsz)
			while len(b)>0:
				for i in range(0,len(b),husz):
					hu(b[i:i+husz])
				b=f.read(bfsz)
		fhash=h.digest()
		return fhash

What is wrong here?!
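
One more variant that could be tried, given here only as a sketch: read into a single preallocated bytearray with readinto(), so that no new bytes object of bfsz bytes is created on every read. The function name hash_file_reuse_buffer and its parameters are hypothetical, and whether this changes the memory behaviour described above has not been verified.

```python
import hashlib


def hash_file_reuse_buffer(path, blksz, hdrsz=4096):
    """Hash a file after skipping hdrsz bytes, reusing one read buffer.

    Hypothetical variant of getHash(); hdrsz=4096 is an assumed default.
    """
    h = hashlib.sha256()
    buf = bytearray(blksz)   # allocated once, reused for every read
    view = memoryview(buf)   # lets us slice without copying
    with open(path, 'rb') as f:
        f.seek(hdrsz)  # skip header
        while True:
            n = f.readinto(buf)  # number of bytes actually read
            if not n:
                break
            h.update(view[:n])   # hash only the filled part of the buffer
    return h.digest()
```

The digest is the same as with the read() loop; only the way the buffer is allocated differs.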

Thanks for any help/comments.
Paulo


