What is heating the memory here? hashlib?

Paulo da Silva p_s_d_a_s_i_l_v_a_ns at netcabo.pt
Sun Feb 14 21:21:23 EST 2016


At 07:04 on 14-02-2016, Paulo da Silva wrote:
> I was unable to reproduce the situation using a simple program just
> walking through all files>4K, with or without the seek, and computing
> their shasums.
> Only some fluctuations of about 500MB in memory consumption.

Today I gave the program another try, using a 40MB buffer size (bfsz) under
the same circumstances except for a prior reboot and, surprisingly, it worked
fine. The fluctuations in memory were of the same magnitude as
those of the simple program. No swapping at all!

Some history ...

The first time the problem occurred, I found an issue that I thought
could cause that behavior. The equivalent of the statement
h=hashlib.sha256() was outside the per-file loop.
I had put it in the argument parser because the user can choose the
algorithm to use, and I did not want to test the option for each file.
So a single hash object was reused for all files: after the "digest" of
one file I started feeding the same object with the contents of the next.
Apart from the memory leakage, hashlib seemed to work fine.
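
For reference, a minimal sketch of the corrected structure, with a fresh
hash object created for every file and the algorithm passed in by name.
The names file_digest, algorithm and bufsize are only illustrative; they
are not taken from my actual program.

```python
import hashlib

def file_digest(path, algorithm="sha256", bufsize=1024 * 1024):
    """Return the hex digest of one file, read in bufsize-byte chunks."""
    h = hashlib.new(algorithm)  # fresh hash object for every file
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:  # an empty read means end of file
                break
            h.update(chunk)
    return h.hexdigest()
```

Written this way, each digest covers only one file, and the algorithm
option is still looked up in a single place.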

1. Is it possible that the memory exhaustion caused some sort of problem
that left the system in a state that made gc malfunction on subsequent runs?

2. The filesystem is btrfs.
So, is it possible that some "fight" among btrfs, gc and my program
prevents gc from freeing memory in time?
This seems unlikely because I was only reading and the filesystem is
mounted with noatime. However, I don't know whether btrfs does some
reorganization work during reads.
Anyway, I repeated the failing tests at least 3 times, one of them updating
hashlib with 8KB chunks and another with a 1MB bfsz. The latter ran
to the end but used ~5GB of swap.

3. There is another small change I have made since then. A few times
hashlib was fed with empty data (zero length). That has been fixed.
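
A quick check, sketched below, suggests that a zero-length update is
harmless as far as the digest itself is concerned. It says nothing about
memory use, so I cannot tell whether this change matters here.

```python
import hashlib

h = hashlib.sha256()
h.update(b"abc")
before = h.hexdigest()

h.update(b"")  # zero-length update, as happened a few times in my program
after = h.hexdigest()

print(before == after)  # True: the empty update does not change the digest
```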

So far I have run the program twice and it ran perfectly.

When I need to run it again in the future, away from this confusion, and if
the same problem occurs, I'll try to look into things more carefully.

Once more thank you all.
Paulo


