[pypy-dev] vmprof compression

Stuart Axon stuaxo2 at yahoo.com
Fri Mar 27 12:39:42 CET 2015


Hi,   It's worth adding lzop to the list of compressors to test, as it's built specifically to have low CPU overhead, at the cost of some compression ratio.

http://www.lzop.org/
 S++ 


On Thursday, March 26, 2015 11:29 PM, John Camara <john.m.camara at gmail.com> wrote:

Hi Fijal,
To recap and continue the discussion from irc.
We already discussed that the stack ids are based on a counter, which is good, but I also want to confirm that the ids have locality with respect to the code.  That is, similar areas of the code should have similar ids.  Just to make sure they are not random with respect to the code, as otherwise compression will not be helpful.  If the ids are random, that would need to be corrected first.
Right now the stack traces are written to the file repeating the following sequence
MARKER_STACKTRACE
count depth stack stack ... stack
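For reference, a minimal sketch of a reader for that layout, assuming little-endian 64-bit words and a hypothetical marker value (the real vmprof constants may differ):

    import struct

    MARKER_STACKTRACE = b"\x01"  # hypothetical marker value
    WORD = struct.Struct("<q")   # assuming 64-bit little-endian words

    def read_stacktraces(f):
        # Yield (count, depth, stack_ids) tuples from a file made of
        # the repeating records described above.
        while True:
            marker = f.read(1)
            if not marker:
                return
            assert marker == MARKER_STACKTRACE
            count = WORD.unpack(f.read(8))[0]
            depth = WORD.unpack(f.read(8))[0]
            stack_ids = [WORD.unpack(f.read(8))[0] for _ in range(depth)]
            yield count, depth, stack_ids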
In order to get a high compression ratio it would be better to combine multiple stacktraces and rearrange the data as follows
MARKER_COMPRESSED_STACKTRACES
counts_compressed_length counts_compressed
depths_compressed_length depths_compressed
stacks_compressed_length stacks_compressed
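As a rough sketch, one such block could be written like this, using zlib for illustration and a hypothetical marker value:

    import struct
    import zlib

    MARKER_COMPRESSED_STACKTRACES = b"\x02"  # hypothetical marker value

    def write_compressed_block(f, counts_raw, depths_raw, stacks_raw):
        # One record: the marker followed by three length-prefixed,
        # individually compressed byte streams.
        f.write(MARKER_COMPRESSED_STACKTRACES)
        for raw in (counts_raw, depths_raw, stacks_raw):
            payload = zlib.compress(raw)
            f.write(struct.pack("<I", len(payload)))  # 4-byte length prefix
            f.write(payload)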
In order to build the compressed data you will want 3 pairs of buffers: a pair for counts, a pair for depths, and a pair for stacks.  Your profiler would write to one set of buffers while another thread is responsible for compressing buffers that are full and writing them to the file.  Once a set of buffers is full, the profiler starts filling up the other set.
For each set of buffers you need a variable to hold the previous count, depth, and stack id, each initialized to 0 before any data is written to an empty buffer.  Instead of writing the actual count value into the counts buffer, you write the difference between the current count and the previous count.  The reason for doing this is that the delta values will mostly be around 0, which significantly improves the compression ratio without adding much overhead.  Of course you would do the same for depths and stack ids.
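A sketch of the delta step, with prev playing the role of the per-buffer previous-value variable:

    def delta_encode(values):
        # Replace each value with its difference from the previous one.
        # prev starts at 0, matching an empty buffer.
        prev = 0
        deltas = []
        for v in values:
            deltas.append(v - prev)
            prev = v
        return deltas

    def delta_decode(deltas):
        # Inverse of delta_encode: a running sum restores the originals.
        prev = 0
        values = []
        for d in deltas:
            prev += d
            values.append(prev)
        return values

    assert delta_decode(delta_encode([5, 7, 7, 8, 30])) == [5, 7, 7, 8, 30]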
When you compress the data, you compress each buffer individually to make sure like data is being compressed together.  Like data compresses better than unlike data, and by saving deltas very few bits are needed to represent the data, so you are likely to get long runs of 0s and 1s.
I'm sure now you can see why I don't want the stack ids to be random.  If they are random, the deltas will be all over the place, so you won't end up with long runs of 0s and 1s, and random data itself does not compress.
To test this out I wouldn't bother modifying the C code; instead, try it out in Python first to make sure the compression provides huge gains and to figure out how to tune the algorithm, without having to mess with the signal handlers, write the code for a separate thread, or deal with issues such as making sure you don't start writing to a buffer before the thread has finished writing its data to the file.  I would just read an existing profile file and rewrite it to a different file, rearranging the data and compressing the deltas as described above.  You can get away with one set of buffers since you wouldn't be profiling at the same time.
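A rough outline of that offline experiment, sketched under the same assumptions as above; it consumes (count, depth, stack_ids) tuples, e.g. from a reader like the one sketched earlier, and assumes the deltas fit in 32 bits:

    import struct
    import zlib

    BLOCK_TRACES = 8000  # traces per compressed block; see tuning note below

    def pack_deltas(values):
        # Delta-encode and pack as little-endian signed 32-bit ints.
        prev, out = 0, []
        for v in values:
            out.append(v - prev)
            prev = v
        return struct.pack("<%di" % len(out), *out)

    def rewrite(traces, out_f):
        # Group traces into blocks and write each column (counts,
        # depths, stacks) length-prefixed and compressed.
        counts, depths, stacks = [], [], []

        def flush():
            for col in (counts, depths, stacks):
                payload = zlib.compress(pack_deltas(col))
                out_f.write(struct.pack("<I", len(payload)))
                out_f.write(payload)
                del col[:]  # reuse the buffer for the next block

        for count, depth, stack_ids in traces:
            counts.append(count)
            depths.append(depth)
            stacks.extend(stack_ids)
            if len(counts) == BLOCK_TRACES:
                flush()
        if counts:
            flush()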
To tune this process you will need to determine the appropriate number of stack traces per block: small enough to keep memory down but large enough that the overhead associated with compression stays small.  Maybe start off with about 8000 stack traces.  I would try gzip, bz2, and lzma and look at their compression ratios and times.  Gzip is generally faster than bz2, and lzma is the slowest.  On the other hand, lzma provides the best compression and gzip the worst.  Since you will be compressing deltas, you can most likely get away with using the fastest compression options of each compressor without affecting the compression ratio, but I would test to verify this, as whether it holds depends on the data being compressed.  Also, one option available in lzma is the ability to set the width of the data to look at when searching for patterns.  Since you are saving 32- or 64-bit ints depending on the platform, you can set that option to either 4 or 8 bytes accordingly.  I don't believe gzip or bz2 have this option.  By setting it in lzma you will likely improve the compression ratio.
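One way to run that comparison from Python, using synthetic packed deltas as a stand-in for real buffers; as far as I can tell, the LZMA2 pb (position bits) option is the closest knob in Python's lzma module to the width setting mentioned above (pb=2 assumes a 4-byte period, pb=3 an 8-byte one):

    import bz2
    import lzma
    import random
    import struct
    import time
    import zlib

    # Synthetic stand-in for a buffer of packed 32-bit deltas; real
    # data would come from the profiler's buffers.
    random.seed(0)
    deltas = [random.choice([0, 1, -1, 2, 40]) for _ in range(200000)]
    data = struct.pack("<%di" % len(deltas), *deltas)

    def bench(name, compress):
        t0 = time.time()
        out = compress(data)
        print("%-5s %7.1fx  %.3fs"
              % (name, len(data) / float(len(out)), time.time() - t0))

    bench("gzip", lambda d: zlib.compress(d, 1))  # fastest level of each
    bench("bz2", lambda d: bz2.compress(d, 1))
    # pb=2 tells LZMA2 to assume 4-byte positional periodicity.
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 0, "pb": 2}]
    bench("lzma", lambda d: lzma.compress(d, filters=filters))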
You may find that counts and depths compress similarly under all three compressors, in which case just use the fastest, which will likely be gzip.  On the other hand, maybe the stack ids will be better off with lzma.  This is another reason to separate out like data: it gives you the option of using the fastest compressor for some data types while using others that provide better compression.
I would not be surprised if this approach achieves a compression ratio better than 100x, but that will be heavily dependent on how much locality the stack ids have.  Also don't forget about simple things like not using 64-bit ints when you can get away with smaller ones.
Also, a slight variation on the above: if you find most of your deltas fit in 7 bits, you could write them out as 1 byte, and write anything larger as a 4-byte int with the high bit set.  If you do this, don't set the lzma width option to 4- or 8-byte boundaries, as your data is now a mixture of 1- and 4-byte values.  This can sometimes provide a huge reduction in compression time without much effect on the overall compression ratio.
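A sketch of that variable-width encoding; note it assumes non-negative values, so signed deltas would first need a mapping such as zig-zag encoding:

    import struct

    def encode_value(v):
        # 1 byte when v fits in 7 bits, else 4 bytes with the high bit
        # of the first byte set.  Assumes v >= 0.
        if v < 128:
            return struct.pack("B", v)
        assert v < (1 << 31)
        return struct.pack(">I", v | 0x80000000)  # big-endian: flag byte leads

    def decode_values(buf):
        values, i = [], 0
        while i < len(buf):
            b = buf[i]
            if b & 0x80:  # high bit set: 4-byte value
                (v,) = struct.unpack_from(">I", buf, i)
                values.append(v & 0x7FFFFFFF)
                i += 4
            else:
                values.append(b)
                i += 1
        return values

    data = b"".join(encode_value(v) for v in [3, 0, 500, 1])
    assert decode_values(data) == [3, 0, 500, 1]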
Hope you find this helpful.
John
_______________________________________________
pypy-dev mailing list
pypy-dev at python.org
https://mail.python.org/mailman/listinfo/pypy-dev


 
  