[pypy-dev] vmprof compression

John Camara john.m.camara at gmail.com
Thu Mar 26 17:29:18 CET 2015


Hi Fijal,

To recap and continue the discussion from irc.

We already discussed that the stack ids are based on a counter, which is
good, but I also want to confirm that the ids have locality with respect
to the code.  That is, similar areas of the code should have similar
ids.  I just want to make sure they are not random with respect to the
code, otherwise compression will not be very helpful.  If the ids are
random, that would need to be corrected first.

Right now the stack traces are written to the file as the following
repeating sequence:

MARKER_STACKTRACE
count
depth
stack
stack
...
stack

In order to get a high compression ratio it would be better to combine
multiple stacktraces and rearrange the data as follows (see the sketch
after the layout):

MARKER_COMPRESSED_STACKTRACES
counts_compressed_length
counts_compressed
depths_compressed_length
depths_compressed
stacks_compressed_length
stacks_compressed
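
Just to make this concrete, here is a minimal Python sketch of writing
one such block.  The marker value and the 4-byte little-endian length
fields are assumptions for illustration, not what vmprof does today:

    import struct
    import zlib

    MARKER_COMPRESSED_STACKTRACES = b'\x58'   # hypothetical marker byte

    def write_compressed_block(out, counts_raw, depths_raw, stacks_raw):
        # Each argument is the already packed bytes of one column
        # (ideally delta-encoded ints, as described below).
        out.write(MARKER_COMPRESSED_STACKTRACES)
        for raw in (counts_raw, depths_raw, stacks_raw):
            payload = zlib.compress(raw, 1)             # fastest zlib level
            out.write(struct.pack('<I', len(payload)))  # *_compressed_length
            out.write(payload)                          # *_compressed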

In order to build the compressed data you will want 3 pairs of buffers:
a pair each for counts, depths, and stacks.  Your profiler would be
writing to one set of buffers and another thread would be responsible
for compressing buffers that are full and writing them to the file.
Once a set of buffers is full the profiler would start filling up the
other set of buffers.
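
Here is a rough sketch of that double buffering in Python, reusing
write_compressed_block from the sketch above.  The names, the 64-bit
packing, and the queue-based hand-off are just one way to do it:

    import queue
    import struct

    BUFFER_LIMIT = 8000            # stack traces per set; needs tuning

    class BufferSet(object):
        def __init__(self):
            self.counts = []
            self.depths = []
            self.stacks = []

        def clear(self):
            self.counts[:] = []
            self.depths[:] = []
            self.stacks[:] = []

    full_sets = queue.Queue()      # sets waiting to be compressed
    empty_sets = queue.Queue()     # sets the profiler can reuse
    empty_sets.put(BufferSet())    # the spare half of the pair
    current = BufferSet()          # the set the profiler is filling

    def pack(ints):
        # Signed 64-bit for simplicity; deltas (see below) would allow a
        # narrower type.
        return struct.pack('<%dq' % len(ints), *ints)

    def record_stacktrace(count, depth, stack_ids):
        # Profiler side: append one trace, swap sets when the current one
        # is full.
        global current
        current.counts.append(count)
        current.depths.append(depth)
        current.stacks.extend(stack_ids)
        if len(current.counts) >= BUFFER_LIMIT:
            full_sets.put(current)
            current = empty_sets.get()   # blocks until the writer is done

    def writer_thread(outfile):
        # Background side: compress full sets, write them out, then hand
        # the buffers back so the profiler can refill them.
        while True:
            bufs = full_sets.get()
            if bufs is None:             # sentinel to shut down
                break
            write_compressed_block(outfile, pack(bufs.counts),
                                   pack(bufs.depths), pack(bufs.stacks))
            bufs.clear()
            empty_sets.put(bufs)

    # Usage would be something like:
    #   threading.Thread(target=writer_thread, args=(outfile,)).start()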

For each set of buffers you need a variable to hold the previous count,
depth, and stack id.  They will be initialized to 0 before any data is
written to an empty buffer.  Instead of writing the actual count value
into the counts buffer you will write the difference between the
current count and the previous count.  The reason for doing this is
that the delta values will mostly be around 0, which will significantly
improve the compression ratio without adding much overhead.  Of course
you would do the same for depths and stack ids.
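
In plain Python the bookkeeping looks like this (just to illustrate;
the decode side would live in the tool that reads the profile):

    def delta_encode(values, prev=0):
        # Replace each value with its difference from the previous one;
        # prev starts at 0 for a freshly emptied buffer.
        deltas = []
        for value in values:
            deltas.append(value - prev)
            prev = value
        return deltas

    def delta_decode(deltas, prev=0):
        # Inverse transform, for the reader side.
        values = []
        for delta in deltas:
            prev += delta
            values.append(prev)
        return values

    # e.g. delta_encode([1000, 1002, 1001, 1005]) == [1000, 2, -1, 4]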

When you compress the data you compress each buffer individually to
make sure like data is being compressed.  Like data compresses better
than unlike data, and by saving deltas very few bits will be required
to represent the data, so you are likely to have long runs of 0s and
1s.

I'm sure now you can see why I don't want the stack ids to be random.
If they are random then the deltas will be all over the place, so you
won't end up with long runs of 0s and 1s, and random data itself does
not compress.

To test this out I wouldn't bother modifying the C code but instead try
it out in Python first, to make sure the compression is providing huge
gains and to figure out how to tune the algorithm, without having to
mess with the signal handlers, write the code for the separate thread,
and deal with issues such as making sure you don't start writing to a
buffer before the thread has finished writing its data to the file,
etc.  I would just read an existing profile file and rewrite it to a
different file by rearranging the data and compressing the deltas as I
described (see the sketch below).  You can get away with one set of
buffers as you wouldn't be profiling at the same time.
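
A sketch of that offline experiment, reusing delta_encode from above.
Parsing the existing file depends on the real vmprof layout, so here
the input is assumed to already be a list of (count, depth, stack_ids)
tuples:

    import bz2
    import lzma
    import struct
    import zlib

    def regroup(records):
        # records: list of (count, depth, stack_ids) tuples parsed from
        # an existing profile file.
        counts, depths, stacks = [], [], []
        for count, depth, stack_ids in records:
            counts.append(count)
            depths.append(depth)
            stacks.extend(stack_ids)
        return counts, depths, stacks

    def compressed_size(records, compressor):
        # Total size of the three delta-encoded, individually compressed
        # columns; compare it against the size of the original file.
        total = 0
        for column in regroup(records):
            deltas = delta_encode(column)
            raw = struct.pack('<%dq' % len(deltas), *deltas)
            total += len(compressor(raw))
        return total

    # Compare the three compressors on the same data, e.g.:
    #   compressed_size(records, zlib.compress)
    #   compressed_size(records, bz2.compress)
    #   compressed_size(records, lzma.compress)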

To tune this process you will need to determine the appropriate number
of stack traces that is small enough to keep memory down but large
enough that the overhead associated with compression is small.  Maybe
start off with about 8000 stack traces.  I would try gzip, bz2, and
lzma and look at their compression ratios and times.  Gzip is generally
faster than bz2, and lzma is the slowest.  On the other hand lzma
provides the best compression and gzip the worst.  Since you will be
compressing deltas you most likely can get away with using the fastest
compression options under each compressor without affecting the
compression ratio.  But I would test it to verify this, as whether it
holds depends on the data being compressed.  Also, one option that is
available in lzma is the ability to set the width of the data to look
at when looking for patterns.  Since you are saving 32 or 64 bit ints
depending on the platform, you can set the option to either 4 or 8
bytes accordingly.  I don't believe gzip or bz2 have this option.  By
setting this option in lzma you will likely improve the compression
ratio (see the sketch below).
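
With Python's lzma module one way to express this, assuming this is
what the option maps to in the stdlib, is a filter chain with a delta
filter in front of a fast LZMA2 preset; the fastest settings for the
other two compressors are shown for comparison:

    import bz2
    import lzma
    import zlib

    # Fastest settings for each compressor:
    fast_gzip = lambda raw: zlib.compress(raw, 1)
    fast_bz2 = lambda raw: bz2.compress(raw, 1)
    fast_lzma = lambda raw: lzma.compress(raw, preset=0)

    # Delta filter aligned to 4-byte (32-bit) values; use dist=8 for
    # 64-bit values.
    FILTERS_32BIT = [
        {"id": lzma.FILTER_DELTA, "dist": 4},
        {"id": lzma.FILTER_LZMA2, "preset": 1},
    ]

    def lzma_compress_words(raw):
        return lzma.compress(raw, format=lzma.FORMAT_XZ,
                             filters=FILTERS_32BIT)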

You may find that counts and depths give similar compression across the
3 compression types, in which case just use the fastest, which will
likely be gzip.  On the other hand maybe the stack ids will be better
off using lzma.  This is another reason to separate out like data, as
it gives you the option to use the fastest compressor for some data
types while using others that provide better compression.

I would not be surprised if this approach achieves a compression ratio
better than 100x but that will be heavily dependent on how local the stack
ids are.  Also don't forget about simple things like not using 64 bit ints
when you can get away with smaller ones.

As a slight variation on the above: if you find most of your deltas are
< 127 you could write them out as 1 byte, and when greater than 127
write them out as a 4-byte int with the high bit set.  If you do this
then don't set the lzma option to 4 or 8 byte boundaries, as now your
data is a mixture of 1 and 4 byte values.  This can sometimes provide
huge reductions in compression times without much effect on the overall
compression ratio.
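
A sketch of that encoding; it assumes the deltas have already been
mapped to non-negative values (e.g. via zig-zag encoding, which is not
covered here):

    import struct

    def encode_delta(value):
        # Values below 128 fit in one byte with the high bit clear;
        # larger values become a 4-byte big-endian int with the high bit
        # set as a flag, leaving 31 bits for the value.
        if value < 0x80:
            return struct.pack('>B', value)
        assert value < 0x80000000
        return struct.pack('>I', value | 0x80000000)

    def decode_delta(buf, pos):
        # Returns (value, position of the next encoded delta).
        if buf[pos] & 0x80:
            (word,) = struct.unpack_from('>I', buf, pos)
            return word & 0x7fffffff, pos + 4
        return buf[pos], pos + 1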

Hope you find this helpful.

John