[pypy-dev] Speeding up zlib in standard library

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 15 19:30:10 CET 2012


On Thursday, March 15, 2012, Armin Rigo <arigo at tunes.org> wrote:
> Hi,
>
> On Wed, Mar 14, 2012 at 03:19, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I don't know - I was assuming any buffering would be the same
>> comparing PyPy 1.8 against Python 2.6 (and 3.2). That was one
>> reason for my email - is binding to C relatively slow (compared to
>> the rest of PyPy running pure Python)?
>
> Not necessarily.  You get direct C calls, both from the translated
> pypy and from JITted assembler code.  There are performance hits when
> e.g. the C library relies on macros, but I don't think that's the case
> of zlib.
>
> Passing big strings around, on the other hand, is typically slower on
> PyPy because they need to be copied between GC-managed areas and
> non-GC-managed areas.  There are vague ideas on how to improve but
> nothing I can summarize in two words.
>
> At this level, for profiling, you can use valgrind.  You'll see the
> time spent in zlib itself, the time spent copying big strings around,
> and the time spent actually executing the JIT-generated assembler
> (this ends up in "functions" with no name, just an address).

I think in my case it could be this "big string" issue then, rather
than the interface with zlib itself. I'm dealing with 64 KB chunks
of zlib-compressed data, and I'm using (bytes) strings to hold
these in Python.
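
A quick way to see how much time goes on decompressing chunks of this size is a small timing script that can be run unchanged under both CPython and PyPy. The following is only a minimal sketch, not the BGZF code discussed here: the names CHUNK_SIZE and round_trip are hypothetical, the test data is synthetic, and it uses plain zlib streams rather than real BGZF blocks.

```python
import os
import time
import zlib

# Assumed chunk size, matching the 64 KB figure mentioned above.
CHUNK_SIZE = 64 * 1024


def round_trip(n_chunks=100):
    """Decompress n_chunks compressed blocks and time the loop.

    Returns (total_decompressed_bytes, elapsed_seconds).
    """
    # Synthetic, repetitive payload of exactly CHUNK_SIZE bytes.
    raw = os.urandom(256) * (CHUNK_SIZE // 256)
    compressed = [zlib.compress(raw) for _ in range(n_chunks)]

    start = time.time()
    total = 0
    for block in compressed:
        # Each call hands one byte string to zlib and gets a new one back.
        total += len(zlib.decompress(block))
    elapsed = time.time() - start
    return total, elapsed


if __name__ == "__main__":
    total, elapsed = round_trip(1000)
    print("decompressed %d bytes in %.3f s" % (total, elapsed))
```

Comparing the elapsed time between interpreters gives a rough first impression only; it does not separate time inside zlib from time spent copying strings, which is what a profiler such as valgrind would show.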

I've used valgrind before, but never with PyPy - hopefully I can
find some time to dig into this a bit further.

Thanks,

Peter

P.S. This is for blocked gzip format (BGZF) if you're curious,
http://blastedbio.blogspot.com/2011/11/bgzf-blocked-bigger-better-gzip.html

