marshal.dumps quadratic growth and marshal.dump not allowing file-like objects

Raymond Hettinger python at rcn.com
Sun Jun 15 06:16:10 EDT 2008


On Jun 15, 1:04 am, bkus... at gmail.com wrote:
> However it seems that marshal.dumps() for large objects has a
> quadratic performance issue which I'm assuming is that it grows its
> memory buffer in constant increments.

Looking at the source in http://svn.python.org/projects/python/trunk/Python/marshal.c
, it looks like the relevant fragment is in w_more():

      . . .
	size = PyString_Size(p->str);
	newsize = size + size + 1024;
	if (newsize > 32*1024*1024) {
		newsize = size + 1024*1024;
	}
	if (_PyString_Resize(&p->str, newsize) != 0) {
      . . .

When more space is needed, the resize operation over-allocates by
double the previous need plus 1K.  This should give amortized O(1)
performance just like list.append().

However, when that strategy requests more than 32Mb, the resizing
becomes less aggressive and grows only in 1MB blocks and giving your
observed nasty quadratic behavior.

Raymond



More information about the Python-list mailing list