marshal.dumps quadratic growth and marshal.dump not allowing file-like objects

John Machin sjmachin at lexicon.net
Sun Jun 15 06:16:51 EDT 2008


On Jun 15, 7:47 pm, Peter Otten <__pete... at web.de> wrote:
> bkus... at gmail.com wrote:
> > I'm stuck on a problem where I want to use marshal for serialization
> > (yes, yes, I know (c)Pickle is normally recommended here). I favor
> > marshal for speed for the types of data I use.
>
> > However it seems that marshal.dumps() for large objects has a
> > quadratic performance issue, which I assume is because it grows its
> > memory buffer in constant increments. This causes a nasty slowdown for
> > marshaling large objects. I thought I would get around this by passing
> > a cStringIO.StringIO object to marshal.dump() instead but I quickly
> > learned this is not supported (only true file objects are supported).
>
> > Any ideas about how to get around the marshal quadratic issue? Any
> > hope for a fix for that on the horizon? Thanks for any information.
>
> Here's how marshal resizes the string:
>
>         newsize = size + size + 1024;
>         if (newsize > 32*1024*1024) {
>                 newsize = size + 1024*1024;
>         }
>
> Maybe you can split your large objects and marshal multiple objects to keep
> the size below the 32MB limit.
>

But that change went into the svn trunk on 11-May-2008; perhaps the OP
is using a production release, which would still have the previous
version: merely "newsize = size + 1024;".
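To show why the old rule hurts, here is a rough Python model of the two growth rules: the old "newsize = size + 1024;" and the trunk version Peter quoted. It is a sketch under stated assumptions, not marshal's actual code. It assumes every resize copies the whole current buffer (a worst case, since realloc can sometimes extend in place) and assumes a 50-byte initial buffer. The names simulate_growth, old_policy and new_policy are hypothetical.

```python
def old_policy(size):
    # Pre-trunk rule: grow by a constant 1024 bytes each time.
    return size + 1024


def new_policy(size):
    # Trunk rule as quoted above: double (+1024) until 32MB, then +1MB steps.
    newsize = size + size + 1024
    if newsize > 32 * 1024 * 1024:
        newsize = size + 1024 * 1024
    return newsize


def simulate_growth(total_bytes, grow, initial=50):
    """Count resizes and bytes copied until the buffer holds total_bytes.

    Assumption: each resize copies the entire current buffer.
    """
    size = initial
    resizes = 0
    copied = 0
    while size < total_bytes:
        copied += size  # worst case: realloc moves the whole buffer
        size = grow(size)
        resizes += 1
    return resizes, copied


if __name__ == "__main__":
    target = 10 * 1024 * 1024  # a 10MB marshal string
    for name, policy in [("old", old_policy), ("new", new_policy)]:
        resizes, copied = simulate_growth(target, policy)
        print(f"{name}: {resizes} resizes, {copied} bytes copied")
```

With a constant increment the number of resizes grows linearly with the output size, so the total bytes copied grows quadratically; doubling keeps the total copy cost proportional to the final size. Past 32MB the trunk rule falls back to 1MB increments, so very large outputs are quadratic again, just with a step 1024 times larger.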

Do people really generate 32MB pyc files, or is stopping doubling at
32MB just a safety valve in case someone/something runs amok?
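Peter's suggestion of splitting a large object into several marshal calls might look roughly like this for a big list. It is a minimal sketch, assuming the data is a list that can be cut into sub-lists. The 8-byte length prefix is my own framing, not a marshal feature, and dump_in_chunks and load_chunks are hypothetical names. Because each chunk goes through marshal.dumps() and is then written with write(), the target can be any file-like object with write()/read(), such as io.BytesIO, which also sidesteps marshal.dump() refusing non-file objects.

```python
import io
import marshal
import struct

# Hypothetical framing: 8-byte big-endian length before each marshalled chunk.
_LEN = struct.Struct(">Q")


def dump_in_chunks(items, fileobj, chunk_size=100_000):
    """Marshal a list as a sequence of smaller lists, each length-prefixed."""
    for start in range(0, len(items), chunk_size):
        data = marshal.dumps(items[start:start + chunk_size])
        fileobj.write(_LEN.pack(len(data)))
        fileobj.write(data)


def load_chunks(fileobj):
    """Read everything written by dump_in_chunks and rebuild one list."""
    result = []
    while True:
        header = fileobj.read(_LEN.size)
        if not header:
            break  # clean end of stream
        if len(header) != _LEN.size:
            raise EOFError("truncated length prefix")
        (n,) = _LEN.unpack(header)
        data = fileobj.read(n)
        if len(data) != n:
            raise EOFError("truncated chunk")
        result.extend(marshal.loads(data))
    return result


if __name__ == "__main__":
    buf = io.BytesIO()  # any object with write()/read() will do
    original = list(range(250_000))
    dump_in_chunks(original, buf)
    buf.seek(0)
    print(load_chunks(buf) == original)
```

Picking chunk_size so that each marshalled chunk stays well below 32MB keeps every marshal.dumps() call in the doubling regime; with the older constant-increment rule, smaller chunks at least bound the quadratic cost per chunk.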

Cheers,
John


