Secure Pickle-like module

jiba at tuxfamily.org
Thu May 25 18:41:44 EDT 2006


> There are a couple factual inaccuracies on the site that I'd like to clear up first:
> Trivial benchmarks put cerealizer and banana/jelly on the same level as far as performance goes:
> $ python -m timeit -s 'from cereal import dumps; L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
> 10000 loops, best of 3: 84.1 usec per loop
> $ python -m timeit -s 'from twisted.spread import banana, jelly; dumps = lambda o: banana.encode(jelly.jelly(o)); L = ["Hello", " ", ("w", "o", "r", "l", "d", ".")]' 'dumps(L)'
> 10000 loops, best of 3: 89.7 usec per loop
>
> This is with cBanana though, which has to be explicitly enabled and, of course, is written in C.  So Cerealizer looks like it has the potential to do pretty well, performance-wise.

My personal benchmark was different; it used a list of 2000
objects defined as follows:

class O(object):
  def __init__(self):
    self.x = 1
    self.s = "jiba"
    self.o = None

with self.o referring to another O object. I think my benchmark,
although still very limited, is more representative since it involves
objects, strings, numbers and lists.
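
For readers who want to try a comparable measurement, here is a minimal sketch of such a benchmark. It is not the test1.py script linked below: the way the objects are linked together (each one refers to the previous one), the helper names build_objects and bench, and the use of the standard pickle module as the only serializer are assumptions made for illustration. Any other pair of dumps/loads functions can be passed to bench in the same way.

```python
import pickle
import time


class O(object):
    def __init__(self):
        self.x = 1
        self.s = "jiba"
        self.o = None


def build_objects(n=2000):
    """Build n objects; each one (except the first) refers to the previous one.

    The linking pattern is an assumption; it keeps the object graph shallow
    so that recursive serializers do not hit the recursion limit.
    """
    objs = [O() for _ in range(n)]
    for i in range(1, n):
        objs[i].o = objs[i - 1]
    return objs


def bench(dumps, loads, data):
    """Time one dumps() and one loads() call; return timings, size and result."""
    t0 = time.perf_counter()
    blob = dumps(data)
    t1 = time.perf_counter()
    restored = loads(blob)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, len(blob), restored


if __name__ == "__main__":
    dump_time, load_time, size, _ = bench(pickle.dumps, pickle.loads, build_objects())
    print("pickle")
    print("dumps in %s s, %s bytes length" % (dump_time, size))
    print("loads in %s s" % load_time)
```

A single run like this is noisy; repeating the measurement and keeping the best time gives more stable numbers.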

See it here:
http://svn.gna.org/viewcvs/*checkout*/soya/trunk/cerealizer/test/test1.py?content-type=text%2Fplain&rev=31

The results are (using Psyco):
With old-style classes:
	cerealizer
	dumps in 0.0619530677795 s, 114914 bytes length
	loads in 0.0313038825989 s

	cPickle
	dumps in 0.0301840305328 s, 116356 bytes length
	loads in 0.023097038269 s

	jelly + banana
	dumps in 0.168012142181 s, 169729 bytes length
	loads in 1.82081913948 s

	jelly + cBanana
	dumps in 0.082946062088 s, 169729 bytes length
	loads in 0.156159877777 s

With new-style classes:
	cerealizer
	dumps in 0.0575239658356 s, 114914 bytes length
	loads in 0.028165102005 s

	cPickle
	dumps in 0.07634806633 s, 116428 bytes length
	loads in 0.0278959274292 s

	jelly + banana
	dumps in 0.156242132187 s, 169729 bytes length
	(TypeError; I haven't investigated this problem yet, although it is
surely solvable)

	jelly + cBanana
	dumps in 0.10772895813 s, 169729 bytes length
	(TypeError; I haven't investigated this problem yet, although it is
surely solvable)

As you can see, cPickle dumps about 2 times faster than cerealizer for
old-style classes, but cerealizer dumps faster than cPickle for
new-style classes (which makes sense since I have optimized it for
new-style classes), where loading times are nearly equal.
However, Jelly is far behind, even using cBanana, especially for
loading.


> You talked about _Tuple and _Dereference on the website as well.  These are internal implementation details. jelly also supports extension types, by way of setUnjellyableForClass and similar functions.

The problem arises only when the extension type expects an attribute of
a specific class, e.g. (in Pyrex):

cdef class MyClass:
  cdef MyClass other

The other attribute of MyClass can only contain a reference to an
instance of MyClass (or None). Thus it cannot be set to an instance of
_Dereference or _Tuple, even temporarily; doing other =
_Dereference(...) raises an exception.

I solve this problem in Cerealizer by doing a 2-pass object creation:
step 1, create all the objects; step 2, set all objects' states.
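
To illustrate the idea, here is a minimal sketch of a 2-pass load in plain Python. It is not Cerealizer's actual code: the record layout (a flat list of (class name, state) pairs), the Ref marker class, the Node class and the two_pass_load function are all hypothetical names chosen for this example. The point is that Ref markers only live in the serialized data; they are never assigned to an attribute, so a typed attribute would only ever receive a real instance.

```python
class Node(object):
    def __init__(self, name):
        self.name = name
        self.other = None


class Ref(object):
    """Marks a reference to the object stored at position `index`."""

    def __init__(self, index):
        self.index = index


def two_pass_load(records, classes):
    """Rebuild objects from (class_name, state) records in two passes."""
    # Step 1: create all the objects, without calling __init__.
    objs = [classes[name].__new__(classes[name]) for name, _ in records]
    # Step 2: set all objects' states; every reference can now be resolved
    # to a real object, even for cycles or forward references.
    for obj, (_, state) in zip(objs, records):
        for key, value in state.items():
            if isinstance(value, Ref):
                value = objs[value.index]
            setattr(obj, key, value)
    return objs


# Two objects that refer to each other (a cycle).
records = [
    ("Node", {"name": "a", "other": Ref(1)}),
    ("Node", {"name": "b", "other": Ref(0)}),
]
a, b = two_pass_load(records, {"Node": Node})
```

Since every object exists before any state is set, no temporary placeholder has to be stored in an attribute.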

> As far as security goes, no obvious problems jump out at me, either
> from the API or from skimming the code.  I think early-binding
> __new__, __getstate__, and __setstate__ may be going further than
> is necessary.  If someone can find code to set attributes on classes
> in your process space, they can probably already do anything they
> want to your program and don't need to exploit security problems in
> your serializer.

I agree with that; however, I would rather be "over-secure" than "just
as secure as necessary" :-)
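
For readers unfamiliar with the term, here is a minimal sketch of what early binding means in this context. It is an illustration only, not Cerealizer's code: the EarlyBoundHandler class, its load method and the Point class are hypothetical. The handler looks up __new__ and __setstate__ once, when the class is registered, so attributes set on the class afterwards are not used when loading.

```python
class EarlyBoundHandler(object):
    """Captures a class's __new__ and __setstate__ at registration time."""

    def __init__(self, cls):
        self.cls = cls
        self.new = cls.__new__  # looked up once, here
        self.setstate = getattr(cls, "__setstate__", None)

    def load(self, state):
        obj = self.new(self.cls)
        if self.setstate is not None:
            # Call the function captured at registration, not the current one.
            self.setstate(obj, state)
        else:
            obj.__dict__.update(state)
        return obj


class Point(object):
    def __setstate__(self, state):
        self.x, self.y = state["x"], state["y"]


handler = EarlyBoundHandler(Point)

# Later, some code replaces the method on the class...
calls = []
Point.__setstate__ = lambda self, state: calls.append("replaced")

# ...but the handler still uses the method captured at registration.
p = handler.load({"x": 1, "y": 2})
```

With late binding, the replaced method would have run instead and p would have no x or y attribute.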

Thank you for your opinion!
I'm going to update my website.
Jiba




More information about the Python-list mailing list