Marshal Obj is String or Binary?

Steven D'Aprano steve at REMOVETHIScyber.com.au
Sat Jan 14 18:19:29 EST 2006


On Sat, 14 Jan 2006 13:50:24 -0800, Mike wrote:

> Thanks everyone.
> 
> Why Marshal & not Pickle: Well, Marshal is supposed to be faster.

Faster than cPickle? 

Even faster would be to write your code in assembly, and dump that
ridiculously bloated database and just write everything to raw bytes on
an unformatted disk. Of course, it might take the programmer a thousand
times longer to actually write the program, and there will probably be
hundreds of bugs in it, but the important thing is that you'll save three
or four milliseconds at runtime.

Right?

Unless you've actually done proper measurements of the time taken, with
realistic sample data, worrying about saving a byte here and a
millisecond there is just wasting your time, and is often
counter-productive. Optimization without measurement is as likely to make code
slower and fatter as it is to make it faster and leaner.
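A minimal measurement sketch, assuming Python 3, where `pickle` automatically uses the C accelerator that Python 2 shipped separately as `cPickle`. The `sample` dictionary and the `measure` helper are hypothetical; substitute data shaped like what the application really stores before drawing any conclusions.

```python
import marshal
import pickle
import timeit

# Hypothetical sample data: replace with something resembling your real records.
sample = {
    "ids": list(range(1000)),
    "names": ["item%d" % i for i in range(1000)],
    "scores": {str(i): i * 0.5 for i in range(1000)},
}

def measure(dumps, loads, number=200):
    """Return (seconds for `number` dump+load round trips, payload size in bytes)."""
    payload = dumps(sample)
    elapsed = timeit.timeit(lambda: loads(dumps(sample)), number=number)
    return elapsed, len(payload)

results = {
    "marshal": measure(marshal.dumps, marshal.loads),
    "pickle": measure(
        lambda obj: pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL),
        pickle.loads,
    ),
}

for name, (seconds, size) in results.items():
    print("%-8s %.4f s  %d bytes" % (name, seconds, size))
```

The timings vary by machine and interpreter version, which is exactly why they have to be measured rather than assumed.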

marshal is not designed to be portable across versions. Do you *really*
think it is a good idea to tie the data in your database to one specific
version of Python?
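If marshal is used anyway, one way to at least detect the version coupling is to record the interpreter version next to the payload. This is a minimal sketch with a hypothetical wrapper format (`dump_tagged` and `load_tagged` are illustrative names, not part of any library); it only catches the mismatch, it does not make the data portable.

```python
import marshal
import sys

def dump_tagged(obj):
    """Prefix the marshal payload with a hypothetical 'major.minor|' version tag."""
    tag = "%d.%d|" % sys.version_info[:2]
    return tag.encode("ascii") + marshal.dumps(obj)

def load_tagged(data):
    """Refuse to unmarshal data written by a different Python major.minor version."""
    tag, _, payload = data.partition(b"|")  # split on the first '|' only
    expected = "%d.%d" % sys.version_info[:2]
    written = tag.decode("ascii")
    if written != expected:
        raise ValueError("data written by Python %s, this is Python %s" % (written, expected))
    return marshal.loads(payload)

data = dump_tagged({"a": [1, 2]})
print(load_tagged(data))  # {'a': [1, 2]}
```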


> But
> then, if I wanted to do the whole repr()-eval() hack, I am already
> defeating the purpose by refusing to save bytes as bytes in terms of
> both size and speed.
> 
> At this point, I am considering one of the following:
> - Save my structure as binary data, and reference the file from my db
> - Find a clean method of saving bytes into my db

Your database either can handle binary data, or it can't.

If it can, then just use pickle with a binary protocol and be done with it.
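A minimal sketch of that approach, assuming Python 3 and using the standard library's `sqlite3` module as a stand-in for a database with a binary column type; the `objects` table, its columns and the `record` contents are hypothetical.

```python
import pickle
import sqlite3

# In-memory database with a BLOB column for the pickled bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

record = {"name": "widget", "sizes": [1, 2, 3], "price": 9.99}

# Serialize with the highest binary protocol this interpreter supports;
# sqlite3 stores a bytes value as a BLOB.
blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
conn.execute("INSERT INTO objects (id, data) VALUES (?, ?)", (1, blob))

# Read the bytes back and rebuild the Python object.
(stored,) = conn.execute("SELECT data FROM objects WHERE id = ?", (1,)).fetchone()
restored = pickle.loads(stored)
print(restored == record)  # True
```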

If it can't, then just use pickle with a plain text protocol and be done
with it.
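In 2006, pickle protocol 0 was the ASCII "text" protocol. In Python 3, `pickle.dumps` always returns `bytes`, and protocol 0 output is not guaranteed to be ASCII once strings contain non-ASCII characters, so this sketch swaps in a different technique: a binary pickle wrapped in base64. Assumptions: Python 3, `sqlite3` standing in for a text-only database, and a hypothetical `objects` table with a `TEXT` column.

```python
import base64
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data TEXT)")

record = {"name": "café", "sizes": [1, 2, 3]}

# Pickle to bytes, then base64-encode so the stored value is plain ASCII text.
pickled = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
text = base64.b64encode(pickled).decode("ascii")
conn.execute("INSERT INTO objects (id, data) VALUES (?, ?)", (1, text))

# Decode the text back to bytes, then unpickle.
(stored,) = conn.execute("SELECT data FROM objects WHERE id = ?", (1,)).fetchone()
restored = pickle.loads(base64.b64decode(stored))
print(restored == record)  # True
```

Base64 inflates the payload by roughly a third, which is the price of squeezing bytes through a text-only column.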

Either way, you have to find a way to translate your Python data
structures into something that you can feed to the database. Your database
can't automatically suck data structures out of Python's working memory!
So why re-invent the wheel? marshal is not recommended, but if you can
live with the limitations of marshal then it might do the job. But trying
to optimise code that hasn't even been written yet is a sure way to get into
trouble.
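One of marshal's limitations is easy to check: it handles core built-in types but not instances of user-defined classes, and raises `ValueError` for them. A minimal sketch, where the `Point` class and the `can_marshal` helper are hypothetical:

```python
import marshal

class Point:
    """A user-defined class; marshal cannot serialize its instances."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

def can_marshal(obj):
    """Return True if marshal can serialize obj, False otherwise."""
    try:
        marshal.dumps(obj)
    except ValueError:
        return False
    return True

# Dicts, lists, tuples, strings, numbers and booleans are supported.
core = {"coords": [(1, 2), (3, 4)], "label": "path", "closed": False}

print(can_marshal(core))         # True
print(can_marshal(Point(1, 2)))  # False
```

pickle, by contrast, serializes ordinary class instances like `Point` without extra work.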


-- 
Steven.
