[Python-Dev] dict.setdefault(object, object) instead of "sys.intern()" (was Re: sys.intern should work on bytes)

Jesus Cea jcea at jcea.es
Fri Sep 20 15:46:41 CEST 2013


On 20/09/13 15:33, Benjamin Peterson wrote:
> Well, the pickler should memoize bytes objects if you have lots of
> the same one in a pickle...

Only if they are the very same object, not different bytes objects with
the same value. Pickle doesn't check "a==b" but "id(a)==id(b)".
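
A minimal sketch of that difference, assuming Python 3 and the default
pickle protocol. The names "payload", "same" and "copies" are made up for
illustration only:

```python
import pickle

payload = b"x" * 1000

# One bytes object referenced twice: the pickler memoizes it by identity,
# so the second occurrence is written as a short back-reference.
same = [payload, payload]

# Two distinct bytes objects with equal value: each one is written in full.
copies = [payload, bytes(bytearray(payload))]

size_same = len(pickle.dumps(same))
size_copies = len(pickle.dumps(copies))

print(copies[0] == copies[1], copies[0] is copies[1])
print(size_same, size_copies)
```

The second pickle should come out roughly one payload larger than the
first, since equality alone does not trigger the memo.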

Yes, I know that "a==b" would break mutable objects. It is just an
example.

I don't want to pursue that path. Performance of pickle is already
appallingly slow.

In my project, I will do the redundancy removal in my own way, as
explained in another message in this thread.

Example:

* Original pickle: 14416284 bytes

* Pickle with "interned" strings: 3004880 bytes
(quite an improvement, but this is particular to my case: I have a lot
of duplicated strings here. The pickle also loads a bit faster)

* Pickle including an extra dictionary of "interned" strings, created
using the "interned.setdefault(object,object)" pattern: 5126587 bytes.
Sniff.
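
For reference, the "interned.setdefault(object,object)" pattern can be
sketched roughly as follows. This is only an illustration under
assumptions: the helper name "intern_value" and the sample data are
invented here, and the sketch does not reproduce the sizes listed above.

```python
import pickle

# Hypothetical table mapping each value to the first object seen with it.
interned = {}

def intern_value(value):
    """Return the canonical object for this value (value must be hashable)."""
    return interned.setdefault(value, value)

# 100 bytes objects with the same value but different identities.
records = [bytes(bytearray(b"duplicated payload " * 20)) for _ in range(100)]

# After interning, every entry refers to one and the same object,
# so the pickler's identity-based memo can collapse the duplicates.
deduped = [intern_value(r) for r in records]

size_raw = len(pickle.dumps(records))
size_deduped = len(pickle.dumps(deduped))

print(len(interned), size_raw, size_deduped)
```

Unlike "sys.intern()", a plain dict like this accepts any hashable
object, bytes included, but it also keeps every interned object alive
for as long as the dict exists.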

Could I do this more compactly?


-- 
Jesús Cea Avión                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
Twitter: @jcea                        _/_/    _/_/          _/_/_/_/_/
jabber / xmpp:jcea at jabber.org  _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

