Is there any way to minimize str()/unicode() objects memory usage [Python 2.6.4] ?

Carl Banks pavlovevidence at gmail.com
Sat Aug 7 00:55:26 EDT 2010


On Aug 6, 6:56 pm, dmtr <dchich... at gmail.com> wrote:
> > > Well...  63 bytes per item for very short unicode strings... Is there
> > > any way to do better than that? Perhaps some compact unicode objects?
>
> > There is a certain price you pay for having full-feature Python objects.
>
> Are there any *compact* Python objects? Optimized for compactness?

Yes, but probably not in the way that'd be useful to you.

Look at the array module, and also consider the third-party numpy
library.  They store compact arrays of numeric types (mostly) but they
have character type storage as well.  That probably won't help you,
though, since you have variable-length strings.
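For what it's worth, here's a rough sketch of what the array module buys
you for the fixed-size numeric part (exact byte counts vary by platform):

    # Rough sketch: array.array stores C-typed values contiguously, so seven
    # machine ints cost a few dozen bytes of payload rather than seven boxed
    # Python int objects.
    from array import array

    counts = array('i', (0,) * 7)            # 7 signed C ints, stored contiguously
    counts[0] = 42
    print(counts.itemsize * len(counts))     # raw payload size in bytes, e.g. 28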

I don't know of any third-party types that can do what you want, but
there might be some.  Search PyPI.


> > What are you trying to accomplish anyway? Maybe the array module can be
> > of some help. Or numpy?
>
> Ultimately a dict that can store ~20,000,000 entries: (u'short
> string' : (int, int, int, int, int, int, int)).

My recommendation would be to use sqlite3.  Only if you know for sure
that it's too slow (meaning you've actually tried it and measured it,
nothing else) should you bother with a custom in-memory data structure.
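Something along these lines would be a starting point (the table and
column names are just for illustration):

    # Hedged sketch: one possible sqlite3 layout for the 20M-entry mapping.
    import sqlite3

    conn = sqlite3.connect('words.db')
    conn.execute("""CREATE TABLE IF NOT EXISTS entries (
                        word TEXT PRIMARY KEY,
                        a INTEGER, b INTEGER, c INTEGER, d INTEGER,
                        e INTEGER, f INTEGER, g INTEGER)""")
    conn.execute("INSERT OR REPLACE INTO entries VALUES (?,?,?,?,?,?,?,?)",
                 (u'short string', 1, 2, 3, 4, 5, 6, 7))
    conn.commit()

    row = conn.execute("SELECT a,b,c,d,e,f,g FROM entries WHERE word=?",
                       (u'short string',)).fetchone()
    print(row)   # (1, 2, 3, 4, 5, 6, 7)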

For that I'd probably go with a binary search over sorted arrays rather
than a hash.  So you have a huge numpy character array that stores all
20 million short strings end-to-end (in lexical order, so that you can
look up the strings with a binary search), then you have a numpy integer
array that stores the indices into this big string where the word
boundaries are, and then an Nx7 numpy integer array storing the int
values.  That's three compact arrays.
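A minimal sketch of that layout (using a plain unicode string for the
concatenated blob instead of a numpy character array, and made-up sample
data, just to show the lookup):

    # Three-array layout: blob = all keys concatenated in sorted order,
    # offsets = where each key starts, values = the Nx7 ints.
    import numpy as np

    keys = sorted([u'apple', u'banana', u'cherry'])          # stand-in data
    blob = u''.join(keys)
    offsets = np.cumsum([0] + [len(k) for k in keys])         # length N+1
    values = np.arange(len(keys) * 7).reshape(len(keys), 7)   # Nx7 ints

    def lookup(word):
        # Binary search over the lexically sorted keys.
        lo, hi = 0, len(keys) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            key = blob[offsets[mid]:offsets[mid + 1]]
            if key == word:
                return values[mid]
            elif key < word:
                lo = mid + 1
            else:
                hi = mid - 1
        return None

    print(lookup(u'banana'))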


Carl Banks


