Memory Usage of Strings

Santoso Wijaya santoso.wijaya at gmail.com
Wed Mar 16 15:51:17 EDT 2011


??

Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> L = []
>>> for i in xrange(100000):
...     L.append(str(i) * (1000 / len(str(i))))
...
>>> sys.getsizeof(L)
824464
>>> L = []
>>> for i in xrange(20000):
...     L.append(str(i) * (5000 / len(str(i))))
...
>>> sys.getsizeof(L)
178024
>>>
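
Note that sys.getsizeof(L) reports only the list object itself (its header plus one pointer per element), not the strings it points to, so the two figures above leave out the string data entirely. A minimal sketch of measuring both parts follows. It uses range and // so that it also runs on Python 3 (my substitution for the xrange and / in the session), it builds a smaller list so it runs quickly, and list_and_payload_size is a hypothetical helper name, not something from the thread:

```python
import sys

def list_and_payload_size(strings):
    """Return (size of the list object alone, summed size of its elements)."""
    container = sys.getsizeof(strings)                # header + pointer array only
    payload = sum(sys.getsizeof(s) for s in strings)  # each string object, counted once
    return container, payload

# 1000 strings of about 1000 characters each (smaller than the session above).
L = [str(i) * (1000 // len(str(i))) for i in range(1000)]
container, payload = list_and_payload_size(L)
print("list object: %d bytes, strings: %d bytes" % (container, payload))
```

The summed figure is only an approximation of real memory use: it counts a string twice if the list holds the same object twice, and it says nothing about allocator overhead or fragmentation.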

~/santa


On Wed, Mar 16, 2011 at 11:20 AM, Amit Dev <amitdev at gmail.com> wrote:

> sum(map(len, l)) => 99999100 for 1st case and 99998200 for 2nd case.
> Roughly 100MB as I mentioned.
>
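
The totals can be checked without building the strings at all, since element i has len(str(i)) * (n // len(str(i))) characters. A small sketch, again using range and // so it also runs on Python 3; total_chars is just an illustrative name:

```python
def total_chars(count, target):
    """Total characters produced by: str(i) * (target // len(str(i))) for i in range(count)."""
    return sum(len(str(i)) * (target // len(str(i))) for i in range(count))

first = total_chars(100000, 1000)   # the loop with ~1000-character strings
second = total_chars(20000, 5000)   # the loop with ~5000-character strings
print(first, second)
```

By this arithmetic the 1000-character loop comes to 99999100 and the 5000-character loop to 99998200, so both are just under 100 million characters.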
> On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <gordon at panix.com> wrote:
> > In <mailman.988.1300289897.1189.python-list at python.org> Amit Dev <amitdev at gmail.com> writes:
> >
> >> I'm observing a strange memory usage pattern with strings. Consider
> >> the following session. Idea is to create a list which holds some
> >> strings so that cumulative characters in the list is 100MB.
> >
> >> >>> l = []
> >> >>> for i in xrange(100000):
> >> ...  l.append(str(i) * (1000/len(str(i))))
> >
> >> This uses around 100MB of memory as expected and 'del l' will clear that.
> >
> >> >>> for i in xrange(20000):
> >> ...  l.append(str(i) * (5000/len(str(i))))
> >
> >> This is using 165MB of memory. I really don't understand where the
> >> additional memory usage is coming from.
> >
> >> If I reduce the string size, it remains high till it reaches around
> >> 1000. In that case it is back to 100MB usage.
> >
> > I don't know anything about the internals of python storage -- overhead,
> > possible merging of like strings, etc.  but some simple character counting
> > shows that these two loops do not produce the same number of characters.
> >
> > The first loop produces:
> >
> > Ten single-digit values of i which are repeated 1000 times for a total of
> > 10000 characters;
> >
> > Ninety two-digit values of i which are repeated 500 times for a total of
> > 45000 characters;
> >
> > Nine hundred three-digit values of i which are repeated 333 times for a
> > total of 299700 characters;
> >
> > Nine thousand four-digit values of i which are repeated 250 times for a
> > total of 2250000 characters;
> >
> > Ninety thousand five-digit values of i which are repeated 200 times for
> > a total of 18000000 characters.
> >
> > All that adds up to a grand total of 20604700 characters.
> >
> > Or, to condense the above long-winded text in table form:
> >
> > range         num digits 1000/len(str(i))  total chars
> > 0-9            10 1      1000                    10000
> > 10-99          90 2       500                    45000
> > 100-999       900 3       333                   299700
> > 1000-9999    9000 4       250                  2250000
> > 10000-99999 90000 5       200                 18000000
> >                                              ========
> >                          grand total chars   20604700
> >
> > The second loop yields this table:
> >
> > range         num digits 5000/len(str(i))  total chars
> > 0-9            10 1      5000                    50000
> > 10-99          90 2      2500                   225000
> > 100-999       900 3      1666                  1499400
> > 1000-9999    9000 4      1250                 11250000
> > 10000-19999 10000 5      1000                 10000000
> >                                              ========
> >                          grand total chars   23024400
> >
> > The two loops do not produce the same numbers of characters, so I'm not
> > surprised they do not consume the same amount of storage.
> >
> > P.S.: Please forgive me if I've made some basic math error somewhere.
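
On the math: the tables above multiply the number of values by the repeat count, but each repeat is itself len(str(i)) characters wide, so every row total needs one more factor of the digit count. A sketch of the per-row arithmetic under that reading (breakdown is a hypothetical helper; range and // are used so it runs on Python 3 as well):

```python
def breakdown(count, target):
    """Group range(count) by digit count.

    Returns rows of (digits, how_many, repeats, chars_per_string, total_chars).
    """
    rows = []
    digits, low = 1, 0
    while low < count:
        high = min(10 ** digits, count)   # exclusive upper bound for this digit count
        how_many = high - low
        repeats = target // digits
        rows.append((digits, how_many, repeats,
                     digits * repeats, how_many * digits * repeats))
        low = high
        digits += 1
    return rows

for count, target in ((100000, 1000), (20000, 5000)):
    print("range(%d), target %d:" % (count, target))
    for row in breakdown(count, target):
        print("  digits=%d count=%d repeats=%d chars/string=%d total=%d" % row)
```

With the extra factor, every string in either loop is within a few characters of its target length (1000 or 5000), which is why both loops end up holding nearly the same amount of text.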
> >
> > --
> > John Gordon                   A is for Amy, who fell down the stairs
> > gordon at panix.com              B is for Basil, assaulted by bears
> >                                -- Edward Gorey, "The Gashlycrumb Tinies"
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >