Memory Usage of Strings

Amit Dev amitdev at gmail.com
Wed Mar 16 14:20:34 EDT 2011


sum(map(len, l)) =>  99999100 for 1st case and 99998200 for 2nd case.
Roughly 100MB in both cases, as I mentioned.
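
A quick way to check these counts without allocating the strings is
sketched below. The helper name chars_by_digits is made up for
illustration, and // is used in place of / so the integer division
behaves the same on Python 2 and Python 3. The point is that each
string holds len(str(i)) characters times the repeat count, not just
the repeat count.

```python
from collections import defaultdict

def chars_by_digits(count, width):
    """Total characters per digit-count for str(i) * (width // len(str(i)))."""
    totals = defaultdict(int)
    for i in range(count):
        n = len(str(i))
        # Each string is n characters repeated (width // n) times.
        # // is integer division, which is what / does on ints in Python 2.
        totals[n] += n * (width // n)
    return dict(totals)

first = chars_by_digits(100000, 1000)   # mirrors the first loop
second = chars_by_digits(20000, 5000)   # mirrors the second loop

for name, groups in (("first loop", first), ("second loop", second)):
    for n in sorted(groups):
        print("%-12s %d digits  %10d chars" % (name, n, groups[n]))
    print("%-12s total     %10d chars" % (name, sum(groups.values())))
```

Both totals come out just under 100,000,000 and differ by only 900
characters, so the character count alone does not account for the
difference in memory usage.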

On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <gordon at panix.com> wrote:
> In <mailman.988.1300289897.1189.python-list at python.org> Amit Dev <amitdev at gmail.com> writes:
>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. The idea is to create a list of strings whose
>> cumulative length is about 100MB.
>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...  l.append(str(i) * (1000/len(str(i))))
>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>
>> >>> for i in xrange(20000):
>> ...  l.append(str(i) * (5000/len(str(i))))
>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>
> I don't know anything about the internals of python storage -- overhead,
> possible merging of like strings, etc. -- but some simple character counting
> shows that these two loops do not produce the same number of characters.
>
> The first loop produces:
>
> Ten single-digit values of i which are repeated 1000 times for a total of
> 10000 characters;
>
> Ninety two-digit values of i which are repeated 500 times for a total of
> 45000 characters;
>
> Nine hundred three-digit values of i which are repeated 333 times for a
> total of 299700 characters;
>
> Nine thousand four-digit values of i which are repeated 250 times for a
> total of 2250000 characters;
>
> Ninety thousand five-digit values of i which are repeated 200 times for
> a total of 18000000 characters.
>
> All that adds up to a grand total of 20604700 characters.
>
> Or, to condense the above long-winded text in table form:
>
> range         num digits 1000/len(str(i))  total chars
> 0-9            10 1      1000                    10000
> 10-99          90 2       500                    45000
> 100-999       900 3       333                   299700
> 1000-9999    9000 4       250                  2250000
> 10000-99999 90000 5       200                 18000000
>                                              ========
>                          grand total chars   20604700
>
> The second loop yields this table:
>
> range         num digits 5000/len(str(i))  total chars
> 0-9            10 1      5000                    50000
> 10-99          90 2      2500                   225000
> 100-999       900 3      1666                  1499400
> 1000-9999    9000 4      1250                 11250000
> 10000-19999 10000 5      1000                 10000000
>                                              ========
>                          grand total chars   23024400
>
> The two loops do not produce the same numbers of characters, so I'm not
> surprised they do not consume the same amount of storage.
>
> P.S.: Please forgive me if I've made some basic math error somewhere.
>
> --
> John Gordon                   A is for Amy, who fell down the stairs
> gordon at panix.com              B is for Basil, assaulted by bears
>                                -- Edward Gorey, "The Gashlycrumb Tinies"