Finding size of Variable

Chris Angelico rosuav at gmail.com
Wed Feb 5 06:44:47 EST 2014


On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
>> where stopWords.txt is a file of size 4KB
>
> My guess is that if you split a 4K file into words, then put the words
> into a list, you'll probably end up with 6-8K in memory.

I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it will get more
costly.

>>> sys.version
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan  5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29

"Stop words" tend to be short rather than long, so I'd assume an
average of 2-3 letters per word. With one space or newline between
them, a 4KB file holds roughly a thousand words. The four-character
string above takes 29 bytes, i.e. about 25 bytes of fixed overhead per
string on this build, so that's about 25K of overhead on top of the
text itself. A bit less if the words are longer and therefore fewer,
but still quite a bit. (Byte strings have slightly less overhead, 17
bytes apiece.)

ChrisA


