Finding size of Variable

wxjmfauth at gmail.com
Thu Feb 6 05:15:01 EST 2014


On Wednesday, 5 February 2014 12:44:47 UTC+1, Chris Angelico wrote:
> On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano
> <steve+comp.lang.python at pearwood.info> wrote:
> >> where stopWords.txt is a file of size 4KB
> >
> > My guess is that if you split a 4K file into words, then put the words
> > into a list, you'll probably end up with 6-8K in memory.
>
> I'd guess rather more; Python strings have a fair bit of fixed
> overhead, so with a whole lot of small strings, it will get more
> costly.
>
> >>> sys.version
> '3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan  5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
> >>> sys.getsizeof("asdf")
> 29
>
> "Stop words" tend to be short, rather than long, words, so I'd look at
> an average of 2-3 letters per word. Assuming they're separated by
> spaces or newlines, that means there'll be roughly a thousand of them
> in the file, for about 25K of overhead. A bit less if the words are
> longer, but still quite a bit. (Byte strings have slightly less
> overhead, 17 bytes apiece, but still quite a bit.)
>
> ChrisA

>>> sum([sys.getsizeof(c) for c in ['a']])
26
>>> sum([sys.getsizeof(c) for c in ['a', 'a€']])
68
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€']])
112
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€', 'aaa€']])
158
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€', 'aaa€', 'aaaaaaaaaaaaaaaaaaaa€']])
238
>>> 
>>> 
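These figures reflect CPython's flexible string representation (PEP 393, Python 3.3+): a str stores every character with the width required by its widest code point, so adding the euro sign (U+20AC) switches the whole string from 1-byte to 2-byte storage. A minimal sketch, assuming CPython 3.3+; `per_char_cost` is a hypothetical helper, not a standard function.

```python
import sys

def per_char_cost(ch, n=10):
    """Extra bytes sys.getsizeof reports for one more 'a', in a string
    whose widest code point is `ch`."""
    shorter = "a" * n + ch
    longer = "a" * (n + 1) + ch
    return sys.getsizeof(longer) - sys.getsizeof(shorter)

print(per_char_cost("a"))           # 1 on CPython: ASCII, 1 byte per char
print(per_char_cost("\u20ac"))      # 2 on CPython: EURO SIGN forces 2-byte storage
print(per_char_cost("\U0001F600"))  # 4 on CPython: astral code point forces 4 bytes
```

Taking the difference between two lengths cancels the fixed header, whose size varies between versions and between 32-bit and 64-bit builds.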
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a']])
21
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€']])
46
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€']])
75
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€', 'aaa€']])
108
>>> sum([sys.getsizeof(c.encode('utf-32-be')) for c in ['a', 'a€', 'aa€', 'aaa€', 'aaaaaaaaaaaaaaaaaaaa€']])
209
>>> 
>>> 
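The encoded sizes follow a simpler rule: a bytes object costs a fixed header plus one byte per byte of content, and UTF-32 spends four bytes on every code point. The plain `utf-32` codec additionally prepends a 4-byte byte-order mark, which `utf-32-be` omits. A small check, assuming CPython:

```python
import sys

s = "aa\u20acaa\u20ac"  # 'aa€aa€', six code points

be = s.encode("utf-32-be")  # explicit big-endian order, no BOM
bom = s.encode("utf-32")    # native order, prefixed with a 4-byte BOM

print(len(be), len(bom))    # 24 28

# A bytes object's fixed overhead does not depend on its length.
overhead = sys.getsizeof(b"")
print(sys.getsizeof(be) - len(be) == overhead)  # True on CPython
```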
>>> sum([sys.getsizeof(c) for c in ['a', 'a€', 'aa€']*3])
336
>>> sum([sys.getsizeof(c) for c in ['aa€aa€']*3])
150
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['a', 'a€', 'aa€']*3])
261
>>> sum([sys.getsizeof(c.encode('utf-32')) for c in ['aa€aa€']*3])
135
>>>
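The last group shows the fixed overhead being paid once per object: many short strings cost more than fewer, longer ones. A sketch of the same kind of measurement, comparing nine separate strings with a single string holding the same characters; `total_size` is a hypothetical helper, and this assumes CPython.

```python
import sys

def total_size(strings):
    """Sum of sys.getsizeof over the given strings (shallow; the
    containing list itself is not counted)."""
    return sum(sys.getsizeof(s) for s in strings)

# `* 3` repeats references to the same three objects, so each object is
# counted three times, as in the session above.
words = ["a", "a\u20ac", "aa\u20ac"] * 3

separate = total_size(words)
joined = sys.getsizeof("".join(words))
print(separate, joined)  # separate is much larger: nine headers vs one
```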

jmf


