[Python-Dev] Rethinking intern() and its data structure

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 9 21:06:40 CEST 2009


> So I guess some of it comes down to whether "loweis" would also reject
> this change on the basis that mathematically a "set is not a dict".

I'd like to point out that this was not the reason to reject it.
Instead, this (or, the opposite of it) was given as a reason why this
patch should be accepted (in msg50482). I found that a weak rationale
for making that change, in particular because I think the rationale
is incorrect.

I like your rationale (save memory) much more, and was asking in the
tracker for specific numbers, which weren't forthcoming.

> Though given that his claim "nobody else is speaking in favor of the
> patch", while at least Colin Winter has expressed some interest at this
> point.

Again, at that point in the tracker, none of the other committers had
spoken in favor of the patch. Since I wasn't convinced of its
correctness, and nobody else (whom I trust) had reviewed it as correct,
I rejected it.

Now that you brought up a specific numbers, I tried to verify them,
and found them correct (although a bit unfortunate), please see my
test script below. Up to 21800 interned strings, the dict takes (only)
384kiB. It then grows, requiring 1536kiB. Whether or not having 22k
interned strings is "typical", I still don't know.

Wrt. your proposed change, I would be worried about maintainability,
in particular if it would copy parts of the set implementation.

Regards,
Martin

import gc, sys
def find_interned_dict():
    cand = None
    for o in gc.get_objects():
        if not isinstance(o, dict):
            continue
        if "find_interned_dict" not in o:
            continue
        for k,v in o.iteritems():
            if k is not v:
                break
        else:
            assert not cand
            cand = o
    return cand

d = find_interned_dict()
print len(d), sys.getsizeof(d)

l = []
for i in range(20000):
    if i%100==0:
        print len(d), sys.getsizeof(d)

    l.append(intern(repr(i)))


More information about the Python-Dev mailing list