Playing with Dict - Can anyone see a way around this?

Alex Martelli aleaxit at yahoo.com
Sun May 20 03:18:13 EDT 2001


<s713221 at student.gu.edu.au> wrote in message
news:3B078E75.E5C25BAE at student.gu.edu.au...
> Thanks Alex. I had to play around with the MyDict class to force update
> of the keys, but this works as I wanted it to. I decided to leave out
> the sort - mainly because of the desync between keys,values. (Items
    ...
> def __update(self):
>     self.keys = self.__keys(self.dict)
>     self.items = self.__items(self.dict)
>     self.values = self.__values(self.dict)

This does allow dictionary alteration during a loop on (e.g.) keys
(it's gonna be pretty slow since __update will be called over and
over again, but that's another issue) because a shiny new list is
prepared for (e.g.) self.keys each time, rather than altering the
old one... and the loop will be looping on the old one.  So, fine
(apart from possible performance issues).

Sorting isn't hard if you do want it, e.g.:

    self.items = self.__items(self.dict)
    self.items.sort()
    self.keys, self.values = map(list,zip(*self.items))

> and hey, I discovered that python doesn't use __getattr__ if the
> attribute is in __dict__.

Excellent, because this is absolutely key.  In particular, it is what
allows a potentially important optimization regarding the above
mentioned performance issue.  Consider some client-code...:

    for k in plik.keys:
        if buup(k): del plik[k]
    for k in plik.keys:
        print 'Survived key:',k

In the current approach, plik.__update will be called at each and
every del -- that may be a lot of times and a lot of work... all for
nothing except the last time.

This is a reasonably frequent pattern of accesses to data: when a
change comes, many changes may come 'bunched up', then no
more changes for a while (just read-access) until next bunch.

Which is why motivates lazy, aka just-in-time, strategies of
updating... consider:

def __recompute(self, attname):
    self.items = self.__items(self.dict)
    self.items.sort()
    self.keys, self.values = map(list,zip(*self.items))
    return getattr(self, attname)

def __update(self):
    del self.items, self.keys, self.values

def __getattr__(self, attname):
    if attname in ('items','keys','values'):
        return self.__recompute(attname)
    # continue with the rest of __getattr__ as now

When the underlying dict changes, __update is called, just
like now, but what it does is "mark invalid" the attributes by
the simple and effective strategy of removing them altogether.

So NEXT time an attribute is accessed, it won't be in __dict__
and so __getattr__ will be called -- and it will delegate to
__recompute for just-in-time recomputation...


Alex






More information about the Python-list mailing list