[Guppy-pe-list] An iteration idiom (Was: Re: loading files containing multiple dumps)

Fri Sep 4 09:25:43 EDT 2009

On Thu, 2009-09-03 at 10:05 +0100, Chris Withers wrote:
> Raymond Hettinger wrote:
> > In the first case, you would write:
> >    sets.extend(h.load(f))
> 
> yes, what I had was:
> 
> for s in iter(h.load(f)): sets.append(s)
> 
> ...which I mistakenly thought was working, but in fact boils down to 
> Raymond's code.
> 
> The problem is that each item that h.load(f) returns *is* actually an 
> iterable, so either of the above just ends up with the contents of each 
> set being extended onto `sets` rather than the sets themselves.

Yes, that is what makes it confusing; if the items were not iterable,
you would get an exception instead.
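
A minimal sketch of the difference, using a plain list as a stand-in for
what a single load returns (the names below are made up for the example
and are not Heapy objects):

```python
# A plain list stands in for one loaded set; it is iterable, like the real thing.
one_set = ["row 0", "row 1", "row 2"]

extended = []
extended.extend(one_set)        # iterates over one_set and adds its rows

looped = []
for s in iter(one_set):         # the explicit loop does exactly the same
    looped.append(s)

appended = []
appended.append(one_set)        # adds the set itself as a single element

print(extended == looped)       # True
print(len(extended))            # 3
print(len(appended))            # 1
```

Only the append form keeps the set whole. The other two silently flatten
it, because iterating over it is legal.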

I hope the new loadall method that I wrote about before will resolve this.

def loadall(self, f):
    """Generate all objects from an open file f or a file named f."""
    if isinstance(f, basestring):
        f = open(f)
    while True:
        yield self.load(f)

Should we call it loadall? It is a generator, so it doesn't really load
everything immediately, just lazily. Maybe call it iload? Or redefine
load, but that might break existing code, so it would not be good.
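
To show what "lazily" means here, a small self-contained sketch. The
load_one function is a hypothetical stand-in that reads one
blank-line-separated record; it is not Heapy's load, and the record format
is an assumption made only for this example. The generator stops
explicitly when the stand-in reports end of input.

```python
import io


def load_one(f):
    """Hypothetical stand-in for a single load: read lines up to a blank line.

    Returns None at end of input. This is not Heapy's load.
    """
    lines = []
    for line in iter(f.readline, ""):
        line = line.strip()
        if not line:
            if lines:
                break        # blank line ends the current record
            continue         # skip leading blank lines
        lines.append(line)
    return lines or None


def iload(f):
    """Lazily yield one record at a time until the input is exhausted."""
    while True:
        record = load_one(f)
        if record is None:
            return           # explicit end of iteration
        yield record


f = io.StringIO("a 1\nb 2\n\nc 3\n")
gen = iload(f)
first = next(gen)            # only the first record has been read so far
rest = list(gen)             # the remaining records are read here

print(first)                 # ['a 1', 'b 2']
print(rest)                  # [['c 3']]
```

Nothing is read until the caller asks for the next item, which is the
argument for a name like iload rather than loadall.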

> It's all really rather confusing, apologies if there's interspersed rant 
> in here:
> 
>  >>> from guppy import hpy
>  >>> h = hpy()
> 
> Minor rant, why do I have to instantiate a
> <class 'guppy.heapy.Use._GLUECLAMP_'>
> to do anything with heapy?
> Why doesn't heapy just expose load, dump, etc?

Basically, the need for the h=hpy() idiom is to avoid any global
variables. Heapy uses some rather big internal data structures, to cache
such things as dict ownership. I didn't want to have all those things in
global variables. Now they are all contained in the hpy() session
context. So, if h=hpy(), you can get rid of them by just deleting h and
the other objects you created. Also, it allows for several parallel
invocations of Heapy.
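
The general idea can be sketched as follows. The Session class and its
cache are hypothetical and much simpler than anything in Heapy; the sketch
only illustrates keeping state in a session object instead of in
module-level variables.

```python
class Session:
    """Hypothetical session object that owns its own cache."""

    def __init__(self):
        self._owner_cache = {}      # lives and dies with this session

    def owner_of(self, obj):
        # Keyed by id() purely for illustration.
        key = id(obj)
        if key not in self._owner_cache:
            self._owner_cache[key] = "owner-of-%d" % key
        return self._owner_cache[key]


a = Session()
b = Session()                       # a second, independent session
x = object()
a.owner_of(x)

sizes = (len(a._owner_cache), len(b._owner_cache))
print(sizes)                        # (1, 0)

del a                               # the cache of session a goes with it
```

Each session carries its own cache, so two sessions do not interfere and
dropping one releases what it had cached.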

However, I am aware of the extra initial overhead of doing h=hpy(). I
discussed this in my thesis, Section 4.7.8, "Why not importing Use
directly?", page 36:

http://guppy-pe.sourceforge.net/heapy-thesis.pdf

Maybe a module should be added that does this, especially if someone
provides a patch and/or others agree :-)

> (oh, and reading the code for guppy.heapy.Use and its ilk made me go 
> temporarily blind!) ;-)

Try sunglasses :) (Well, I am aware of this; it was a
research/experimental system and could use some refactoring :-)

>  >>> f = open('copy.hpy')
>  >>> s = h.load(f)
> 
> Less minor rant: this applies to most things to do with heapy... Having 
> __repr__ return the same as __str__ and having that be a long lump of 
> text is rather annoying. If you really must, make __str__ return the big 
> lump of text but have __repr__ return a simple, short, item containing 
> the class, the id, and maybe the number of contained objects...

I thought it was cool to not have to use print but get the result
directly at the prompt.

But if this is a problem and especially if others also complain, we
could add an option for shorter __repr__.

h=hpy(short_repr=True)

Or something else/shorter if you wish.

BTW, I think a cool thing with having everything based on a context
session, h=hpy(args), is that you could add any options there. That
would be harder/less clean if you just imported all methods from a
module. Patch some module-level variable.... ilk ...
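
A rough sketch of how such an option could be passed from the session to
the objects it creates. Everything here is hypothetical: short_repr is
only the proposal above, not an existing hpy() argument, and Table and
Session are made-up stand-ins.

```python
class Table:
    """Hypothetical result object; not Heapy's Stat or IdentitySet."""

    def __init__(self, rows, short_repr=False):
        self.rows = list(rows)
        self.short_repr = short_repr

    def __str__(self):
        # The full, multi-line listing.
        return "\n".join("%3d %s" % (i, r) for i, r in enumerate(self.rows))

    def __repr__(self):
        if self.short_repr:
            # Class, id and number of rows only.
            return "<%s at 0x%x, %d rows>" % (
                type(self).__name__, id(self), len(self.rows))
        return str(self)            # default: same as __str__


class Session:
    """Hypothetical session that hands its options to what it creates."""

    def __init__(self, short_repr=False):
        self.short_repr = short_repr

    def table(self, rows):
        return Table(rows, short_repr=self.short_repr)


h = Session(short_repr=True)
t = h.table(["dict", "str"])
short = repr(t)                     # one short line
full = str(t)                       # the big listing, on request
print(full)
```

With the default short_repr=False, repr(t) stays identical to str(t), so
existing behaviour at the prompt would not change.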

> Anyway...
> 
>  >>> id(s)
> 13905272
>  >>> len(s)
> 192
>  >>> s.__class__
> <class guppy.heapy.Part.Stat at 0x00CD6A20>
>  >>> i = s[0]
>  >>> id(i)
> 13904112
>  >>> len(i)
> 1
>  >>> i.__class__
> <class guppy.heapy.Part.Stat at 0x00CD6A20>
> 
> Hmmm, I'm sure there's a good reason why an item in a set has the exact 
> same class and interface as a whole set?

Um, perhaps no very good reason but... a subset of a set is still a set,
isn't it? This is the same structure that is used in IdentitySet
objects. Each row is still an IdentitySet, and has the same attributes.
This is also how Python strings work: there is no special character
type, a character is just a string of length 1. I thought this was
pretty cool when I first saw it in Python, compared to other languages
such as C or Pascal. If we don't need a new type, we had better avoid it.
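
The string analogy, and a toy container that follows the same rule, can be
sketched like this. Rows is a hypothetical class written for this example,
not Heapy's Stat or IdentitySet.

```python
s = "heapy"
c = s[0]
print(type(c) is type(s), len(c))   # True 1


class Rows:
    """Hypothetical container whose items are containers of the same class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        # A single row is returned as a Rows of length 1, not as a new type.
        return Rows([self.rows[i]])


t = Rows(["dict", "str", "tuple"])
item = t[0]
print(type(item) is type(t), len(t), len(item))   # True 3 1
```

This mirrors the session quoted above: len(s) gives the number of rows,
and s[0] is an object of the same class with length 1.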

So what's the problem? :-)

> It feels like some kind of filtering, where are the docs that explain 
> all this?

Unfortunately, the docs for the Stat object have been lagging behind.
Sorry. But as people take more interest in Heapy and send comments or
even patches, I get more motivated to look into it. :-)

Thanks and Cheers,

Sverker

-- 
Expertise in Linux, embedded systems, image processing, C, Python...
        http://sncs.se