"groupby" is brilliant!

Paul McGuire ptmcg at austin.rr._bogus_.com
Tue Jun 13 20:21:14 EDT 2006


John Machin" <sjmachin at lexicon.net> wrote in message
news:448F4076.6090407 at lexicon.net...
> On 14/06/2006 8:38 AM, Robert Kern wrote:
> > Gary Herron wrote:
> >> John Machin wrote:
> >>
> >>> On 13/06/2006 6:28 PM, Paul McGuire wrote:
> >>>
> >>>> (Oh, and I like groupby too!  Combine it with sort to quickly create
> >>>> histograms.)
> >>>>
> >>>> # tally a histogram of a list of values from 1-10
> >>>> dataValueRange = range(1,11)
> >>>> data = [random.choice(dataValueRange) for i in xrange(10000)]
> >>>>
> >>>> hist = [ (k,len(list(g))) for k,g in
itertools.groupby(sorted(data)) ]
> >>> That len(list(g)) looks like it uses O(N) memory just to find out what
N
> >>> is :-(
> >> Not at all! A python list *knows* its length at all times. len() is a
> >> constant time lookup of an internal attribute.
> >
> > The point is that you had to create the list in the first place. g is an
iterator.
> >
>
> I didn't have to create a list in the first place. Paul did. The point
> of my post was to avoid the memory grab caused by list(g) by seeking a
> way that just counted g's output.
>
> Sorry for the confusion my lack of clarity has evidently caused. I'll
> rephrase:
>
> That whole construct
>      len(list(g))
> looks like it uses O(N) memory just to find out what N is.
> Better?

Ok, ok!!!  Here's a non-O(N) memory allocation, using a generator expression
to count the number of items in the list.

hist = [ (k,sum(1 for _g in g)) for k,g in itertools.groupby(sorted(data)) ]

-- Paul





More information about the Python-list mailing list