"groupby" is brilliant!

John Machin sjmachin at lexicon.net
Tue Jun 13 18:47:18 EDT 2006


On 14/06/2006 8:38 AM, Robert Kern wrote:
> Gary Herron wrote:
>> John Machin wrote:
>>
>>> On 13/06/2006 6:28 PM, Paul McGuire wrote:
>>>
>>>> (Oh, and I like groupby too!  Combine it with sort to quickly create
>>>> histograms.)
>>>>
>>>> # tally a histogram of a list of values from 1-10
>>>> dataValueRange = range(1,11)
>>>> data = [random.choice(dataValueRange) for i in xrange(10000)]
>>>>
>>>> hist = [ (k,len(list(g))) for k,g in itertools.groupby(sorted(data)) ]
>>> That len(list(g)) looks like it uses O(N) memory just to find out what N 
>>> is :-(
>> Not at all! A python list *knows* its length at all times. len() is a
>> constant time lookup of an internal attribute.
> 
> The point is that you had to create the list in the first place. g is an iterator.
> 

I didn't have to create a list in the first place. Paul did. The point 
of my post was to avoid the memory grab caused by list(g) by seeking a 
way that just counted g's output.

Sorry for the confusion my lack of clarity has evidently caused. I'll 
rephrase:

That whole construct
     len(list(g))
looks like it uses O(N) memory just to find out what N is.
Better?



More information about the Python-list mailing list