Implementing file reading in C/Python

Fri Jan 9 17:23:28 EST 2009

On 2009-01-09, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
>
>> Marc 'BlackJack' Rintsch wrote:
>>
>>> def iter_max_values(blocks, block_count):
>>>     for i, block in enumerate(blocks):
>>>         histogram = defaultdict(int)
>>>         for byte in block:
>>>             histogram[byte] += 1
>>>         
>>>         yield max((count, value)
>>>                   for value, count in histogram.iteritems())[1]
>>>         
>> [snip]
>> Would it be faster if histogram was a list initialised to [0] * 256?
>
> Don't know.  Then for every byte in the 2 GiB we have to call `ord()`.  
> Maybe the speedup from the list compensates for this, maybe not.
>
> I think that we have to do something with *every* byte of that really 
> large file *at Python level* is the main problem here.  In C that's just 
> some primitive numbers.  Python has all the object overhead.

Using buffers or arrays of bytes instead of strings/lists would
probably reduce the overhead quite a bit.
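
As a rough, untimed sketch of that idea: wrapping each block in a
bytearray (available from Python 2.6) makes iteration yield plain
integers, so the [0] * 256 list MRAB suggested can be indexed without
calling ord().  The name most_common_byte is made up for illustration
and is not part of the code quoted above.

```python
def most_common_byte(block):
    """Return the most frequent byte value (0-255) in one block.

    Hypothetical helper, not from the quoted code.
    """
    counts = [0] * 256
    # Iterating over a bytearray yields ints, so no ord() call is needed.
    for value in bytearray(block):
        counts[value] += 1
    # Compare (count, value) pairs; on a tie the larger byte value wins.
    return max((count, value) for value, count in enumerate(counts))[1]


if __name__ == "__main__":
    print(most_common_byte(b"aabbbc"))
```

This still touches every byte at Python level, so it only trims the
per-byte cost rather than removing it.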

-- 
Grant Edwards                   grante             Yow! I've got an IDEA!!
                                  at               Why don't I STARE at you
                               visi.com            so HARD, you forget your
                                                   SOCIAL SECURITY NUMBER!!