Implementing file reading in C/Python
Grant Edwards
invalid at invalid
Fri Jan 9 17:23:28 EST 2009
On 2009-01-09, Marc 'BlackJack' Rintsch <bj_666 at gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
>
>> Marc 'BlackJack' Rintsch wrote:
>>
>>> def iter_max_values(blocks, block_count):
>>>     for i, block in enumerate(blocks):
>>>         histogram = defaultdict(int)
>>>         for byte in block:
>>>             histogram[byte] += 1
>>>
>>>         yield max((count, value)
>>>                   for value, count in histogram.iteritems())[1]
>>>
>> [snip]
>> Would it be faster if histogram was a list initialised to [0] * 256?
>
> Don't know. Then for every byte in the 2 GiB we have to call `ord()`.
> Maybe the speedup from the list compensates for this, maybe not.
>
> I think the main problem here is that we have to do something with
> *every* byte of that really large file *at the Python level*. In C
> that's just some primitive numbers. Python has all the object overhead.
Using buffers or arrays of bytes instead of strings/lists would
probably reduce the overhead quite a bit.
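
Something along these lines, as a rough and unbenchmarked sketch. It
assumes Python 3, where iterating over a bytes-like object yields
integers directly, so no `ord()` call is needed. The names
`most_common_byte`, `path` and `block_size` are made up for
illustration and are not from the earlier posts. Note that ties go to
the lowest byte value here, which differs from the tuple-based `max()`
above.

```python
def most_common_byte(block):
    """Return the most frequent byte value (0-255) in a bytes-like block."""
    histogram = [0] * 256          # fixed-size list instead of a dict
    for value in block:            # yields ints for bytes/bytearray/memoryview
        histogram[value] += 1
    # Index of the largest count; ties resolve to the lowest byte value.
    return max(range(256), key=histogram.__getitem__)


def iter_max_values(path, block_size=1 << 20):
    """Yield the most frequent byte of each block read from the file at path."""
    buf = bytearray(block_size)    # one reusable buffer for every read
    view = memoryview(buf)         # lets us slice the buffer without copying
    with open(path, "rb") as f:
        while True:
            n = f.readinto(buf)    # fill the existing buffer in place
            if not n:
                break
            yield most_common_byte(view[:n])
```

This still touches every byte in a Python-level loop, so it only trims
the per-block allocation and the `ord()` calls; it does not remove the
per-byte interpreter overhead Marc describes.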
--
Grant Edwards                   grante             Yow! I've got an IDEA!!
                                  at               Why don't I STARE at you
                               visi.com            so HARD, you forget your
                                                   SOCIAL SECURITY NUMBER!!