Implementing file reading in C/Python
Rhamphoryncus
rhamph at gmail.com
Sat Jan 10 02:44:31 EST 2009
On Jan 9, 2:14 pm, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Fri, 09 Jan 2009 15:34:17 +0000, MRAB wrote:
> > Marc 'BlackJack' Rintsch wrote:
>
> >> def iter_max_values(blocks, block_count):
> >>     for i, block in enumerate(blocks):
> >>         histogram = defaultdict(int)
> >>         for byte in block:
> >>             histogram[byte] += 1
>
> >>         yield max((count, byte)
> >>                   for byte, count in histogram.iteritems())[1]
>
> > [snip]
> > Would it be faster if histogram was a list initialised to [0] * 256?
>
> Don't know. Then for every byte in the 2 GiB we have to call `ord()`.
> Maybe the speedup from the list compensates this, maybe not.
>
> I think the main problem here is that we have to do something with
> *every* byte of that really large file *at Python level*. In C those are
> just primitive numbers; Python has all the object overhead.
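MRAB's list-based histogram can be sketched in Python 3 as follows. This is a minimal sketch, not Marc's code: the name `iter_max_values_list` is hypothetical, the unused `block_count` parameter is dropped, and each block is assumed to be a non-empty `bytes` object. In Python 3, iterating over `bytes` already yields ints 0..255, so the per-byte `ord()` call Marc mentions is not needed there.

```python
def iter_max_values_list(blocks):
    """Yield the most frequent byte value in each block.

    Hypothetical variant of the dict-based version using a 256-slot list.
    Assumes every block is a non-empty bytes object.
    """
    for block in blocks:
        histogram = [0] * 256  # one counter per possible byte value
        for byte in block:  # iterating bytes yields ints 0..255 in Python 3
            histogram[byte] += 1
        # Compare (count, value) pairs: highest count wins, ties go to the larger value.
        yield max((count, value) for value, count in enumerate(histogram))[1]
```

As in the dict version, ties go to the larger byte value because tuples compare the count first and then the value. Whether the list actually beats the dict is something to measure on real data rather than assume.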
struct's B format might help here. Also, struct.unpack_from could
probably be combined with mmap to avoid copying the input. Not to
mention that the ints 0..255 that B produces are all cached, so they
won't be allocated and deallocated.
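A rough sketch of the struct/mmap idea, again in Python 3 and under stated assumptions: `iter_max_values_mmap` and `block_size` are hypothetical names, the input is a non-empty file on disk, and a shorter final block is allowed. `struct.unpack_from` decodes each block directly from the memory map at a given offset, so no intermediate `bytes` slice is copied, although it still builds a tuple of ints. In CPython those small ints come from the interpreter's small-int cache.

```python
import mmap
import struct


def iter_max_values_mmap(path, block_size):
    """Yield the most frequent byte value of each block_size-byte block of a file.

    Hypothetical helper. Assumes a non-empty file (mmap cannot map an
    empty one); the last block may be shorter than block_size.
    """
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        size = len(mm)
        for offset in range(0, size, block_size):
            n = min(block_size, size - offset)  # shorter final block
            # "nB" unpacks n unsigned bytes straight from the map into a tuple of ints.
            values = struct.unpack_from(f"{n}B", mm, offset)
            histogram = [0] * 256
            for value in values:
                histogram[value] += 1
            # Highest count wins; ties go to the larger byte value.
            yield max((count, value) for value, count in enumerate(histogram))[1]
```

The per-byte counting loop still runs at Python level, so this mainly saves copying the input; the object overhead Marc describes remains.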
More information about the Python-list mailing list