Implementing file reading in C/Python

MRAB google at mrabarnett.plus.com
Fri Jan 9 10:34:17 EST 2009


Marc 'BlackJack' Rintsch wrote:
> On Fri, 09 Jan 2009 04:04:41 +0100, Johannes Bauer wrote:
> 
>> As this was horribly slow (20 minutes for a 2 GB file), I also coded the
>> whole thing in C:
> 
> Yours took ~37 minutes for 2 GiB here.  This one takes "just" ~15 minutes:
> 
> #!/usr/bin/env python
> from __future__ import division, with_statement
> import os
> import sys
> from collections import defaultdict
> from functools import partial
> from itertools import imap
> 
> 
> def iter_max_values(blocks, block_count):
>     for i, block in enumerate(blocks):
>         histogram = defaultdict(int)
>         for byte in block:
>             histogram[byte] += 1
>         
>         yield max((count, value)
>                   for value, count in histogram.iteritems())[1]
>         
[snip]
Would it be faster if histogram was a list initialised to [0] * 256?
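
A minimal sketch of that list-based variant, to make the question concrete. It assumes Python 3, where iterating over a bytes object yields integers 0..255 directly (under Python 2 each item would be a one-character string and would need ord()). The function name most_common_byte is made up for illustration and is not part of the quoted script; no timing is claimed here.

```python
def most_common_byte(block):
    """Return the most frequent byte value (0..255) in a bytes object."""
    # Fixed-size table indexed by byte value instead of a defaultdict.
    histogram = [0] * 256
    for byte in block:  # Python 3: each item is already an int
        histogram[byte] += 1
    # Same selection rule as the quoted code: highest count wins,
    # and ties go to the larger byte value.
    return max((count, value) for value, count in enumerate(histogram))[1]
```

Usage would look like most_common_byte(open(path, "rb").read(block_size)), with path and block_size supplied by the caller. Whether this beats the defaultdict version would have to be measured on the actual data.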


