Efficient processing of large numeric data file

Fri Jan 18 19:59:34 EST 2008

...and just for fun, this D code is about 3.2 times faster than the
Psyco version on the same dataset (30% of the lines contain a space):


import std.stdio, std.conv, std.string, std.stream;

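// Each input line is either "key" (adds 1 to that key) or
// "key value" (adds value to that key).
int[int] get_hist(string file_name) {
    int[int] hist;

    foreach(string line; new BufferedFile(file_name)) {
        int pos = find(line, ' ');  // index of the first space, or -1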
        if (pos == -1)
            hist[toInt(line)]++;
        else
            hist[toInt(line[0 .. pos])] += toInt(line[pos+1 .. $]);
    }

    return hist;
}

void main(string[] args) {
    writefln( get_hist(args[1]).length );
}
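
For anyone comparing against Python, the same logic can be sketched roughly
as below. This is only an illustration of what the D code does, not the
Psyco version that was timed; the names get_hist and count_keys are just
chosen here, and it assumes every line is a well-formed "key" or
"key value" (a blank line raises ValueError, much as toInt would fail on
an empty string).

```python
import sys
from collections import defaultdict


def get_hist(lines):
    """Accumulate key -> total from lines of the form 'key' or 'key value'."""
    hist = defaultdict(int)
    for line in lines:
        # partition() splits on the first space only; sep is '' if none found.
        key, sep, value = line.partition(' ')
        if sep:
            hist[int(key)] += int(value)
        else:
            # int() ignores the trailing newline left by file iteration.
            hist[int(key)] += 1
    return hist


def count_keys(file_name):
    """Number of distinct keys in the file (what the D main() prints)."""
    with open(file_name) as f:
        return len(get_hist(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    print(count_keys(sys.argv[1]))
```

get_hist takes any iterable of lines, so it can be tried on a small list
before pointing it at a real file.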


Bye,
bearophile