building an index for large text files for fast access
Neil Hodgson
nyamatongwe+thunder at gmail.com
Tue Jul 25 02:53:03 EDT 2006
Yi Xing:
> Since different lines have different size, I
> cannot use seek(). So I am thinking of building an index for the file
> for fast access. Can anybody give me some tips on how to do this in
> Python?
It depends on the size of the files and the amount of memory and
disk you may use. First suggestion would be an in-memory array.array of
64 bit integers made from 2 'I' entries with each 64 bit integer
pointing to the start of a set of n lines. Then to find a particular
line number p you seek to a[p/n] and then read over p%n lines. The
factor 'n' is adjusted to fit your data into memory. If this uses too
much memory or scanning the file to build the index each time uses too
much time then you can use an index file with the same layout instead.
Neil
More information about the Python-list
mailing list