building an index for large text files for fast access

John Machin sjmachin at lexicon.net
Tue Jul 25 03:40:48 EDT 2006


Erik Max Francis wrote:
> alex23 wrote:
>
> > The standard library module 'libcache' does exactly what you're
> > considering implementing.
>
> I believe the module you're referring to is `linecache`.
>

and whatever its name is, it's not a goer: it reads the whole of each
file into memory. It was designed for stack traces. See the docs. See
the source code. See the discussion when somebody with a name within a
typo or two of the OP's asked an almost identical question very recently.

Here's a thought:

sqlite3 database, one table (line_number, file_offset, line_length),
make line_number the primary key, Bob's yer uncle.
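
In concrete terms, here's a minimal sketch of that idea (modern Python;
the names build_index, get_line and line_index are mine, purely
illustrative): one sequential pass builds the index, then each lookup
is a seek and a read:

import sqlite3

def build_index(text_path, db_path):
    # One sequential pass: record where each line starts and how long it is.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE line_index"
                 " (line_number INTEGER PRIMARY KEY,"
                 "  file_offset INTEGER, line_length INTEGER)")
    def rows():
        offset = 0
        with open(text_path, "rb") as f:  # binary mode: offsets are byte counts
            for number, line in enumerate(f, 1):
                yield (number, offset, len(line))
                offset += len(line)
    conn.executemany("INSERT INTO line_index VALUES (?, ?, ?)", rows())
    conn.commit()
    conn.close()

def get_line(text_path, db_path, line_number):
    # Look up the stored offset/length, then seek straight to the line.
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT file_offset, line_length FROM line_index"
                       " WHERE line_number = ?", (line_number,)).fetchone()
    conn.close()
    if row is None:
        raise IndexError("no line %d" % line_number)
    offset, length = row
    with open(text_path, "rb") as f:
        f.seek(offset)
        return f.read(length)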

Oh yeah, that battery's not included until Python 2.5 -- on older
versions you'll need to download the pysqlite2 module, and mutter
strange oaths:

import sys
PY_VERSION = sys.version_info[:2]
if PY_VERSION >= (2, 5):
    import sqlite3  # in the standard library from 2.5 onwards
else:
    from pysqlite2 import dbapi2 as sqlite3  # same DB-API 2.0 interface

It would be a very good idea for the OP to give us a clue as to the
(min/max/average) number of (bytes per line, lines per file, bytes per
file) *and* the desired response time on what hardware & OS ... *and*
how long it takes to do this:
    for line in file_handle:
        pass
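
Something along these lines (the file name is hypothetical) will give
that baseline figure:

import time

path = "big.txt"  # hypothetical: substitute the actual file
t0 = time.time()
with open(path, "rb") as file_handle:
    for line in file_handle:  # one full sequential scan, no work per line
        pass
print("one full scan: %.2f seconds" % (time.time() - t0))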

Alternatively, if you have the disk space, you could have just two
columns in the DB table: (line_number, line).
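
A rough sketch of that variant (again, the names are mine): let sqlite
store the text itself, so no seeking into the original file is needed:

def build_line_table(text_path, db_path):
    # Store the lines themselves; the PRIMARY KEY makes lookups fast.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE lines"
                 " (line_number INTEGER PRIMARY KEY, line TEXT)")
    with open(text_path) as f:
        conn.executemany("INSERT INTO lines VALUES (?, ?)",
                         ((n, line.rstrip("\n"))
                          for n, line in enumerate(f, 1)))
    conn.commit()
    conn.close()

def fetch_line(db_path, line_number):
    conn = sqlite3.connect(db_path)
    row = conn.execute("SELECT line FROM lines WHERE line_number = ?",
                       (line_number,)).fetchone()
    conn.close()
    return row[0] if row else None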

Cheers,
John
