interacting with an updatedb generated data file within python

John Machin sjmachin at lexicon.net
Fri Apr 3 03:25:03 EDT 2009


On Apr 3, 1:07 pm, birdsong <david.birds... at gmail.com> wrote:
> Does anybody have any recommendations on how to interact with the data
> file that updatedb generates?  I'm running through a file list in
> sqlite that I want to check against the file system. updatedb is
> pretty optimized for building an index and storing it, but I see no
> way to query the db file other than calling locate itself.  This would
> require me to fork and exec for every single file I want to verify -
> I'd be better off doing the stat myself in that case, but I'd really
> rather let updatedb build the index for me.
>
> I searched high and low for any sort of library that is well suited
> for reading these data files, but I've found nothing for any language
> other than the source for locate and updatedb itself.

Disclaimer: I had to google to find out what "updatedb" is so don't
take me as any authority on this :-)

The format appears to be documented, e.g. at
http://www.delorie.com/gnu/docs/findutils/locatedb.5.html
and thus should also be on the locatedb(5) man page on your system.

Assuming that you don't have the old-format database, it should take
about 20 lines of Python to loop around extracting the file names, plus
some more to open the file, read it in as one big string (how big is
it?), and check the dummy "LOCATE02" entry up the front. It's a bit
hard to be sure how the prefix length of the first non-dummy entry is
determined without seeing an actual example, but my guess is that the
file will start like this:

"\x00LOCATE02\x00\xF8name-of-first-file-in-full\x00........."
where the "\xF8" is -8 meaning ignore the 8-character previous name
"LOCATE02" i.e. previous name can be regarded as "".

Anyway, I reckon utter max 50 lines of Python to produce a module with
a generator that yields one file name at a time, or a function
returning a list or a set.
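
For what it's worth, here is a rough sketch of such a generator. It
rests on my reading of locatedb(5), not on a real database: I assume
each entry is a signed count byte (the change in shared-prefix length
relative to the previous entry), with 0x80 meaning the count follows as
a 2-byte big-endian signed integer, and then a NUL-terminated remainder.
On that reading, an entry that shares nothing with the dummy "LOCATE02"
carries a count of 0, so do check it against the first bytes of your
own file. The names iter_locate02 and load_names are made up for
illustration. Written for Python 3; names come back as bytes.

```python
import struct

MAGIC = b"\x00LOCATE02\x00"  # dummy first entry: count byte 0, then the name


def iter_locate02(data):
    """Yield each file name (as bytes) from LOCATE02-format database bytes.

    Assumes the front-compression layout described in locatedb(5).
    """
    if not data.startswith(MAGIC):
        raise ValueError("not a LOCATE02 database")
    pos = len(MAGIC)
    prefix_len = 0  # shared-prefix length carried over from the dummy entry
    prev = b""      # the dummy name contributes no usable prefix
    while pos < len(data):
        # One signed byte: change in shared-prefix length.
        count = struct.unpack_from("b", data, pos)[0]
        pos += 1
        if count == -128:
            # 0x80 escape: the real count is a 2-byte big-endian signed int.
            count = struct.unpack_from(">h", data, pos)[0]
            pos += 2
        prefix_len += count
        if prefix_len < 0 or prefix_len > len(prev):
            raise ValueError("corrupt entry near offset %d" % pos)
        end = data.index(b"\x00", pos)  # remainder is NUL-terminated
        prev = prev[:prefix_len] + data[pos:end]
        pos = end + 1
        yield prev


def load_names(path):
    """Read the whole database file and return its names as a set of bytes."""
    with open(path, "rb") as f:
        return set(iter_locate02(f.read()))
```

With the set from load_names() in hand, checking each path from the
sqlite list is a plain membership test (encode the path to bytes
first), so there is no fork and exec per file.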

HTH ... feel free to ask more if the above is a little obscure. But do
accompany any questions with the result of doing this:
   print(repr(open('the_locatedb_file', 'rb').read(400)))
plus what you believe the full name of the first non-dummy file should
be.

Cheers,
John


