indexing and searching pdf files

Rajarshi Guha rajarshi at presidency.com
Thu Sep 26 18:05:37 EDT 2002


Hi,
  I have a load of PDF files and I would like to index them so that I can
search them for keywords. I was thinking of using pdftotext to generate
a text file from each PDF and then create the index from that.
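
A minimal sketch of that approach, assuming the pdftotext command-line tool
is installed and on the PATH. The function names (pdf_to_text, tokenize,
build_index, search) are made up for illustration, and the index is a plain
in-memory inverted index (word -> set of file names), not an optimized one:

```python
import re
import subprocess
from collections import defaultdict


def pdf_to_text(pdf_path):
    """Return the text of a PDF by running the external pdftotext tool.

    Assumes pdftotext is installed; "-" asks it to write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index from a {name: text} mapping.

    The result maps each word to the set of document names containing it.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

To index real files, the documents mapping could be built with something
like {path: pdf_to_text(path) for path in pdf_paths}, where pdf_paths is a
list of file names you supply. For a large collection the index would need
to be stored on disk rather than kept in memory.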

My question: is there already something like this for Python?
Another question, which is slightly off topic: does anybody know of any
articles/pages that talk about indexing text files efficiently (index
generation algorithms etc.)?

Thanks,



More information about the Python-list mailing list