searching pdf files for certain info

rbt rbt at athop1.ath.vt.edu
Tue Feb 22 11:31:16 EST 2005


Tom Willis wrote:
> I tried that for something not python related and I was getting
> sporadic spaces everywhere.
> 
> I am assuming this is not the case in your experience?
> 
> 
> On Tue, 22 Feb 2005 10:45:09 -0500, rbt <rbt at athop1.ath.vt.edu> wrote:
> 
>>Andreas Lobinger wrote:
>>
>>>Aloha,
>>>
>>>rbt wrote:
>>>
>>>
>>>>Thanks guys... what if I convert it to PS via printing it to a file or
>>>>something? Would that make it easier to work with?
>>>
>>>
>>>Not really...
>>>The classical PS Drivers (f.e. Acroread4-Unix print-> ps) simply
>>>define the pdf graphics and text operators as PS commands and
>>>copy the pdf content directly.
>>>
>>>Wishing a happy day
>>>    LOBI
>>
>>I downloaded ghostscript for Win32 and added it to my PATH
>>(C:\gs\gs8.15\lib AND C:\gs\gs8.15\bin). I found that ps2ascii works
>>well on PDF files and it's entirely free.
>>
>>Usage:
>>
>>ps2ascii PDF_file.pdf > ASCII_file.txt
>>
>>However, bundling a 9+ MB package with a 5K script and convincing users
>>to install it is another matter altogether.
>>
> 
> 
> 

For my purposes, it works fine. I'm searching for certain strings that
might be in the document, so all I need is readable text. Layout, fonts,
and presentation are unimportant to me.
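
For what it's worth, here is a rough sketch of how the search could be
wired up from Python. It assumes ps2ascii is reachable on PATH, as in the
setup described above, and that it writes the extracted text to stdout
when given a single file argument. The names pdf_to_text and find_strings
are made up for this example, and the script name in the usage comment is
hypothetical. Runs of whitespace are squeezed before matching, which helps
with stray line breaks and doubled spaces but will not repair spaces that
land in the middle of a word.

```python
import re
import shutil
import subprocess
import sys


def pdf_to_text(pdf_path):
    """Run Ghostscript's ps2ascii on pdf_path and return its output as text."""
    # Assumption: the Ghostscript lib and bin directories are on PATH.
    exe = shutil.which("ps2ascii")
    if exe is None:
        raise RuntimeError("ps2ascii not found on PATH")
    result = subprocess.run([exe, pdf_path], capture_output=True, check=True)
    # Assumption: latin-1 is good enough here; it maps every byte to a
    # character, so decoding cannot fail.
    return result.stdout.decode("latin-1")


def find_strings(text, needles):
    """Return the needles found in text, ignoring case and whitespace runs."""
    squeezed = re.sub(r"\s+", " ", text).lower()
    found = []
    for needle in needles:
        if re.sub(r"\s+", " ", needle).lower() in squeezed:
            found.append(needle)
    return found


if __name__ == "__main__":
    # Usage: python pdfsearch.py PDF_file.pdf string1 [string2 ...]
    pdf_file, wanted = sys.argv[1], sys.argv[2:]
    for hit in find_strings(pdf_to_text(pdf_file), wanted):
        print(hit)
```

Keeping the matching in its own function means it can be tried on any
text file without Ghostscript installed.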


