searching pdf files for certain info

Tom Willis tom.willis at gmail.com
Tue Feb 22 19:54:13 EST 2005


Well, sporadic spaces in strings would cause problems, wouldn't they?

An example:


The String: "Patient Face Sheet"--->pdftotext--->"P a tie n t Face Sheet"
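
If the stray spaces are the main obstacle, one workaround is to make the search tolerant of them instead of trying to fix the extraction. Here is a minimal Python sketch. It assumes the extracted text is already in a string, and the function names are hypothetical, not part of any library:

```python
import re


def spaced_pattern(phrase):
    """Build a regex matching `phrase` even when the extractor has inserted
    stray whitespace between characters (e.g. "P a tie n t")."""
    # Drop the phrase's own whitespace, escape each remaining character,
    # and allow any run of whitespace (including none) between them.
    chars = [re.escape(c) for c in phrase if not c.isspace()]
    return re.compile(r"\s*".join(chars), re.IGNORECASE)


def contains_phrase(text, phrase):
    """True if `phrase` occurs in `text`, ignoring case and whitespace."""
    return spaced_pattern(phrase).search(text) is not None
```

The trade-off is that ignoring spaces entirely also lets "PatientFaceSheet" match. For distinctive phrases like form titles, that is usually harmless.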

I'm just curious whether you see anything like that, since I really have no
clue about PS or PDF etc., but I have a strong desire to replace a
really flaky commercial tool. And if I can do it with free stuff, all
the better: my boss will love me.


On Tue, 22 Feb 2005 11:31:16 -0500, rbt <rbt at athop1.ath.vt.edu> wrote:
> Tom Willis wrote:
> > I tried that for something not Python-related and was getting
> > sporadic spaces everywhere.
> >
> > I assume that is not the case in your experience?
> >
> >
> > On Tue, 22 Feb 2005 10:45:09 -0500, rbt <rbt at athop1.ath.vt.edu> wrote:
> >
> >>Andreas Lobinger wrote:
> >>
> >>>Aloha,
> >>>
> >>>rbt wrote:
> >>>
> >>>
> >>>>Thanks guys... what if I convert it to PS via printing it to a file or
> >>>>something? Would that make it easier to work with?
> >>>
> >>>
> >>>Not really...
> > >>>The classical PS drivers (e.g. acroread 4 on Unix, print -> PS) simply
> > >>>define the PDF graphics and text operators as PS commands and
> > >>>copy the PDF content directly.
> >>>
> >>>Wishing a happy day
> >>>    LOBI
> >>
> > >>I downloaded Ghostscript for Win32 and added it to my PATH
> > >>(C:\gs\gs8.15\lib and C:\gs\gs8.15\bin). I found that ps2ascii works
> > >>well on PDF files, and it's entirely free.
> >>
> >>Usage:
> >>
> >>ps2ascii PDF_file.pdf > ASCII_file.txt
> >>
> >>However, bundling a 9+ MB package with a 5K script and convincing users
> >>to install it is another matter altogether.
> >>--
> >>http://mail.python.org/mailman/listinfo/python-list
> >>
> >
> >
> >
> 
> For my purpose, it works fine. I'm searching for certain strings that
> might be in the document... all I need is a readable file. Layout, fonts
> and presentation are unimportant to me.
> 
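
For the case rbt describes (searching the converted text for certain strings, with layout unimportant), the conversion and the search could be wrapped in a short Python script. This is a rough sketch with a few assumptions. Ghostscript's ps2ascii must be on the PATH. It must also write the text to stdout when given only an input file, as the `ps2ascii PDF_file.pdf > ASCII_file.txt` usage quoted above implies. The function names are hypothetical, and on Windows the command may need to be given as "ps2ascii.bat":

```python
import re
import subprocess


def pdf_to_text(pdf_path, converter="ps2ascii"):
    """Convert a PDF to plain text with an external tool and return the text.

    Assumes `converter` is on the PATH and writes to stdout when given only
    an input file.
    """
    result = subprocess.run(
        [converter, pdf_path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output to str
        errors="replace",     # don't crash on odd bytes in the extracted text
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


def _normalize(s):
    # Collapse every run of whitespace (including line breaks) to one space
    # and lowercase, so "Face\nSheet" and "face  sheet" compare equal.
    return re.sub(r"\s+", " ", s).strip().lower()


def find_terms(text, terms):
    """Return the terms (in their original form) that occur in `text`."""
    haystack = _normalize(text)
    return [term for term in terms if _normalize(term) in haystack]
```

Something like `find_terms(pdf_to_text("form.pdf"), ["Patient Face Sheet"])` returns the phrases that were found. If the extractor produces broken spacing inside words, a plain substring test like this one will miss those matches. In that case, allowing optional whitespace between characters in the search pattern is one way around it.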


-- 
Thomas G. Willis
http://paperbackmusic.net


