searching pdf files for certain info

rbt rbt at athop1.ath.vt.edu
Tue Feb 22 20:07:50 EST 2005


Tom Willis wrote:
> Well, sporadic spaces in strings would cause problems, would they not?
> 
> An example:
> 
> 
> The String: "Patient Face Sheet"--->pdftotext--->"P a tie n t Face Sheet"
> 
> I'm just curious whether you see anything like that, since I really have no
> clue about PS or PDF etc., but I have a strong desire to replace a
> really flaky commercial tool. And if I can do it with free stuff, all
> the better; my boss will love me.

No, I do not see that type of behavior. I'm looking for strings that
resemble Social Security numbers, so my strings look like this: nnn-nn-nnnn.
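For the matching step, a regular expression is enough once the text has been extracted. A minimal Python sketch (the function name `find_ssn_like` is hypothetical); the `\b` word boundaries keep it from matching inside longer digit runs such as 1234-56-7890:

```python
import re

# Three digits, dash, two digits, dash, four digits, not embedded in a longer run.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_ssn_like(text):
    """Return every nnn-nn-nnnn string found in text, in order of appearance."""
    return SSN_RE.findall(text)

print(find_ssn_like("Patient 123-45-6789, MRN 1234-56-7890"))  # ['123-45-6789']
```

Note that this pattern would miss a number that the extractor splits with stray spaces, as in the pdftotext example above.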

The ps2ascii utility in Ghostscript reproduces these strings in the format I
expect. BTW, I'm not using pdftotext; I'm using *ps2ascii*.
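To drive ps2ascii from Python, one option is to run it with `subprocess` and scan its standard output. This is a sketch, not the poster's actual script, and it assumes Ghostscript's ps2ascii is on the PATH and writes the text to stdout when given only an input file. `scan_pdf` and its `command` parameter (a hook for substituting another command, e.g. in tests) are hypothetical:

```python
import re
import subprocess
import sys

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_pdf(pdf_path, command=("ps2ascii",)):
    """Extract text from pdf_path with ps2ascii and return nnn-nn-nnnn matches.

    Assumes ps2ascii prints the extracted text to stdout when no output
    file is given. `command` is a hypothetical hook for swapping the tool.
    """
    result = subprocess.run(
        [*command, pdf_path],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return SSN_RE.findall(result.stdout)

if __name__ == "__main__":
    # Usage: python scan.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        for match in scan_pdf(path):
            print(f"{path}: {match}")
```

Running the external tool and capturing its output keeps the Python side small; the regex only ever sees plain text.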


