Python, Perl & PDF files

rbt rbt at athop1.ath.vt.edu
Tue Apr 26 20:30:08 EDT 2005


Cameron Laird wrote:
> In article <d4m9hl$8br$1 at solaris.cc.vt.edu>,
> rbt  <rbt at athop1.ath.vt.edu> wrote:
> 			.
> 			.
> 			.
> 
>>Read and search them for strings. If I could do that on Windows, Linux 
>>and Mac with the *same* bit of Python code, I'd be very happy ;)
> 
> 
> Textual content, right?  Without regard to font funniness, or
> whether the string is in or out of a table, and so on?

That's right. More specifically, I've written a script that uses a regular expression (RE) 
to search through documents for Social Security numbers. You can see it here:

http://filebox.vt.edu/users/rtilley/public/find_ssns/find_ssns.html

This works on Word, Excel, HTML, RTF or any ANSI-based text. I need to be able to 
read and make sense of PDF files as well, so I can apply the RE to their content. It's 
been frustrating, to say the least. Nothing at all against Python... I'm mostly just sick 
of hearing about the 'Portable' Document Format that isn't searchable with a plain string 
match or an RE... at least not easily, anyway.

> 'Might be a few days before I answer; I'm crashing into end-of-
> the-month deadlines.

No problem. Thanks for the help.
