Python, Perl & PDF files
rbt
rbt at athop1.ath.vt.edu
Tue Apr 26 20:30:08 EDT 2005
Cameron Laird wrote:
> In article <d4m9hl$8br$1 at solaris.cc.vt.edu>,
> rbt <rbt at athop1.ath.vt.edu> wrote:
> .
> .
> .
>
>>Read and search them for strings. If I could do that on Windows, Linux
>>and Mac with the *same* bit of Python code, I'd be very happy ;)
>
>
> Textual content, right? Without regard to font funniness, or
> whether the string is in or out of a table, and so on?
That's right. More specifically, I've written a script that uses an RE to search
through documents for Social Security numbers. You can see it here:
http://filebox.vt.edu/users/rtilley/public/find_ssns/find_ssns.html
This works on Word, Excel, HTML, RTF, or any ANSI-based text. I also need to be able to
read and make sense of PDF files so I can apply the RE to their content. It's
been frustrating, to say the least. Nothing at all against Python... I'm mostly just sick
of hearing about the 'Portable' Document Format that isn't string- or RE-searchable...
at least not easily, anyway.
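One reason a plain RE over a raw PDF usually finds nothing is that many PDF writers compress page content with the FlateDecode filter, which is ordinary zlib data. What follows is a minimal, standard-library-only sketch under stated assumptions, not the find_ssns script itself. The SSN pattern (NNN-NN-NNNN only) and the names `SSN_RE`, `pdf_stream_text`, and `search_pdf` are hypothetical. It also assumes the digits sit in a single literal string inside a stream that is either uncompressed or Flate-compressed.

```python
import re
import zlib

# Hypothetical SSN pattern: only the common NNN-NN-NNNN layout.
# The find_ssns script's actual regular expression is not shown in this post.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Captures the body of each "stream ... endstream" object in a PDF file.
# DOTALL lets "." match newline bytes inside binary stream data.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def pdf_stream_text(data):
    """Yield every stream body in a PDF as text, inflating zlib data.

    Streams that are not zlib-compressed (or are corrupt) are yielded
    as-is. latin-1 maps every byte to a character, so decoding never fails.
    """
    for match in STREAM_RE.finditer(data):
        body = match.group(1)
        try:
            body = zlib.decompress(body)  # FlateDecode streams are zlib data
        except zlib.error:
            pass  # not Flate-compressed: search the raw bytes instead
        yield body.decode("latin-1")


def search_pdf(data, pattern=SSN_RE):
    """Return all matches of a compiled str regex across a PDF's streams.

    data is the whole PDF file as bytes.
    """
    hits = []
    for text in pdf_stream_text(data):
        hits.extend(pattern.findall(text))
    return hits
```

Usage would look like `with open("statement.pdf", "rb") as fh: print(search_pdf(fh.read()))`, where the file name is only a placeholder. The same code runs unchanged on Windows, Linux, and Mac. It only catches simple cases, though. Text drawn as hex strings, numbers split across a kerning array such as `[(123-4) -20 (5-6789)] TJ`, custom font encodings, and other filters (LZW, ASCII85) will all slip past it. A dedicated PDF text-extraction library is the more reliable route.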
> 'Might be a few days before I answer; I'm crashing into end-of-
> the-month deadlines.
No problem. Thanks for the help.