Looking for a Python script to compare two files

David Boddie davidb at mcs.st-and.ac.uk
Thu Nov 10 06:43:17 EST 2005


Tim Golden wrote:

> + PDF: David Boddie's pdftools looks like about the only possibility:
> (ducks as a thousand people jump on him and point out the alternatives)

I might as well do that! Here are a couple of alternatives:

http://www.sourceforge.net/projects/pdfplayground
http://www.adaptive-enterprises.com.au/~d/software/pdffile/

Both of these are arguably more "Pythonic" than my solution, and
the first is also able to write out modified files.

Cameron Laird also maintains a page about PDF conversion tools:

http://phaseit.net/claird/comp.text.pdf/PDF_converters.html

> http://www.boddie.org.uk/david/Projects/Python/pdftools/
>
> Something like this might do the business. I'm afraid I've
> no idea how to determine where the line-breaks are. This
> was the first time I'd used pdftools, and the fact that
> I could do this much is a credit to its usability!

Thanks for the compliment! The read_text method in the PDFContents
class also lets you extract text from a given page in a document, but
you have to remember that text in PDF files isn't always composed as
a series of lines or paragraphs, and often doesn't even contain
whitespace characters.
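
To illustrate why line breaks are hard to recover, here is a minimal
sketch of one common heuristic. It does not use the pdftools API. It
assumes the text has already been extracted as positioned fragments
in a hypothetical (x, y, width, text) tuple format, and the function
name and both thresholds are made up for the example. Fragments with
roughly the same baseline are treated as one line, and a space is
inserted wherever the horizontal gap between neighbours is large
enough.

```python
def group_into_lines(fragments, y_tolerance=2.0, gap=1.0):
    """Rebuild text lines from positioned fragments.

    fragments: iterable of (x, y, width, text) tuples. This layout is
    an assumption for the sketch, not something pdftools returns.
    """
    # PDF coordinates grow upwards, so a larger y is nearer the top.
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))

    lines = []  # each entry: [baseline_y, [fragments on that line]]
    for frag in ordered:
        y = frag[1]
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append(frag)   # close enough: same line
        else:
            lines.append([y, [frag]])   # start a new line

    result = []
    for _, frags in lines:
        frags.sort(key=lambda f: f[0])  # left to right
        parts = []
        prev_end = None
        for x, _y, width, text in frags:
            # A gap wider than the threshold is taken to be a space.
            if prev_end is not None and x - prev_end > gap:
                parts.append(" ")
            parts.append(text)
            prev_end = x + width
        result.append("".join(parts))
    return result


fragments = [
    (40, 700.5, 25, "world"),
    (10, 700.0, 25, "Hello"),
    (10, 680.0, 12, "foo"),
    (22, 680.0, 12, "bar"),
]
print(group_into_lines(fragments))
```

With these sample values, "Hello" and "world" end up on one line with
a space between them, while "foo" and "bar" touch and are joined
without one. Real documents need the thresholds tuned to the font
size, and rotated or multi-column text will defeat this approach.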

David



