Looking for a Python script to compare two files

Tim Golden tim.golden at viacom-outdoor.co.uk
Wed Nov 9 04:19:24 EST 2005


[yys2000]

> I want to compare two PDF or WORD files. 

Could you be more precise, please?

+ Do you only want to compare PDF-PDF or Word-Word? Or do
  you want to be able to do PDF-Word?

+ In either case, are you only bothered about the text, or
  is the formatting significant?

+ If it's only text, then use whatever method you want to
  extract the text (antiword, ghostscript, COM automation,
  xpdf, etc.) and then use the difflib module, or some external
  diff tool.

+ If you want a structure/format comparison, you're into quite
  difficult territory, I believe. It's easy enough to convert a
  Word doc to PDF if that were needed, but PDFs are notoriously
  difficult to disentangle, although relatively straightforward to
  build. There's pdftools
  (http://www.boddie.org.uk/david/Projects/Python/pdftools/)
  which I can't say I've tried, but even once you've got the document
  object into Python, I don't imagine it'll be easy to compare.

+ To do Word-Word comparison, there's more hope on the horizon
  (if that's the metaphor I want). Word has built-in comparison
  functionality, and recent versions of TortoiseSVN, for example,
  include a script which will automate Word to do the right thing.
  Which is, essentially, to open one doc and call its .Compare
  method against the other.
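
For the text-only case, a minimal sketch of the difflib step might look
like the following. It assumes you have already extracted the text of
each document into a plain string by whatever means; the function name
diff_text and the sample strings are made up purely for illustration.

```python
import difflib


def diff_text(old_text, new_text, old_name="old", new_name="new"):
    """Return a unified diff of two strings as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_name,
        tofile=new_name,
        lineterm="",  # input lines carry no newlines, so emit none either
    ))


if __name__ == "__main__":
    # Stand-ins for text pulled out of two documents.
    a = "First line\nSecond line\nThird line\n"
    b = "First line\nSecond line, edited\nThird line\n"
    print("\n".join(diff_text(a, b)))
```

An empty list back means the extracted text is identical, which says
nothing either way about the formatting.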
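
For the Word-Word case, a rough sketch of driving Word's own comparison
through COM could look like this. It is not the TortoiseSVN script, just
an illustration of the same idea; it assumes Windows with Word and the
pywin32 package installed, and the function name compare_word_docs is
invented here.

```python
import os


def compare_word_docs(original_path, revised_path):
    """Open one Word document and ask Word to compare it with another.

    Assumes Windows, an installed copy of Word and the pywin32 package.
    """
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = True  # leave Word on screen so the result can be read

    # Word does not share Python's working directory, so pass full paths.
    doc = word.Documents.Open(os.path.abspath(original_path))
    doc.Compare(os.path.abspath(revised_path))
    return word
```

Word then shows the differences as tracked changes, so formatting
differences are handled by Word itself rather than by anything in Python.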

TJG
