pdf2txt

Tim Roberts timr at probo.com
Sat May 29 20:16:45 EDT 2004


B P <nature_boyMYPANTS at mindspring.com> wrote:
>
>Is there a way via Python or even Perl to capture records from a pdf and
>output a delimited text file?  My work has a situation with a trunk
>load of data forms that were scanned as pdfs.

SCANNED as PDFs?  Do you mean these were paper forms, filled in using
printed handwriting, then scanned into a TIFF and wrapped up in a PDF?

If so, your job is next to impossible.  You can extract the original
bitmapped image out of the PDF, and from that you MIGHT be able to use an
OCR program to extract the text, but unless the forms were specifically
designed for machine reading, that process tends to be error-prone.  It
might be more efficient to have human beings transcribe them.
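
For what it's worth, here is a rough Python sketch of the image-extraction
step only.  It assumes the scanner stored each page as a JPEG (a /DCTDecode
stream), which is common but by no means guaranteed; CCITT fax or
Flate-compressed bitmaps would need a real PDF library.  The name
extract_jpeg_streams is made up for illustration, and the byte scanning is
a crude shortcut rather than a proper PDF parser.  The OCR step is not shown.

```python
import re

# Assumption: stream bodies sit between the "stream" and "endstream"
# keywords and the image data itself never contains "endstream".
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)


def extract_jpeg_streams(pdf_bytes):
    """Return the raw bytes of every PDF stream that looks like a JPEG.

    Hypothetical helper: a stream is treated as a JPEG if its data starts
    with the JPEG start-of-image marker (FF D8).  Encrypted files and
    images stored with other filters are silently skipped.
    """
    images = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        if data.startswith(b"\xff\xd8"):
            images.append(data)
    return images
```

Each item returned can be written to a .jpg file in binary mode and handed
to whatever OCR program you have available.  Expect to proofread the
results by hand.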
-- 
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.


