pdf2txt

Aurelio Martin amartin at wpsnetwork.com
Fri May 28 03:21:07 EDT 2004


B P wrote:
> Is there a way via Python or even Perl to capture records from a PDF and 
> output a delimited text file?  My work has a situation with a truckload 
> of data forms that were scanned as PDFs.
> 
> The data needs to be taken from the forms and moved into a database, so 
> I figure that comma-delimited format will work fine.  The amount of 
> man-hours it would take to manually do this is very cost-prohibitive for 
> what we have to work with.
> 
> I know that a txt2pdf exists; I was checking to see if the opposite 
> exists as well.
> 
> BP

You may try XPDF

http://www.foolabs.com/xpdf/

It includes source code and some utilities like pdfimages and pdftotext. 
Maybe you can call these from Python, or link via a C extension.
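
A rough sketch of the "call these from Python" route, in case it helps. It
assumes the pdftotext binary is on your PATH, a Python recent enough to have
subprocess.run with capture_output (3.7 or later), and that the PDFs actually
contain a text layer (pure scanned images would need OCR first). The helper
names and the "two or more spaces separate fields" rule are my own guesses
for illustration, not anything Xpdf prescribes.

```python
import csv
import io
import re
import subprocess


def build_pdftotext_command(pdf_path):
    # "-layout" asks pdftotext to keep the physical page layout;
    # "-" as the output file sends the text to standard output.
    return ["pdftotext", "-layout", pdf_path, "-"]


def extract_text(pdf_path):
    # Runs the external pdftotext utility and returns its output as a string.
    # Raises CalledProcessError if pdftotext exits with an error.
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def text_to_rows(text):
    # Assumption: fields on a line are separated by two or more
    # whitespace characters. Real forms will likely need a smarter rule.
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    # The csv module handles quoting of fields that contain commas.
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Usage would be something like rows_to_csv(text_to_rows(extract_text("form.pdf"))),
where "form.pdf" is a placeholder file name. The splitting rule is the part
you would have to adapt to your own forms.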

Hope this helps

Aurelio


