pdf2txt

Benjamin Niemann b.niemann at betternet.de
Fri May 28 03:43:41 EDT 2004


B P wrote:
> Is there a way via Python or even Perl to capture records from a pdf and 
> output a delimited text file?  My work has a situation with a trunk 
> load of data forms that were scanned as pdfs.
> 
> The data needs to be taken from the forms and moved into a database, so 
> I figure that comma-delimited format will work fine.  The amount of 
> man-hours it would take to manually do this is very cost-prohibitive for 
> what we have to work with.
> 
> I know that a txt2pdf exists; I was checking to see whether the opposite 
> exists as well.
> 
> BP
Have a look at pdftotext, part of xpdf 
(http://www.foolabs.com/xpdf/home.html). This will convert the PDF into 
plain text. You will probably have to parse this plain text to 
convert it into something useful.
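
As a rough sketch of that workflow in Python: the code below calls 
pdftotext and then turns the text into comma-delimited output. It rests 
on several assumptions that are not from your description: that the 
pdftotext binary is on your PATH, that the PDFs actually contain a text 
layer (pages scanned as pure images would need OCR first), and that each 
form field comes out as a "Label: value" line. The function names and 
field names are made up for illustration, so adjust parse_form to 
whatever your forms really look like.

```python
import csv
import io
import subprocess


def pdf_to_text(pdf_path):
    """Run xpdf's pdftotext on one file and return the extracted text.

    "-layout" asks pdftotext to keep the physical page layout;
    "-" as the output file sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def parse_form(text):
    """Hypothetical parser: assumes every field is a 'Label: value' line."""
    record = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        # Skip blank lines and lines without a colon.
        if sep and label.strip():
            record[label.strip()] = value.strip()
    return record


def records_to_csv(records, fieldnames):
    """Return the records as comma-delimited text, one row per record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop labels not listed in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing fields are written as empty
    return buf.getvalue()
```

For a directory of forms you would call pdf_to_text on each file, pass 
the result through parse_form, and hand the list of records to 
records_to_csv. The csv module takes care of quoting values that 
themselves contain commas.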


