Read and extract text from pdf

Rene Pijlman reply.in.the.newsgroup at my.address.is.invalid
Fri Apr 21 08:50:32 EDT 2006


Julien ARNOUX:
>I have a problem :), I just want to extract text from a PDF file with
>Python. There are different libraries for that, but they don't work...
>
>pyPdf and pdfTools: I don't know why, but they don't work with some
>PDFs...

Text can be represented in PDF in different ways: as tagged text, as bitmap
and vector images, and even as algorithms (IIRC). Most tools can only
retrieve text that is represented as tagged text. So a given tool may
extract the text from some files and fail on others.
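As an illustration, here is a minimal, standard-library-only Python sketch of the most basic form of text extraction. It is not how pyPdf or pdfTools work internally, and the function name extract_shown_text is made up for this example. It scans one page's content stream for the PDF text-showing operators Tj, TJ, ' and " and collects their string operands. Text that was painted as a bitmap (an image drawn with Do) or as vector paths leaves no such operators behind, so there is nothing to collect. Assumptions: the stream is already uncompressed (in real files it is usually Flate-compressed), bytes are decoded as Latin-1 rather than through the font's encoding or ToUnicode map, and octal escapes and inline images are not handled.

```python
import re

# Operators that paint strings: Tj, TJ (array form), and ' / " (next line, then show).
_SHOW_OPS = {b"Tj", b"TJ", b"'", b'"'}
_OPERATOR = re.compile(rb"[A-Za-z'\"*]+")
_NAME = re.compile(rb"/[^\s()<>\[\]{}/%]*")
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
            b"(": b"(", b")": b")", b"\\": b"\\"}


def _read_literal(data: bytes, i: int) -> tuple[bytes, int]:
    """Read a (...) literal string starting at data[i]; return (raw bytes, next index)."""
    depth, out = 1, bytearray()
    i += 1  # skip the opening parenthesis
    while i < len(data):
        c = data[i:i + 1]
        if c == b"\\":
            # Simplified escape handling: octal escapes such as \101 are not decoded.
            nxt = data[i + 1:i + 2]
            out += _ESCAPES.get(nxt, nxt)
            i += 2
            continue
        if c == b"(":
            depth += 1  # balanced nested parentheses are part of the string
        elif c == b")":
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
        out += c
        i += 1
    return bytes(out), i  # unterminated string: return what was read


def extract_shown_text(content: bytes) -> list[str]:
    """Collect strings painted by text-showing operators in an UNCOMPRESSED content stream."""
    shown, operands = [], []
    i = 0
    while i < len(content):
        c = content[i:i + 1]
        if c == b"(":
            raw, i = _read_literal(content, i)
            operands.append(raw)
        elif content.startswith(b"<<", i):
            i += 2  # start of an inline dictionary, not a hex string
        elif c == b"<":  # hex string such as <48656C6C6F>
            end = content.index(b">", i)
            digits = re.sub(rb"\s", b"", content[i + 1:end])
            if len(digits) % 2:
                digits += b"0"  # an odd final digit is treated as if followed by 0
            operands.append(bytes.fromhex(digits.decode("ascii")))
            i = end + 1
        elif c == b"/":
            i = _NAME.match(content, i).end()  # skip names like /F1 so they are not read as operators
        elif c == b"%":
            j = content.find(b"\n", i)  # skip a comment up to the end of the line
            i = len(content) if j < 0 else j + 1
        elif m := _OPERATOR.match(content, i):
            if m.group() in _SHOW_OPS:
                # Assumption: bytes map to characters via Latin-1. Real fonts may
                # use other encodings or a ToUnicode CMap, which this ignores.
                shown.append(b"".join(operands).decode("latin-1"))
            operands.clear()  # every operator consumes the operands before it
            i = m.end()
        else:
            i += 1  # whitespace, numbers, array brackets, '>' and the like
    return shown


print(extract_shown_text(b"BT /F1 12 Tf 72 700 Td (Hello, PDF) Tj [(Wor) -30 (ld)] TJ ET"))
# ['Hello, PDF', 'World']
print(extract_shown_text(b"q 200 0 0 100 72 600 cm /Im1 Do Q"))
# []
```

The first call finds the two strings shown on the page. The second stream only paints an image, so the result is empty even though that image might well contain visible text. A real extractor additionally has to decompress streams, walk the page tree and map font encodings to Unicode.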

-- 
René Pijlman

What do you want to learn?  http://www.leren.nl


