[BangPypers] Retrieving images from PDFs

Wed Dec 30 08:02:44 CET 2009

On Tue, Dec 29, 2009 at 5:51 PM, Srinivas Reddy Thatiparthy <
srinivas_thatiparthy at akebonosoft.com> wrote:

> Though I have never tried it, ReportLab has a PDF library (if I am not
> mistaken).
> Have you tried it?
>
>
> When the limestone of imperative programming is worn away, the granite
> of functional programming will be observed. ---Simon Peyton Jones.
>
>
> -----Original Message-----
> From: bangpypers-bounces+srinivas_thatiparthy=akebonosoft.com at python.org
> [mailto:bangpypers-bounces+srinivas_thatiparthy=akebonosoft.com at python.org]
> On Behalf Of Shashwat Anand
> Sent: Tuesday, December 29, 2009 5:49 PM
> To: Bangalore Python Users Group - India
> Subject: [BangPypers] Retrieving images from PDFs
>
> How can we retrieve images from PDFs? I need both the images and the text
> beneath each image to build a database. I was able to parse the text with
> PDFMiner but got stuck when it came to images.
>

Use pyPdf. I have worked a bit on PDF structure using pyPdf and have written a
library on top of it for PDF accessibility, so I can help you with this. If you
want help, contact me off list.
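
To give a feel for what is involved, here is a rough sketch that uses only the
standard library. It is not the pyPdf approach and not my library. It relies
on the fact that JPEG images in a PDF are stored as stream objects with
/Subtype /Image and /Filter /DCTDecode, whose stream body is the JPEG file
itself. The function name extract_jpeg_streams is made up for this example,
and it assumes an unencrypted file whose images use DCTDecode as their only
filter. Images stored with other filters (FlateDecode, for instance) need real
decoding, which is where a library such as pyPdf comes in.

```python
import re

# One indirect object: "N G obj ... endobj". DOTALL lets "." span binary data.
_OBJECT_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)endobj", re.DOTALL)
# Inside an object: the part before the "stream" keyword is the dictionary,
# the part up to "endstream" is the raw stream body.
_STREAM_RE = re.compile(rb"(.*?)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
_IS_IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")
_IS_JPEG_RE = re.compile(rb"/DCTDecode\b")


def extract_jpeg_streams(pdf_bytes):
    """Return the raw bodies of all JPEG (DCTDecode) image streams.

    Hypothetical helper for illustration only: it scans the file with
    regular expressions instead of parsing the PDF properly, so it will
    miss or mangle images in encrypted or unusually structured files.
    """
    images = []
    for obj in _OBJECT_RE.finditer(pdf_bytes):
        stream = _STREAM_RE.search(obj.group(1))
        if stream is None:
            continue  # object has no stream (e.g. a plain dictionary)
        dictionary, body = stream.group(1), stream.group(2)
        if _IS_IMAGE_RE.search(dictionary) and _IS_JPEG_RE.search(dictionary):
            images.append(body)
    return images
```

Each returned item can be written to disk as a .jpg file in binary mode. Tying
an image to the text beneath it still needs the page layout, which this sketch
does not look at.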

> _______________________________________________
> BangPypers mailing list
> BangPypers at python.org
> http://mail.python.org/mailman/listinfo/bangpypers
>

-- 
--Anand