Read PDF content

William Purcell williamhpurcell at gmail.com
Thu Aug 21 09:47:24 EDT 2008


Sorry, this last email was meant to go to the list.

On Thu, Aug 21, 2008 at 8:41 AM, William Purcell
<williamhpurcell at gmail.com> wrote:

> I have been trying to do the same thing. Here is something I came up with,
> although it does not rely on Python alone: it requires pdftotext to be
> installed. If you're on a Linux box, I think it comes in xpdf-utils, but I'm
> not completely sure. Anyway, install pdftotext and then you could use this
> function:
>
> ----------------------------------------------------------------------------
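> import os
>
> def readpdf(filepath):
>     # Run pdftotext and send the extracted text to stdout ('-').
>     # Note: filepath is inserted into the shell command unquoted,
>     # so paths containing spaces will not work as written.
>     cmd = 'pdftotext -layout %s -' % (filepath,)
>     lines = os.popen(cmd).readlines()
>     return lines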
>
> ----------------------------------------------------------------------------
> I would like to find something totally Python, but this has worked for me
> in a pinch.
> -Bill
>
>
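
A variation on the readpdf function above, offered only as a minimal sketch.
It assumes Python 3.5 or later and that pdftotext is on the PATH. The names
build_pdftotext_cmd and read_pdf_lines are made up for illustration. The
arguments are passed to subprocess as a list, so no shell is involved and a
path containing spaces should work. The -enc UTF-8 option asks pdftotext for
UTF-8 output so that the decode step matches.

```python
import shutil
import subprocess


def build_pdftotext_cmd(filepath, program="pdftotext"):
    """Build the argument list; '-' sends the extracted text to stdout."""
    return [program, "-layout", "-enc", "UTF-8", filepath, "-"]


def read_pdf_lines(filepath, program="pdftotext"):
    """Return the text of a PDF as a list of lines (line endings kept)."""
    # Fail early with a clear message if the external tool is missing.
    if shutil.which(program) is None:
        raise RuntimeError("%s was not found on the PATH" % program)
    cmd = build_pdftotext_cmd(filepath, program)
    # check=True raises CalledProcessError if pdftotext exits with an error.
    result = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True
    )
    # Decode as UTF-8 (requested via -enc above); bad bytes are replaced.
    text = result.stdout.decode("utf-8", errors="replace")
    return text.splitlines(keepends=True)
```

Calling read_pdf_lines("some file.pdf"), where the file name is only a
placeholder, should give back the same kind of list that readpdf returns.
This still depends on pdftotext, so it is not a pure Python solution either.
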
> On Thu, Aug 21, 2008 at 5:00 AM, AON LAZIO <aonlazio at gmail.com> wrote:
>
>> Hi, guys.
>>       I am trying to extract the content of a PDF file (to get specific
>> information) using Python. I already tried pyPdf with no success.
>>       Does anyone have suggestions?
>>       Thanks in advance.
>>
>> Aonlazio
>>
>
>