Help with a Python coding question

Justin Peel peelpy at gmail.com
Wed Jan 5 20:14:28 EST 2011


On Wed, Jan 5, 2011 at 4:45 PM, Emile van Sebille <emile at fenx.com> wrote:

> On 1/5/2011 3:12 PM kanthony at woh.rr.com said...
>
>> I want to use Python to find all "\n" terminated
>> strings in a PDF file, ideally returning string
>> starting addresses.   Anyone willing to help?
>>
>
> pdflines = open(r'c:\shared\python_book_01.pdf').readlines()
> sps = [0]
> for ii in pdflines: sps.append(sps[-1]+len(ii))
>
> Emile
>
>
Bear in mind that PDF files often contain compressed objects. If that is
the case, I would recommend opening the PDF in binary mode and working out
how to inflate (decompress) the relevant objects before doing any searching.
PyPDF is a package that might help with this, though it could use some
updating.
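
A rough sketch of that idea, using only the standard library, might look
like the following. The function names and the regular expression are
illustrative assumptions, not part of PyPDF or any other package. The
sketch assumes streams are delimited by plain "stream" / "endstream"
keywords and only attempts zlib (FlateDecode) data; it ignores the /Length
entry and all other filters, so a real parser would be needed for anything
robust.

```python
import re
import zlib

# Hypothetical helper: start offset of every line that ends in b"\n".
def newline_terminated_offsets(data):
    offsets = []
    start = 0
    while True:
        end = data.find(b"\n", start)
        if end == -1:
            break  # trailing bytes with no newline are not counted
        offsets.append(start)
        start = end + 1
    return offsets

# Assumption: stream bodies sit between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# Hypothetical helper: yield (file offset, inflated bytes) for each
# stream body that zlib can decompress; anything else is skipped.
def inflate_streams(data):
    for match in STREAM_RE.finditer(data):
        try:
            # decompressobj tolerates the EOL bytes left before "endstream"
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (uncompressed or another filter)
        yield match.start(1), inflated
```

Usage would be along these lines: read the file with
open(path, 'rb').read(), call newline_terminated_offsets() on the raw bytes
for the uncompressed parts, and call it again on each chunk returned by
inflate_streams(). Offsets inside an inflated chunk are relative to that
chunk, not to the file.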