[Tutor] PDF to text conversion

Emad Nawfal (عماد نوفل) emadnawfal at gmail.com
Tue Apr 21 19:21:21 CEST 2009


On Tue, Apr 21, 2009 at 12:54 PM, bob gailer <bgailer at gmail.com> wrote:

> Robert Berman wrote:
>
>> Hi,
>>
>> I must convert a history file in PDF format that runs from May 1988 to
>> the present. Readings are taken twice weekly; each consists of the date
>> taken (mm/dd/yy) and the result, a 10-character sequence of digits and
>> special characters. This is an easy fit for a very small database
>> application, with the date as the key and the result string as the
>> data.
>>
>> My problem is converting the PDF file into a text file which I can then
>> read and process. I do not see any free Python libraries with this
>> capability. I did see a PDFPILOT program for Windows, but this application
>> is being developed on Linux and should also run on Windows, so I do not
>> want to incorporate a Windows-only application.
>>
>> I do not think I am breaking any new ground with this application. Have
>> any of you worked with such a library, or do you know of one or two I can
>> download and work with? Hopefully, they have reasonable documentation.
>>
>
> If this is a one-time conversion, just use the "Save as Text" feature of
> Adobe Reader.
>
>
>> My development environment is:
>>
>> Python
>> Linux
>> Ubuntu version 8.10
>>
>>
>> Thanks for any help you might be able to offer.
>>
>>
>> Robert Berman
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>
>
> --
> Bob Gailer
> Chapel Hill NC
> 919-636-4239
>


I tried pyPdf once, just for fun, and it was nice:
http://pybrary.net/pyPdf/
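
A minimal sketch of how that might look for the file described above, not a
tested solution. It assumes the pyPdf 1.x API (PdfFileReader, getNumPages,
getPage, extractText) and assumes each reading comes out as one line of the
form "mm/dd/yy" followed by whitespace and the 10-character result. The
function names and the line layout are my own guesses for illustration; the
real file may need a different pattern.

```python
import re
from datetime import datetime

# Assumed line layout: "mm/dd/yy", whitespace, then a 10-character result.
READING = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(\S{10})$")


def extract_text(pdf_path):
    """Return the text of every page, joined by newlines (needs pyPdf)."""
    # Imported here so the parsing code below works without pyPdf installed.
    from pyPdf import PdfFileReader

    with open(pdf_path, "rb") as handle:
        reader = PdfFileReader(handle)
        pages = [reader.getPage(i).extractText()
                 for i in range(reader.getNumPages())]
    return "\n".join(pages)


def parse_readings(text):
    """Map each reading date to its result string, skipping other lines."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line.strip())
        if match:
            day = datetime.strptime(match.group(1), "%m/%d/%y").date()
            readings[day] = match.group(2)
    return readings
```

Usage would be something like parse_readings(extract_text("history.pdf")),
where "history.pdf" is a placeholder name. Text extraction from PDFs often
loses spacing and line breaks, so it is worth printing the raw text first and
adjusting the pattern to what actually comes out.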
-- 
"No victim has ever been more repressed and alienated than the truth"
(Muhammad al-Ghazali)

Emad Soliman Nawfal
Indiana University, Bloomington
--------------------------------------------------------