[Tutor] Need help to convert pdf to excel

Danny Yoo dyoo at hashcollision.org
Sun Oct 19 23:59:49 CEST 2014


On Sun, Oct 19, 2014 at 7:27 AM, AMNA MOHAMMED ALRUHEILI
<alruheili at berkeley.edu> wrote:
> My name is Amna and I am totally new to the Python world, with zero experience in
> programming. I am facing the challenge of converting data from PDF to Excel.
> The data within the PDF is numbers separated by spaces, not within a table.
> I need help figuring out code that will convert these PDFs to
> Excel.

Can you get the original data in a format that is not PDF?  PDF is a
human-centric published format: its concerns are typographic.  It is
not intended primarily for computers to extract data.  Certainly you
can extract data from it, but it won't be structured in a way that
makes it particularly easy.

That being said: you might try an approach that gets text from a PDF:

    http://stackoverflow.com/questions/25665/python-module-for-converting-pdf-to-text

and then write a program that processes that text and inserts it into
an Excel spreadsheet.  There are several libraries that other
programmers have written for writing Excel documents from Python.
For example:

    https://xlsxwriter.readthedocs.org/
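
As a rough sketch of that processing step, assuming the text has
already been extracted from the PDF and that each line of interest
holds nothing but numbers separated by spaces.  The function names and
the output file name are made up for illustration, and XlsxWriter is a
third-party package that has to be installed separately:

```python
def parse_rows(text):
    """Turn lines of space-separated numbers into lists of floats."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            # Assumption: lines with non-numeric text (page headers,
            # captions, and so on) are not data, so they are skipped.
            continue
    return rows


def write_rows_to_xlsx(rows, path):
    """Write each list of numbers as one spreadsheet row."""
    # Imported here so that parse_rows works without XlsxWriter installed.
    import xlsxwriter

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row_index, row in enumerate(rows):
        # write_row fills cells left to right, starting at column 0.
        worksheet.write_row(row_index, 0, row)
    workbook.close()
```

Calling write_rows_to_xlsx(parse_rows(extracted_text), "numbers.xlsx")
would then produce the spreadsheet, where extracted_text is whatever
string the PDF-to-text step gave you.  Real PDF output is usually
messier than this, so expect to adjust the parsing rules to your data.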



As a side comment: this task actually does require a significant bit
of programming knowledge.  I believe it would be an unfair and
unreasonable request to ask a complete programming beginner to do this
without some basic programming training.  I would strongly recommend
taking an introductory programming class first, before trying to
tackle your PDF->Excel problem head on.

