[Tutor] PDF Scraping

Wed Nov 25 08:41:31 EST 2015

Oh, I forgot to mention that I am using Python 3.4. Thanks again for your
help pointing me in the right direction.

~Chris

On Tue, Nov 24, 2015 at 1:36 PM, Python Beginner <
pythonbeginner004 at gmail.com> wrote:

> Hi,
>
> I am looking for the best way to scrape the following PDFs:
>
> (1)
> http://minerals.usgs.gov/minerals/pubs/commodity/gold/mcs-2015-gold.pdf
> (table on page 1)
>
> (2)
> http://minerals.usgs.gov/minerals/pubs/commodity/gold/myb1-2013-gold.pdf
> (table 1)
>
> I have done a lot of research and have read that pdftables 0.0.4 is an
> excellent way to scrape tabular data from PDFs (see
> https://blog.scraperwiki.com/2013/07/pdftables-a-python-library-for-getting-tables-out-of-pdf-files/
> ).
>
> I downloaded pdftables 0.0.4 (see https://pypi.python.org/pypi/pdftables).
>
> I am new to Python and am having trouble finding good documentation on how
> to use this library.
>
> Has anybody used pdftables before that could help me get started or point
> me to the ideal library for scraping the PDF links above? I have read that
> different PDF libraries are used depending on the format of the PDF. What
> library would be best for the PDF formats above? Knowing this will help me
> get started, then I can write up some code and ask further questions if
> needed.
>
> Thanks in advance for your help!
>
> ~Chris
>
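For the table-extraction question above, one library-independent approach is a two-step heuristic: first convert the PDF page to layout-preserving plain text (for example with poppler's `pdftotext -layout mcs-2015-gold.pdf out.txt` command-line tool), then split each text line into cells wherever two or more spaces appear. The sketch below covers only the second step and uses only the standard library, so it should run on Python 3.4. The function name `parse_layout_table` and the sample lines are hypothetical illustrations, not pdftables' API and not real USGS figures; whether a two-space gap cleanly separates the columns in these particular PDFs is an assumption you would need to check.

```python
import re

# Two or more consecutive spaces are treated as a column boundary; single
# spaces are kept so multi-word labels stay in one cell.
COLUMN_GAP = re.compile(r" {2,}")


def parse_layout_table(text):
    """Split layout-preserved text (e.g. output of `pdftotext -layout`)
    into a list of rows, each row a list of cell strings.

    Blank lines are skipped. This is a heuristic: it assumes columns are
    separated by at least two spaces and that no cell itself contains a
    run of two or more spaces.
    """
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore empty lines between table rows
        rows.append(COLUMN_GAP.split(stripped))
    return rows


if __name__ == "__main__":
    # Hypothetical sample text for illustration only; not real USGS data.
    sample = """
    Item              2013     2014
    Production A       100      110
    Production B        20       25
    """
    for row in parse_layout_table(sample):
        print(row)
```

Tables with merged header cells, footnote markers, or wrapped labels will usually need extra cleanup after this split, which is where a dedicated table library may pay off.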