[PSF-Community] Python library to extract data tables from PDF files

Vinayak Mehta vmehta94 at gmail.com
Fri Sep 28 14:31:41 EDT 2018


I've created a Jupyter notebook which shows an example of how Camelot makes
it easy to extract tables out of PDFs.


In the example, I scrape a PDF from an Indian disease outbreak data
source[1] using requests, extract the tables from each page of the PDF
using Camelot, and then concatenate those tables.

Here's the gist: https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873
:)

[1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689
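
For readers who want the shape of that workflow without opening the notebook, here is a rough sketch. It is not the code from the gist: the helper names (download_pdf, extract_tables, combine_tables), the example URL and the file name are made up for illustration. It assumes requests, pandas and Camelot are installed, and relies only on camelot.read_pdf returning tables that each expose a .df DataFrame.

```python
import pandas as pd
import requests


def download_pdf(url, dest_path):
    """Save the PDF found at `url` to `dest_path` and return that path."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path


def extract_tables(pdf_path, pages="1-end"):
    """Return the tables Camelot finds on the requested pages."""
    # Imported lazily because Camelot pulls in heavier dependencies
    # than the other two helpers need.
    import camelot

    return camelot.read_pdf(pdf_path, pages=pages)


def combine_tables(tables):
    """Stack the DataFrame of every extracted table into one DataFrame."""
    # pd.concat raises ValueError if `tables` is empty.
    return pd.concat([table.df for table in tables], ignore_index=True)


# Example usage (hypothetical URL and file name, substitute your own):
#   path = download_pdf("https://example.com/report.pdf", "report.pdf")
#   combined = combine_tables(extract_tables(path))
#   combined.to_csv("report.csv", index=False)
```

Concatenating like this only gives a sensible result when every page's table has the same column layout, so it is worth inspecting a few of the per-page DataFrames first.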


On Fri, Sep 28, 2018 at 12:01 PM Vinayak Mehta <vmehta94 at gmail.com> wrote:

> Hello everyone!
>
> I recently released a Python library which lets users extract data tables
> out of PDF files, my first open source library! Here's the link:
> https://github.com/socialcopsdev/camelot
>
> I've created a wiki page
> <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools>
> comparing it to other open source PDF table extraction tools. I'm currently
> working on porting it to Python 3!
>
> I would be really grateful if you could check it out, see if it's useful
> to you, and give me any feedback that may help me improve it, by replying
> here, opening an issue, or sending a pull request!
>
> Looking forward to hearing from you all!
>
> Thanks for your time!
>
> Vinayak
>