[PSF-Community] Python library to extract data tables from PDF files

David Mertz mertz at gnosis.cx
Fri Sep 28 15:05:05 EDT 2018


Have you compared your tool with existing ones, such as
https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-dataframe-6c7acfa5f302
?

What notable difference in API and/or accuracy do you have?

On Fri, Sep 28, 2018 at 2:32 PM Vinayak Mehta <vmehta94 at gmail.com> wrote:

> I've created a Jupyter notebook which shows an example of how Camelot makes
> it easy to extract tables out of PDFs.
>
>
> In the example, I scrape a PDF from an Indian disease outbreaks data source[1] using requests, extract tables from
> each page of the PDF using Camelot and then concat those tables. Here's the gist! https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873 :)
>
> [1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689
>
>
> On Fri, Sep 28, 2018 at 12:01 PM Vinayak Mehta <vmehta94 at gmail.com> wrote:
>
>> Hello everyone!
>>
>> I recently released a Python library which lets users extract data tables
>> out of PDF files, my first open source library! Here's the link:
>> https://github.com/socialcopsdev/camelot
>>
>> I've created a wiki page
>> <https://github.com/socialcopsdev/camelot/wiki/Comparison-with-other-PDF-Table-Extraction-libraries-and-tools>
>> comparing it to other open source PDF table extraction tools. I'm currently
>> working on porting it to Python 3!
>>
>> I would be really grateful if you could check it out and see if it's
>> useful to you and give me any feedback that may help me improve it, by
>> replying here, opening an issue or a pull request!
>>
>> Looking forward to hearing from you all!
>>
>> Thanks for your time!
>>
>> Vinayak
>>
> _______________________________________________
> PSF-Community mailing list
> PSF-Community at python.org
> https://mail.python.org/mailman/listinfo/psf-community
>


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.