From betsy at python.org Tue Sep 4 13:42:38 2018
From: betsy at python.org (Betsy Waliszewski)
Date: Tue, 4 Sep 2018 10:42:38 -0700
Subject: [PSF-Community] New PSF Stickers - We need your help!
In-Reply-To:
References:
Message-ID:

Hi all,

I'd like to move forward with this project, so if anyone else has any ideas, please send in your designs by Monday, September 10. Thanks to everyone who has participated!!

I've consolidated the designs we've received so far and attached them.

Cheers,

Betsy

On Sun, Aug 26, 2018 at 2:43 PM, David Mertz wrote:

> No problem here from a trademark perspective. I have no opinion on the
> specific stickers created outside that concern.
>
> On Sun, Aug 26, 2018, 3:12 PM Abdur-Rahmaan Janhangeer <
> arj.python at gmail.com> wrote:
>
>> attached is an updated pep8 logo
>>
>> yours,
>>
>> On Fri, Aug 17, 2018 at 9:56 PM Betsy Waliszewski
>> wrote:
>>
>>> Hi Pythonistas!
>>>
>>> The PSF would love your input on sticker designs. We have a new
>>> hex-shape sticker [attach pic] and would love to get 5 more designs so that
>>> people can collect all 6.
>>>
>>> Send us your ideas and we'll let the community decide which ones they
>>> like best!
>>>
>>> Cheers,
>>>
>>> Betsy
>>> _______________________________________________
>>> PSF-Community mailing list
>>> PSF-Community at python.org
>>> https://mail.python.org/mailman/listinfo/psf-community
>>>
>>
>> --
>> Abdur-Rahmaan Janhangeer
>> https://github.com/abdur-rahmaanj
>> Mauritius
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PSF Sticker Designs.zip
Type: application/zip
Size: 347221 bytes
Desc: not available
URL:

From rhysyorke at me.com Tue Sep 4 16:43:26 2018
From: rhysyorke at me.com (Rhys Yorke)
Date: Tue, 04 Sep 2018 16:43:26 -0400
Subject: [PSF-Community] New PSF Stickers - We need your help!
In-Reply-To:
References:
Message-ID: <298860F8-DE94-4FF6-9BB1-28C9B593EE45@me.com>

As an artist/designer I definitely have some ideas to contribute!

Thanks for the heads up on the deadline.

- Rhys

Sent from my iPhone

From mal at python.org Thu Sep 6 10:54:20 2018
From: mal at python.org (M.-A. Lemburg)
Date: Thu, 6 Sep 2018 16:54:20 +0200
Subject: [PSF-Community] EuroPython 2018: Videos for Friday available
Message-ID:

We are pleased to announce the third and last batch of cut videos from EuroPython 2018 in Edinburgh, Scotland, UK.

* EuroPython 2018 YouTube Playlist *
https://www.youtube.com/watch?v=UXSr1OL5JKo&t=0s&index=130&list=PL8uoeex94UhFrNUV2m5MigREebUms39U5

In the last batch, we have included all videos for Friday, July 27 2018, the third conference day.

In total, we now have more than 130 videos available for you to watch.

All EuroPython videos, including the ones from previous conferences, are available on our EuroPython YouTube Channel.

http://europython.tv/

Help spread the word
--------------------

Please help us spread this message by sharing it on your social networks as widely as possible. Thank you!

Link to the blog post:

https://blog.europython.eu/post/177798571707/europython-2018-videos-for-friday-available

Tweet:

https://twitter.com/europython/status/1037666146147811328

Enjoy,
--
EuroPython 2018 Team
https://ep2018.europython.eu/
https://www.europython-society.org/

From betsy at python.org Wed Sep 12 10:54:35 2018
From: betsy at python.org (Betsy Waliszewski)
Date: Wed, 12 Sep 2018 07:54:35 -0700
Subject: [PSF-Community] ACTION NEEDED: New PSF Stickers - last chance to submit your design!
Message-ID:

Hi all,

This is your last chance to submit a design before we put them to a vote! We received lots of excellent ideas and suggestions, but only 4 actual designs (and 2 of them were pretty much the same). They don't have to be perfect - hand-drawn is fine.

Voting will go live on Monday, September 17, so there are still a few days to get your design in.

Cheers,

Betsy

From steve at holdenweb.com Wed Sep 12 15:28:53 2018
From: steve at holdenweb.com (Steve Holden)
Date: Wed, 12 Sep 2018 20:28:53 +0100
Subject: [PSF-Community] ACTION NEEDED: New PSF Stickers - last chance to submit your design!
In-Reply-To:
References:
Message-ID:

> Voting will go live on Monday, September 17, so there are still a few days to get your design in.

Woah. Voting? Who? Why? I thought we were just noodling to give a designer some ideas, I didn't realise it was a competition.

Personally I have no graphics training, but I work with a professional design team, and I know how much better their ideas can be than mine.

Whatever the result of this vote, let's be clear in our minds that our end-product should merely be a starting point for the designer.

From betsy at python.org Wed Sep 12 16:00:46 2018
From: betsy at python.org (Betsy Waliszewski)
Date: Wed, 12 Sep 2018 13:00:46 -0700
Subject: [PSF-Community] ACTION NEEDED: New PSF Stickers - last chance to submit your design!
In-Reply-To:
References:
Message-ID:

Yes, the "winning" design would be a starting point for a designer. Sorry I wasn't clear on that!

Betsy

From mal at python.org Mon Sep 24 08:56:12 2018
From: mal at python.org (M.-A. Lemburg)
Date: Mon, 24 Sep 2018 14:56:12 +0200
Subject: [PSF-Community] EuroPython 2019: Seeking venues
Message-ID:

Dear EuroPython'istas,

We are preparing our venue RFP for the EuroPython 2019 edition and are asking for your help in finding the right locations for us to choose from.

If you know of a larger venue - hotel or conference center - that can accommodate at least 1400 attendees, please send the venue details to board at europython.eu.

We will then make sure to include them in our RFP once we send it out.

The more venues we gather to reach out to, the better a selection process we can guarantee, which in turn will ultimately result in a better conference experience for everybody involved.

When sending us venue suggestions, please make sure to provide us with the following: name and URL of the venue, country and city, as well as the contact details of the sales person in charge of inquiries (full name, email and phone).

We were planning to start the RFP process in the coming days, so please make sure you send us your recommendations as soon as possible.

Help spread the word
--------------------

Please help us spread this message by sharing it on your social networks as widely as possible. Thank you!

Link to the blog post:

https://blog.europython.eu/post/178407491437/europython-2019-seeking-venues

Tweet:

https://twitter.com/europython/status/1044130171354316800

Thank you,
--
EuroPython Society
https://www.europython-society.org/

From dinaldo at gmail.com Mon Sep 24 13:51:35 2018
From: dinaldo at gmail.com (Don Sheu)
Date: Mon, 24 Sep 2018 10:51:35 -0700
Subject: [PSF-Community] Crisis Incidents
Message-ID:

Recently, we as organizers dealt with somebody on Slack in distress. This created a lot of tough decisions on responses. Alan Vezina, my fellow founder of PuPPy, took the lead. Following some coaching from social workers, he made a call to 911.

Wondering if other organizers have anything to share.

We didn't build the Ark in this case. But now, after the rain, we're trying to backfill with some process so we're better prepared.

--
Don Sheu
312.880.9389
- - - - - - - - - - - - - - - - - -

My Python user group convenes every month 2nd Wednesdays
http://www.meetup.com/PSPPython/events/232708762/

*CONFIDENTIALITY NOTICE*: *The information contained in this message may be protected trade secrets or protected by applicable intellectual property laws of the United States and International agreements. If you believe that it has been sent to you in error, do not read it. Please immediately reply to the sender that you have received the message in error. Then delete it. Thank you.*

From graffatcolmingov at gmail.com Mon Sep 24 16:00:51 2018
From: graffatcolmingov at gmail.com (Ian Stapleton Cordasco)
Date: Mon, 24 Sep 2018 15:00:51 -0500
Subject: [PSF-Community] Crisis Incidents
In-Reply-To:
References:
Message-ID:

I would always try to refer them to trained crisis counselors, e.g., CrisisTextline. If they won't reach out, you can text in for coaching on how best to help them.

Sent from my phone with my typo-happy thumbs. Please excuse my brevity

From mal at python.org Fri Sep 28 10:09:37 2018
From: mal at python.org (M.-A. Lemburg)
Date: Fri, 28 Sep 2018 16:09:37 +0200
Subject: [PSF-Community] EuroPython 2019: RFP for Venues
Message-ID:

Dear EuroPython'istas,

We are happy to announce that we have started the RFP for venues to host the EuroPython 2019 conference. We have sent out the details to almost 40 venues.

For more details about the RFP, please see our blog post:

https://www.europython-society.org/post/178541594370/europython-2019-rfp-for-venues

Many thanks to everyone who submitted contact details and venue suggestions after our call. We have tried to include all of them in the list of direct recipients. Feel free to forward the blog post to additional suitable venues.

Many thanks,
--
EuroPython Society Board
https://www.europython-society.org/

From vmehta94 at gmail.com Fri Sep 28 02:31:02 2018
From: vmehta94 at gmail.com (Vinayak Mehta)
Date: Fri, 28 Sep 2018 12:01:02 +0530
Subject: [PSF-Community] Python library to extract data tables from PDF files
Message-ID:

Hello everyone!

I recently released a Python library which lets users extract data tables out of PDF files, my first open source library! Here's the link: https://github.com/socialcopsdev/camelot

I've created a wiki page comparing it to other open source PDF table extraction tools. I'm currently working on porting it to Python 3!

I would be really grateful if you could check it out, see if it's useful to you, and give me any feedback that may help me improve it, by replying here, opening an issue or a pull request!

Looking forward to hearing from you all!

Thanks for your time!

Vinayak

From vmehta94 at gmail.com Fri Sep 28 14:31:41 2018
From: vmehta94 at gmail.com (Vinayak Mehta)
Date: Sat, 29 Sep 2018 00:01:41 +0530
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

I've created a Jupyter notebook which shows an example of how Camelot makes it easy to extract tables out of PDFs.

In the example, I scrape a PDF from an Indian disease outbreaks data source[1] using requests, extract tables from each page of the PDF using Camelot and then concatenate those tables. Here's the gist! https://gist.github.com/vinayak-mehta/e5949f7c2410a0e12f25d3682dc9e873 :)

[1] http://idsp.nic.in/index4.php?lang=1&level=0&linkid=406&lid=3689

From mertz at gnosis.cx Fri Sep 28 15:05:05 2018
From: mertz at gnosis.cx (David Mertz)
Date: Fri, 28 Sep 2018 15:05:05 -0400
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

Have you compared your tool with existing ones, such as https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-dataframe-6c7acfa5f302 ?

What notable difference in API and/or accuracy do you have?

--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.

From vmehta94 at gmail.com Fri Sep 28 15:36:54 2018
From: vmehta94 at gmail.com (Vinayak Mehta)
Date: Sat, 29 Sep 2018 01:06:54 +0530
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

Hello David!

Yes, I've created a wiki page comparing Camelot with other open source tools and libraries. tabula-py is a wrapper over tabula-java, which is used by Tabula. The wiki includes a comparison of Camelot with Tabula. As you can see in the comparison, it outperforms Tabula in almost all cases!

While Tabula either gives good output or fails miserably, Camelot gives you complete control over the extraction process with various configuration parameters! You can check out this section of the README for more information. Camelot also lets you plot various geometries like detected lines, intersections and tables in the PDF to debug and improve table extraction! You can check out this part of the documentation for more information on that.

Try it out!

Vinayak

From vmehta94 at gmail.com Fri Sep 28 15:37:56 2018
From: vmehta94 at gmail.com (Vinayak Mehta)
Date: Sat, 29 Sep 2018 01:07:56 +0530
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

The library's API is pretty simple and intuitive too! You can check it out in the README :)

From vasudevram at gmail.com Fri Sep 28 15:56:24 2018
From: vasudevram at gmail.com (Vasudev Ram)
Date: Sat, 29 Sep 2018 01:26:24 +0530
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

Very interesting, and congrats, Vinayak.

As a person interested in both PDF generation [1] and PDF text extraction [2], I'm interested to know what issues you faced w.r.t. accuracy of text extraction and also formatting.

[1] I'm the creator of xtopdf, a Python toolkit for PDF generation from other file formats;

http://slides.com/vasudevram/xtopdf
http://bitbucket.org/vasudevram/xtopdf

[2] I worked on a project to extract text from PDF files. It was done using a C library (xpdf), though, not a Python one. However, the text extraction accuracy issues (some of which are technical issues inherent in the PDF format, according to the vendor of xpdf, Glyph and Cog) are language-independent. There were things like characters getting transposed, missing characters, junk characters sometimes, etc. (I also wrote a heuristics program to detect some such issues, but that too could only reject the bad extracts, not make them correct.) So the extraction was not 100% accurate, at least in my project.

Also, like I said, that vendor said the issues are inherent in PDF, partly related to it being a canvas-based model, not a text-based one.

I'll try to check out your project some time later.

Cheers,
Vasudev
--
vi quickstart: https://gumroad.com/l/vi_quick
Web site: https://vasudevram.github.io
Blog: https://jugad2.blogspot.com
Products: https://gumroad.com/vasudevram

From alvarojusten at gmail.com Fri Sep 28 16:10:32 2018
From: alvarojusten at gmail.com (Álvaro Justen [Turicas])
Date: Fri, 28 Sep 2018 17:10:32 -0300
Subject: [PSF-Community] Python library to extract data tables from PDF files
In-Reply-To:
References:
Message-ID:

Hi, Vinayak!

Good work, thanks for sharing. :)

I'm the creator of the rows library [http://turicas.info/rows] and implemented PDF support early this year (with 3 different strategies) -- it's not released on PyPI yet since I'm fixing some bugs before releasing the next version, but you can try it out by installing:

    pip install git+https://github.com/turicas/rows.git@feature/plugin-pdf#egg=rows pdfminer.six cached-property

It's 100% written in Python and also has a command-line interface (so you can run `rows convert http://example.com/file.pdf newfile.(csv|xls|xlsx|html|sqlite)` or even `rows query "SELECT * FROM table1 WHERE some_condition" http://example.com/file.pdf --output=result.xls`).

The idea behind the extraction algorithms is to be flexible, so you can plug in your own if you want (depending on how the PDF is created, the objects will be very different and you cannot use the same ordering/grouping strategy).

I'm now implementing support to extract tables from images (and also from PDFs with images), but it's probably not going into the next version since I need a better OCR tool.

What do you think about joining efforts so we can have better libraries? I'm going to test the PDFs you've cited with my code so we can compare better.

Feel free to contact me directly or join the chat at https://gitter.im/turicas/rows

Cheers,
Álvaro Justen "Turicas"
turicas.info / @turicas (twitter, github, youtube)
+55 41 999 311 221
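
Editorial sketch: Vasudev Ram's message in this thread mentions a heuristics program that could detect junk characters in extracted PDF text and reject bad extracts without correcting them. The snippet below is only a rough illustration of that kind of reject-only check; it is not his program. The function names, the definition of "junk" (control, unassigned, private-use and surrogate code points plus the U+FFFD replacement character) and the 5% threshold are all assumptions made for the example.

```python
import unicodedata

# Unicode general categories treated as junk in this sketch (an assumption):
# Cc = control, Cn = unassigned, Co = private use, Cs = surrogate.
JUNK_CATEGORIES = {"Cc", "Cn", "Co", "Cs"}


def junk_ratio(text):
    """Return the fraction of characters in text that look like junk."""
    if not text:
        return 0.0
    junk = 0
    for ch in text:
        if ch in "\n\r\t":
            # Ordinary whitespace controls are expected in extracted text.
            continue
        if ch == "\ufffd" or unicodedata.category(ch) in JUNK_CATEGORIES:
            junk += 1
    return junk / len(text)


def should_reject(text, max_junk=0.05):
    """Reject an extract that is empty or exceeds the junk threshold.

    The check can only flag a bad extract; it makes no attempt to repair it.
    """
    stripped = text.strip()
    if not stripped:
        return True
    return junk_ratio(stripped) > max_junk
```

A caller might run `should_reject(page_text)` on each extracted page and queue rejected pages for manual review. Transposed or missing characters, which the message also mentions, would need a different signal (for example a dictionary lookup) and are not covered here.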