[Chicago] scrape power point

Brian Ray brianhray at gmail.com
Thu Sep 23 17:57:11 CEST 2010





On Sep 23, 2010, at 10:44 AM, Carl Karsten <carl at personnelware.com> wrote:

>> Getting data isn't hard; it's the metadata that's difficult. I have lots of existing (mostly) HTML, Excel spreadsheets, Word docs, and Power Point
> 
> Do you have python code to scrape the text from Power Point files?
> 
> I would like to be able to scrape the text from Power Point, Keynote
> and whatever else a presenter might use for PyCon talks.  I am sure
> it's a previously solved problem, but it is currently low on my list of
> things to even google.
> 

I recall a Google Docs hack for this. I think you just give it a URL pointing at the location of your PPT, and it returns HTML. Then BeautifulSoup it :)
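
For the "HTML in, text out" half, here is a rough sketch. I don't remember the actual Google Docs URL, so this skips the fetch entirely and assumes you already have the returned HTML as a string (SAMPLE_HTML is made up for illustration). It uses the standard library's html.parser as a stand-in so it runs without installing anything. TextExtractor and html_to_text are names I invented here. With BeautifulSoup, the rough equivalent would be BeautifulSoup(html, "html.parser").get_text().

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a tag listed in SKIP
        self.chunks = []       # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html):
    """Return the list of text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical stand-in for whatever HTML a PPT-to-HTML converter returns.
SAMPLE_HTML = """
<html><head><style>p { color: red; }</style></head>
<body><h1>PyCon Talk</h1><p>First <b>bullet</b></p></body></html>
"""

if __name__ == "__main__":
    for line in html_to_text(SAMPLE_HTML):
        print(line)
```

Real converter output will probably be messier (nested tables, inline styles), so expect to filter the fragments further before they are useful as talk metadata.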

