[Tutor] python and Beautiful soup question

Timo timomlists at gmail.com
Mon Jun 22 12:11:30 CEST 2015


On 21-06-15 at 22:04, Joshua Valdez wrote:
> I'm having trouble making this script work to scrape information from a
> series of Wikipedia articles.
>
> What I'm trying to do is iterate over a series of wiki URLs and pull out
> the page links on a wiki portal category (e.g.
> https://en.wikipedia.org/wiki/Category:Electronic_design).
Instead of scraping the webpage, I'd have a look at the API. It will probably
give better and more reliable results than parsing the HTML yourself.

https://www.mediawiki.org/wiki/API:Main_page
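Here is a minimal sketch of the API approach in Python, using only the standard library. It assumes the `api.php` endpoint at https://en.wikipedia.org/w/api.php and the `list=categorymembers` query module. The function names (`title_from_url`, `build_params`, `extract_titles`, `category_members`) and the User-Agent string are hypothetical placeholders. The sketch turns each category URL into a page title, asks the API for that category's members, and follows the server's `continue` token until no more batches are left.

```python
import json
import urllib.parse
import urllib.request

# Assumed MediaWiki endpoint for English Wikipedia (other wikis use their own host).
API_URL = "https://en.wikipedia.org/w/api.php"


def title_from_url(url):
    """Turn a .../wiki/<Title> URL into a page title (hypothetical helper)."""
    path = urllib.parse.urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix):
        raise ValueError("not a /wiki/ URL: " + url)
    # Undo percent-encoding and use spaces instead of underscores.
    return urllib.parse.unquote(path[len(prefix):]).replace("_", " ")


def build_params(category, cont=None):
    """Query parameters for one batch of list=categorymembers."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",  # ask for the largest batch the server allows
        "format": "json",
    }
    if cont:
        # Merge the server's 'continue' object verbatim to get the next batch.
        params.update(cont)
    return params


def extract_titles(data):
    """Pull the member titles out of one decoded JSON response."""
    return [m["title"] for m in data.get("query", {}).get("categorymembers", [])]


def category_members(category, user_agent="CategoryExample/0.1 (you@example.org)"):
    """Yield every member title of a category, following continuation."""
    cont = None
    while True:
        query = urllib.parse.urlencode(build_params(category, cont))
        # Wikimedia asks API clients to send a descriptive User-Agent header.
        req = urllib.request.Request(API_URL + "?" + query,
                                     headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        yield from extract_titles(data)
        cont = data.get("continue")
        if not cont:
            break
```

To process a series of category URLs, you would loop over them and call `category_members(title_from_url(url))` for each one. Adding `"cmtype": "page"` to the parameters should restrict the results to articles and leave out subcategories and files. This is usually simpler than locating the right `<a>` tags in the rendered category page with Beautiful Soup.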

You can try out the many available options (each with a short
description) on the sandbox page:

https://en.wikipedia.org/wiki/Special:ApiSandbox

Timo

>
> *Joshua Valdez*
> *Computational Linguist : Cognitive Scientist*
>
> (440)-231-0479
> jdv12 at case.edu | jdv2 at uw.edu | joshv at armsandanchors.com
> <http://www.linkedin.com/in/valdezjoshua/>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor


