[Tutor] Request review: A DSL for scraping a web page

Sat Apr 4 11:42:39 CEST 2015

Joe Farro <joe.farro <at> gmail.com> writes:

> 
> Thanks, Peter.
> 
> Peter Otten <__peter__ <at> web.de> writes:
> 
> > Can you give a real-world example where your DSL is significantly cleaner 
> > than the corresponding code using bs4, or lxml.xpath, or lxml.objectify?

Peter, I worked up what I hope is a fairly representative example. It scrapes
metadata from the 10 newest web-scraping questions on Stack Overflow, once
with bs4 and once with take.

https://github.com/tiffon/take-examples/tree/master/samples/stackoverflow

I've posted on the bs4 discussion group asking for feedback on the bs4
version to make sure it's up to snuff. (The post is in new-member
purgatory, at the moment.)

In my opinion, take's inability to define sub-routines is a brutal
deficiency compared with the bs4 version, which can define functions like
`get_poster_details()` and `get_comment_activity()`.
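
To illustrate the kind of factoring I mean, here is a minimal sketch. It is
not the code from the repo: it uses the standard library's
xml.etree.ElementTree instead of bs4 so it runs without dependencies, and the
markup, class names, and function bodies are made up for illustration. Only
the two function names come from the bs4 version.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; the real Stack Overflow HTML differs.
SAMPLE = """
<div class="question">
  <div class="poster">
    <a href="/users/1">alice</a>
    <span class="rep">1234</span>
  </div>
  <div class="comments">
    <span class="count">3</span>
  </div>
</div>
"""


def get_poster_details(question):
    """Extract the poster's name, profile URL, and reputation (assumed fields)."""
    poster = question.find("./div[@class='poster']")
    link = poster.find("a")
    return {
        "name": link.text,
        "url": link.get("href"),
        "reputation": int(poster.find("span[@class='rep']").text),
    }


def get_comment_activity(question):
    """Extract the comment count (assumed field)."""
    comments = question.find("./div[@class='comments']")
    return {"count": int(comments.find("span[@class='count']").text)}


def scrape_question(question):
    """Compose the helpers; each one can be reused for every question."""
    return {
        "poster": get_poster_details(question),
        "comments": get_comment_activity(question),
    }


question = ET.fromstring(SAMPLE)
result = scrape_question(question)
```

Each helper owns one piece of the page and can be called once per question,
which is the kind of reuse a take template can't currently express.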

On the bright side, I do like that the indentation of the take templates
semi-reflects the structure of the HTML document. However, the
indentation doesn't (always) reflect the hierarchy of the data being
generated, which seems like it would be clearer.

Feedback is definitely welcome.

Thanks again!