Extracting data from HTML

Kragen Sitaker kragen at pobox.com
Sun Jun 2 23:41:04 EDT 2002


Geoff Gerrietts <geoff at gerrietts.net> writes:
> Both techniques are worth knowing -- but better than either would be
> finding a way to get the information you're after via XML-RPC or some
> other protocol that's designed to carry data rather than rendering
> instructions.

You seem to imply that XML-RPC is better suited than HTTP to carrying
data, as opposed to rendering instructions.  I disagree with this
implication, and I adduce the following evidence:
- the thousands of RSS feeds (see www.syndic8.com) using HTTP
- people downloading Python via HTTP
- the fact that XML-RPC runs over HTTP

XML-RPC is better suited to expressing RPC than HTTP is, but "getting
some data" is probably better done over HTTP GET, where you can take
advantage of things like caching and URL linkability.
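
To make the caching point concrete, here is a minimal sketch of a
conditional GET in Python, using only the standard library.  The
function name fetch and the dict used as a cache are made up for
illustration, and the sketch assumes the server sends an ETag header;
a real HTTP cache would also look at Last-Modified, Cache-Control, and
so on.

```python
import urllib.error
import urllib.request


def fetch(url, cache):
    """GET url, reusing the cached body if the server says 304 Not Modified.

    cache is a plain dict mapping url -> (etag, body).  It is a
    hypothetical stand-in for a real HTTP cache.
    Returns (body, served_from_cache).
    """
    request = urllib.request.Request(url)
    if url in cache:
        # Ask the server to send the body only if it has changed.
        etag, _body = cache[url]
        request.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        # urllib reports 304 as an HTTPError; treat it as a cache hit.
        if error.code == 304 and url in cache:
            return cache[url][1], True
        raise
    if etag:
        # Remember the validator so the next request can be conditional.
        cache[url] = (etag, body)
    return body, False
```

Calling fetch twice with the same dict and the same URL (any feed
URL will do, provided the server supports ETags) should transfer the
body only once; the second call costs a round trip and a few headers.
Because the resource is named by a plain URL, any intermediate proxy
can do the same thing without knowing anything about the payload.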

There's a *reason* we added MIME headers in HTTP 1.0 about ten years
ago, boy.
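
As a rough illustration of what those MIME headers buy you: the
Content-Type header tells the client what kind of payload it received,
so the same GET can carry HTML, an RSS feed, or a tarball.  The helper
below is a sketch with a made-up name (media_type); it leans on the
standard library's email package to parse the header value.

```python
from email.message import Message


def media_type(content_type_header):
    """Split a Content-Type header value into (type/subtype, charset).

    The charset is None when the header does not declare one.
    """
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_type(), message.get_content_charset()
```

For example, media_type("application/rss+xml; charset=UTF-8") gives
("application/rss+xml", "utf-8"), while media_type("text/html") gives
("text/html", None), so a client can tell data from rendering
instructions before it reads a byte of the body.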
