[perl-python] get web page programmatically

Xah Lee xah at xahlee.org
Fri Feb 4 14:01:10 EST 2005


# -*- coding: utf-8 -*-
# Python

# suppose you want to fetch a webpage.
from urllib import urlopen
print urlopen('http://xahlee.org/Periodic_dosage_dir/_p2/russell-lecture.html').read()
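
The snippet above is Python 2. As a rough sketch only, assuming Python 3 (where urlopen lives in urllib.request), the same fetch could look like the following. The helper name fetch_text and the utf-8 fallback are illustrative choices, not something from the original post.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=30, default_charset="utf-8"):
    """Fetch `url` and return its body decoded to a str.

    fetch_text is a hypothetical helper name. The charset is taken from
    the Content-Type header when present, otherwise `default_charset`.
    """
    with urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes in Python 3, not str
        charset = response.headers.get_content_charset() or default_charset
    # Replace undecodable bytes instead of raising, since pages often lie
    # about their encoding.
    return raw.decode(charset, errors="replace")
```

Calling fetch_text('http://xahlee.org/Periodic_dosage_dir/_p2/russell-lecture.html') should then return the page as text, provided the host is reachable.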

# note the line
# from <library_name> import <function_name1,function_name2...>
# it loads the library and imports the named functions.
# to see the names available in a module, one can use "dir":
# import urllib; print dir(urllib)

# for more about this module import syntax, see
# http://python.org/doc/tut/node8.html
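
To make the import note concrete, here is a small sketch, assuming Python 3, where quote and unquote live in urllib.parse rather than urllib. It shows that "from ... import ..." binds the very same function objects you would reach through the module, and uses dir to list the module's public names. The variable public_names is just for illustration.

```python
import urllib.parse
from urllib.parse import quote, unquote

# "from module import name" binds the same object the module holds.
assert quote is urllib.parse.quote
assert unquote is urllib.parse.unquote

# dir() lists every attribute; skip the underscore-prefixed private ones.
public_names = [name for name in dir(urllib.parse) if not name.startswith("_")]
print(public_names)
```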

#---------------------
# sometimes in working with html pages, you need to create links.
# In a url, some chars need to be encoded.
# the "quote" function does it. the "unquote" function reverses it.
# Very nice.

from urllib import quote
print quote("~joe's home page")
print 'http://www.google.com/search?q=' + quote("ménage à trois")
# (rely on the French to teach us interesting words)

# for more about the urllib module, see
# http://python.org/doc/lib/module-urllib.html
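
A minimal sketch of the same encoding round trip, assuming Python 3 (urllib.parse instead of urllib). Here quote encodes the string as UTF-8 before percent-escaping. The urlencode call is an addition not in the original post: it builds the whole query string and writes spaces as "+".

```python
from urllib.parse import quote, unquote, urlencode

phrase = "ménage à trois"

encoded = quote(phrase)    # percent-escape the UTF-8 bytes
decoded = unquote(encoded)  # reverse it

# urlencode builds "key=value" pairs and uses "+" for spaces.
search_url = "http://www.google.com/search?" + urlencode({"q": phrase})

print(encoded)     # m%C3%A9nage%20%C3%A0%20trois
print(decoded)     # ménage à trois
print(search_url)  # http://www.google.com/search?q=m%C3%A9nage+%C3%A0+trois
```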

----------------------------
in perl, it's messy as usual. Long story short, the simplest way is to
use the perl programs HEAD or GET in /usr/bin or /usr/local/bin. When
one of the networking modules is installed, perl contaminates your bin
dirs with these programs. In the unix shell,
GET 'http://yahoo.com/'
should do the job. HEAD does the same for an HTTP HEAD request, which
returns only the response headers. (assuming they are installed.)
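
For comparison, a hedged sketch of what the HEAD program does, written in Python 3 (an assumption; the post's own Python is version 2). The function names head_request and fetch_headers are made up for illustration. Request(url, method="HEAD") asks the server for headers only.

```python
from urllib.request import Request, urlopen


def head_request(url):
    """Build a request that asks only for headers (HTTP HEAD)."""
    return Request(url, method="HEAD")


def fetch_headers(url, timeout=30):
    """Send the HEAD request and return (status code, headers as a dict)."""
    with urlopen(head_request(url), timeout=timeout) as response:
        return response.status, dict(response.headers.items())
```

fetch_headers('http://yahoo.com/') would need network access, so it is left uncalled here.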

if you need more complexity, perl has LWP::Simple and LWP::UserAgent to
begin with. (there are a host of spaghetti others.) Both of these need
to be installed separately. Perhaps consult your sys admin. The last
time I used them was some 2 years ago, so the following code is
untested, but should be it. I don't recall which one can't do what.
Your mileage may vary.

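use strict;
use warnings;
# use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->timeout(120);  # give up after 120 seconds
my $url = 'http://yahoo.com/';
my $request = HTTP::Request->new('GET', $url);
my $response = $ua->request($request);
# report the http status and quit if the fetch failed
die $response->status_line, "\n" unless $response->is_success;
my $content = $response->content();
print $content;
__END__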

# note the above perl code. Many perl modules sport an Object
# Oriented syntax, often concomitantly with a normal (procedural)
# syntax version as well.
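
A rough Python 3 counterpart of the perl flow above (request object, timeout, response content), offered only as a sketch. The function name get_content, the user_agent string, and the (ok, payload) return shape are all assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def get_content(url, timeout=120, user_agent="perl-python-example/0.1"):
    """Return (ok, payload): body bytes on success, an error string otherwise."""
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return True, response.read()
    except HTTPError as err:
        # the server answered, but with an error status such as 404 or 500
        return False, "%d %s" % (err.code, err.reason)
    except URLError as err:
        # no usable answer: DNS failure, refused connection, unknown scheme
        return False, str(err.reason)
    except OSError as err:
        # other socket-level failures, e.g. a timeout while reading the body
        return False, str(err)
```

A caller would write something like: ok, payload = get_content('http://yahoo.com/') and print the payload only when ok is true.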

----------------
this post is from the perl-python a-day mailing list. Please see
http://xahlee.org/perl-python/python.html

 Xah
 xah at xahlee.org
 http://xahlee.org/PageTwo_dir/more.html



