html parser etc help
Jeremy Yallop
jeremy at jdyallop.freeserve.co.uk
Tue Jun 11 20:14:46 EDT 2002
* Xah Lee
| I'm learning Python in the hope of switching my web programming from Perl.
| Can anyone show me code snippets for the following problems to get
| me started?
I don't speak Perl, but I'll see what I can do.
| * a snippet to traverse a directory and list files by type or suffix.
:
| # perl samples
| # traverse dir and print out files ending in .html
| use File::Find;
| find(\&wanted, '/home/xah');
| sub wanted {if($_ =~ m/\.html$/){print "$File::Find::name\n";}}
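import os

def wanted(arg, dirname, fnames):
    # os.path.walk calls this once per directory with the names it contains
    for file in fnames:
        if file.endswith('.html'):
            print os.path.join(dirname, file)

os.path.walk('/home/xah', wanted, None)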
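
On Python 3, where os.path.walk has been removed, the same traversal can be sketched with os.walk. This is only a minimal sketch and not part of the original reply: the function name find_by_suffix and its suffix parameter are illustrative assumptions.

```python
import os

def find_by_suffix(root, suffix='.html'):
    """Yield the full path of every file under root whose name ends with suffix."""
    # os.walk visits root and each subdirectory in turn
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)

if __name__ == '__main__':
    # '/home/xah' is the directory used in the question
    for path in find_by_suffix('/home/xah'):
        print(path)
```

Writing it as a generator keeps the traversal separate from the printing, so the caller can filter or sort the paths before doing anything with them.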
| * a sample snippet that parses a given HTML file and prints out all <a href> links.
[There's probably a simpler way to do this one]
import sgmllib

class link_printer(sgmllib.SGMLParser):
    def unknown_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; href need not come first
        if tag.lower() == 'a':
            for name, value in attrs:
                if name.lower() == 'href':
                    print "link : ", value

lp = link_printer()
lp.feed(open('file.html').read())
lp.close()
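
sgmllib is not available on Python 3, so for readers there the same idea can be sketched with html.parser from the standard library. This is a hedged sketch rather than the original poster's method: the names LinkCollector and extract_links are illustrative assumptions, and the sample HTML string is made up for the demonstration.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # tag and attribute names arrive lowercased;
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

def extract_links(html_text):
    """Return the list of href values found in html_text (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links

if __name__ == '__main__':
    # made-up sample input; a real script would read the text from a file
    sample = '<p><a class="x" href="http://xahlee.org/">home</a> <a name="top">top</a></p>'
    for link in extract_links(sample):
        print("link :", link)
```

Looping over every attribute pair means the link is found wherever href appears in the tag, and anchors without an href are skipped.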
| * a sample snippet that fetches and prints out the content of a given URL.
:
| # fetch a given webpage
| use LWP::Simple;
| $content = get('http://xahlee.org/');
| print $content;
import urllib2
content = urllib2.urlopen('http://xahlee.org/')
print content.read()
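
On Python 3, urllib2 was split into urllib.request and urllib.error, and read() returns bytes rather than text. A minimal sketch of the same fetch under those assumptions follows; the function name fetch and the default_encoding parameter are illustrative, and falling back to UTF-8 when the server names no charset is a guess, not a rule.

```python
from urllib.request import urlopen

def fetch(url, default_encoding='utf-8'):
    """Return the body of url decoded to text (hypothetical helper)."""
    with urlopen(url) as response:
        # prefer the charset named in the Content-Type header, if any
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset, errors='replace')
```

Calling print(fetch('http://xahlee.org/')) would then mirror the urllib2 snippet.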
Hope this helps,
Jeremy.