html parser etc help

Jeremy Yallop jeremy at jdyallop.freeserve.co.uk
Tue Jun 11 20:14:46 EDT 2002


* Xah Lee
| I'm learning Python in the hope of switching my web programming from Perl.
| Can anyone show me code snippets for the following problems to get
| me started?

I don't speak Perl, but I'll see what I can do.

| * snippet to traverse a directory and list files by type or suffix.
:
| # perl samples
| # traverse dir and print out files ending in .html
| use File::Find;
| find(\&wanted, '/home/xah');
| sub wanted {if($_ =~ m/\.html$/){print "$File::Find::name\n";}}

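   import os
   def wanted(arg, dirname, fnames):
       # called once per directory; fnames lists the entries in dirname
       for file in fnames:
           if file.endswith('.html'):
               print os.path.join(dirname, file)
   # os.path.walk needs the callback and an argument to hand to it
   os.path.walk('/home/xah', wanted, None)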
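
On Python 3, where os.path.walk no longer exists, os.walk covers the same ground without a callback. A minimal sketch, not the only way to do it; the helper name find_by_suffix is my own invention:

```python
import os

def find_by_suffix(top, suffix):
    """Return the paths of all files under top whose names end with suffix."""
    matches = []
    # os.walk yields one (directory, subdirectories, files) tuple per directory
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return matches

if __name__ == '__main__':
    for path in find_by_suffix('/home/xah', '.html'):
        print(path)
```

Returning a list rather than printing inside the loop makes the result easy to sort or filter further.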

| * sample snippet that parses a given HTML file and prints out all <a href> links.

[There's probably a simpler way to do this one]

   import sgmllib
   class link_printer(sgmllib.SGMLParser):
       def unknown_starttag(self, tag, attrs):
           # attrs is a list of (name, value) pairs; href need not come first
           if tag.lower() == 'a':
               for name, value in attrs:
                   if name.lower() == 'href':
                       print "link : ", value
   lp = link_printer()
   lp.feed(open('file.html').read())
   lp.close()
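
sgmllib is gone in Python 3; html.parser.HTMLParser from the standard library can do the same job there. A rough sketch, assuming you want the links collected in a list; the names LinkCollector and extract_links are hypothetical:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)

def extract_links(html_text):
    """Return all <a href> targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links

# Possible usage, with 'file.html' standing in for your own file:
# for link in extract_links(open('file.html').read()):
#     print("link : ", link)
```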

| * a sample snippet that fetches and prints out the content of a given URL.
:
| # fetch a given webpage
| use LWP::Simple;
| $content = get('http://xahlee.org/');
| print $content;

    import urllib2
    content = urllib2.urlopen('http://xahlee.org/')
    print content.read()
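
In Python 3 urllib2 was folded into urllib.request, and read() returns bytes that need decoding. A hedged sketch; the helper name fetch, the 10-second timeout, and the UTF-8 fallback are my own assumptions:

```python
import urllib.request

def fetch(url, timeout=10):
    """Return the body of url decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')

# Possible usage:
# print(fetch('http://xahlee.org/'))
```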

Hope this helps,

          Jeremy.


