get links?

Alex Martelli aleaxit at yahoo.com
Fri May 4 05:24:48 EDT 2001


"Martin Johansson" <045521104 at telia.com> wrote in message
news:OFtI6.8333$sk3.2324200 at newsb.telia.net...
> This is my code for saving all the links on one page in a textfile, and
> later I will get all these linked pages.
> I just started programming in Python so I can't see what is wrong.
> Can anybody help me..
    ...
> def lista(s):
>     while s != '</HTML>':
>         if s == '<A HREF="' or s == '<a href="':
>                 while s != '">':
>                     c=open('f.txt', 'a')
>                     c.write(s)

Nowhere is s being modified (re-bound) in this function.  It
will thus keep referring to the same, identical object it
referred to at the start of the loop: so, if the loop ever
executes, it will never terminate.
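
As a minimal sketch of the difference rebinding makes (the helper
name count_until_empty is hypothetical, made up for illustration),
a loop of this shape terminates only because s is assigned a new,
shorter string on every pass:

```python
def count_until_empty(s):
    # Hypothetical helper, for illustration only.
    steps = 0
    while s != '':
        # Strings are immutable, so "changing" s means rebinding the
        # name to a different string object; here, s minus its first
        # character.  Without this line the test above never changes.
        s = s[1:]
        steps += 1
    return steps

print(count_until_empty('<HTML>'))
```

Remove the assignment to s and the condition is evaluated against
the same object forever, which is the problem in the quoted code.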

More generally, this is nowhere near the approach I
would suggest for the task you set yourself.  Standard
module htmllib does the job admirably, and reusing good
solid code is a good thing.  If you don't want to reuse
it because you want to learn how else the job could be
done, that's fine, too, of course.  But I suspect you
may need a better understanding of elementary string
handling before you re-code such reasonably ambitious
programs.
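
For readers on Python 3, where htmllib no longer exists, a rough
sketch of the same "reuse the standard library" idea can be built
on html.parser.HTMLParser instead.  This swaps in a different
module from the one named above, and the names LinkCollector and
get_links are hypothetical, chosen only for this example:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a href=...> both arrive here as 'a' and 'href'.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


def get_links(html_text):
    # Hypothetical convenience wrapper around the collector.
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


page = '<HTML><A HREF="a.html">x</A> <a href="http://example.com/b">y</a></HTML>'
for link in get_links(page):
    print(link)
```

Writing the collected links to a text file, as in the original
question, is then a separate and much simpler step.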

Operators == and != test whether two strings are entirely
equal, or different.  You appear to want something
else, such as testing whether a string s *STARTS WITH*
some other string.  If this is indeed the case, then
you might want to code, e.g.:
    if s.startswith('<A HREF="'):
etc.  Then, if you want to change s so that it now
refers to the string *AFTER* that start, you will
need to add an assignment statement to rebind the
variable s to refer to a different string object:
        s = s[len('<A HREF="'):]
for example.  I hope these hints may help you move
your project forwards, if you want to proceed with
it...!
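
Putting the two hints together, one possible shape for a
hand-written scan is sketched below.  It is only an illustration:
it assumes every link is written exactly as <A HREF="..."> or
<a href="...">, returns a list instead of writing to f.txt, and
is not a robust HTML parser.

```python
def lista(s):
    # Sketch only: collects the text between '<a href="' and the
    # next double quote, for the two spellings tested in the
    # original code.
    links = []
    prefix_len = len('<A HREF="')
    while s:
        if s.startswith('<A HREF="') or s.startswith('<a href="'):
            s = s[prefix_len:]      # rebind s to what follows the prefix
            end = s.find('"')       # position of the closing quote
            if end == -1:
                break               # unterminated attribute: give up
            links.append(s[:end])
            s = s[end:]             # rebind s past the link text
        else:
            s = s[1:]               # no match here: drop one character
    return links


for link in lista('x<a href="p.html">t</a><A HREF="q.html">u</A>'):
    print(link)
```

Every branch of the loop rebinds s to a shorter string (or breaks),
so the loop is guaranteed to end once the input is used up.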


Alex





