get links?
Alex Martelli
aleaxit at yahoo.com
Fri May 4 05:24:48 EDT 2001
"Martin Johansson" <045521104 at telia.com> wrote in message
news:OFtI6.8333$sk3.2324200 at newsb.telia.net...
> This is my code for saving all the links on one page in a textfile, and
> later I will get all these linked pages.
> I just started programming in Python so I can't see what is wrong.
> Can anybody help me..
...
> def lista(s):
>     while s != '</HTML>':
>         if s == '<A HREF="' or s == '<a href="':
>             while s != '">':
>                 c=open('f.txt', 'a')
>                 c.write(s)
Nowhere is s being modified (re-bound) in this function. It
will thus keep referring to the same, identical object it
referred to at the start of the loop: so, if the loop ever
executes, it will never terminate.
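
To make the rebinding point concrete, here is a minimal sketch. The
function name count_chars and the sample input are made up for
illustration and are not part of the original code. The loop ends only
because s is rebound to a shorter string on every pass:

```python
def count_chars(s):
    """Count characters by consuming the string one character at a time."""
    n = 0
    while s != '':
        # Rebind s to a new, shorter string object. Without this line the
        # loop condition could never change and the loop would never end.
        s = s[1:]
        n += 1
    return n


print(count_chars('<HTML>'))  # prints 6
```

Deleting the line s = s[1:] reproduces the behaviour described above:
the test keeps seeing the same object, so the loop cannot terminate.
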
More generally, this is nothing like the approach I
would suggest for the task you set yourself. Standard
module htmllib does the job admirably, and reusing good
solid code is a good thing. If you don't want to reuse
it because you want to learn how else the job could be
done, that's fine, too, of course. But I suspect you
may need a better understanding of elementary string handling
before you re-code such reasonably ambitious programs.
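
As a rough sketch of the library-based approach: htmllib was removed
from the standard library in Python 3, so the example below assumes a
current Python and swaps in html.parser.HTMLParser instead. The names
LinkCollector, get_links and sample are hypothetical, chosen only for
this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased, so <A HREF="...">
        # and <a href="..."> are handled by the same test.
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


def get_links(html_text):
    """Return the list of link targets found in html_text."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = ('<HTML><A HREF="http://www.python.org/">Python</A> '
          '<a href="page2.html">next</a></HTML>')
print(get_links(sample))  # prints ['http://www.python.org/', 'page2.html']
```

Writing the collected links to a file such as f.txt is then a separate,
simple step once the list has been built.
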
Operators == and != test whether two strings are entirely
equal or different. You appear to want something
else, such as testing whether a string s *STARTS WITH*
some other string. If this is indeed the case, then
you might want to code, e.g.:
    if s.startswith('<A HREF="'):
etc. Then, if you want to change s so that it now
refers to the string *AFTER* that start, you will
need to add an assignment statement to rebind the
variable s to refer to a different string object:
    s = s[len('<A HREF="'):]
for example. I hope these hints may help you move
your project forwards, if you want to proceed with
it...!
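
Putting the two hints together, a hand-rolled scanner might look
something like the sketch below. It is only an illustration under
simplifying assumptions: it recognises just the two exact prefixes from
the original code, expects the link value to end at the next double
quote, and the name extract_links is invented here.

```python
def extract_links(s):
    """Return href values found in s, using startswith and slicing."""
    links = []
    prefixes = ('<A HREF="', '<a href="')
    while s:
        for prefix in prefixes:
            if s.startswith(prefix):
                # Rebind s to the text *after* the prefix.
                s = s[len(prefix):]
                # The link runs up to the closing quote, or to the end
                # of the text if the quote is missing.
                end = s.find('"')
                if end == -1:
                    end = len(s)
                links.append(s[:end])
                s = s[end:]
                break
        else:
            # No prefix at this position: drop one character and go on.
            s = s[1:]
    return links


page = '<HTML><A HREF="a.html">x</A><a href="b.html">y</a></HTML>'
print(extract_links(page))  # prints ['a.html', 'b.html']
```

Every branch of the loop rebinds s to a shorter string, so the loop is
certain to finish. Repeated slicing copies the remaining text each time,
which is fine for a single page but slow for very large inputs.
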
Alex