[Tutor] finding title tag (fwd)

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 16 Oct 2001 18:52:20 -0700 (PDT)


On Tue, 16 Oct 2001, Danny Yoo wrote:

> I'm slightly busy at the moment, but I'll be able to answer your question
> tonight.  I'm forwarding this to the other tutors on the mailing list, so
> that someone has a chance to answer you.  Best of wishes!

Ok, I'm back.


> thanx a lot for the help...
> i am confused with the use of class ....as i want to find links at the
> depth of 3 or more how can i recursively call this method.....

How familiar are you with recursion?  If you tell us more about your
background, we can pitch the explanation at the right level.  Recursion
is a fairly advanced topic, so without knowing that I'm not sure how
best to explain it.


It might help to think of findLinks() as a function that takes in a single
url, and spits back a list of new urls.  If trying to solve this problem
recursively is difficult, let's go for the easy route: let's try to solve
the problem without recursion.  *grin*


For example, we already know how to get the links at "depth 0" if we have
an initial url we want to look at:

###
links_at_depth_0 = findLinks('http://python.org')
###

To get links at "depth 1", we can simply call findLinks() on each link
that's at "depth 0", and collect all these links together:

###
links_at_depth_1 = []
for link in links_at_depth_0:
    links_at_depth_1 = links_at_depth_1 + findLinks(link)
###


To get links at "depth 2", we can simply call findLinks() on each link
that's at "depth 1", and collect all these links together:

###
links_at_depth_2 = []
for link in links_at_depth_1:
    links_at_depth_2 = links_at_depth_2 + findLinks(link)
###


Let's stop here.  *grin* A recursive solution can capture this sort of
link-grabbing as deeply as we want.  But if you just want it for depth 3,
you already have enough tools to do this.  It will be a little wordy,
true, but it will also be easy to understand.
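
If you do get curious about the general version later, here is a rough
sketch of how the repeated pattern above could be folded into a loop, and
then into a recursive function.  The names links_at_depth,
links_at_depth_recursive, TOY_WEB and toy_find_links are all made up for
this example, and the link-finding function is passed in as a parameter so
that the sketch can be tried on a tiny fake "web" without touching the
network.  Treat it as one possible way to do it, not the only one.

```python
def links_at_depth(start_url, depth, find_links):
    """Loop version: depth 0 is find_links(start_url), as in the text above."""
    current = find_links(start_url)
    for _ in range(depth):
        next_level = []
        for link in current:
            # Same step as before: collect the links found on each page.
            next_level.extend(find_links(link))
        current = next_level
    return current


def links_at_depth_recursive(start_url, depth, find_links):
    """Recursive version: depth n is built from the links at depth n - 1."""
    if depth == 0:
        return find_links(start_url)
    collected = []
    for link in links_at_depth_recursive(start_url, depth - 1, find_links):
        collected.extend(find_links(link))
    return collected


# A hypothetical miniature "web", used only to exercise the functions.
TOY_WEB = {
    'a': ['b', 'c'],
    'b': ['d'],
    'c': ['d', 'e'],
    'd': [],
    'e': ['a'],
}


def toy_find_links(url):
    """Stand-in for findLinks(): look the page up in TOY_WEB."""
    return list(TOY_WEB.get(url, []))
```

With the real function, the call would look something like
links_at_depth('http://python.org', 3, findLinks).  Note that neither
version removes duplicates or remembers pages it has already visited, so
on real pages the lists can grow very quickly.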

We can talk more about this if you want.  Please feel free to ask more
questions about this.



> also in following soln of yours it will separate links and title for that
> links ..so i will not be able to keep track of which title belongs to
> which link

The links and titles are intentionally paired up by position: if we're
looking at a link at position 'n' in the anchorlist, its corresponding
title will also be at position 'n' in the titlelist.  Play around with it
a little more.
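
As a small illustration of that pairing, here is one way to walk the two
lists together.  The sample contents of anchorlist and titlelist below are
invented for the example, and the code assumes a Python where print is a
function.

```python
# Hypothetical sample data; in practice these come from the parser.
anchorlist = ['http://python.org/', 'http://python.org/doc/']
titlelist = ['Python Home', 'Documentation']

# zip() pairs up the items that sit at the same position in each list.
for link, title in zip(anchorlist, titlelist):
    print(link, '->', title)

# If lookups by link are more convenient, build a dictionary from the pairs.
title_for_link = dict(zip(anchorlist, titlelist))
```

Indexing works just as well: titlelist[n] is the title for anchorlist[n].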


> ....and also if there's no title then i have to get the text of the
> link

Not too difficult; that will require a small change to the AnchorParser.  
How much of the AnchorParser class makes sense to you now?
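
To give a feel for the kind of change involved, here is a rough,
self-contained sketch.  It is not the AnchorParser from earlier in this
thread: AnchorTextParser is a made-up stand-in built on html.parser from
the standard library, and it assumes that "title" means the title
attribute of the <a> tag.  When that attribute is missing, it falls back
to the text between <a> and </a>.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Hypothetical parser: collects links plus a title or the link text."""

    def __init__(self):
        super().__init__()
        self.anchorlist = []
        self.titlelist = []
        self._href = None    # href of the <a> we are currently inside
        self._title = None   # its title attribute, if it has one
        self._text = []      # pieces of text seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            attrs = dict(attrs)
            href = attrs.get('href')
            if href:
                self._href = href
                self._title = attrs.get('title')
                self._text = []

    def handle_data(self, data):
        # Only remember text while we are inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Join the pieces and tidy up runs of whitespace.
            text = ' '.join(''.join(self._text).split())
            self.anchorlist.append(self._href)
            # Prefer the title; fall back to the link text.
            self.titlelist.append(self._title or text)
            self._href = None


parser = AnchorTextParser()
parser.feed('<a href="/doc/" title="Docs">Documentation</a> '
            '<a href="/dl/">Download <b>Python</b></a>')
parser.close()
```

The two lists stay lined up by position, so the pairing described above
still holds.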