[Tutor] Not understanding a bit of code behavior

Bill Allen wallenpb at gmail.com
Tue Jan 25 06:56:47 CET 2011


This is a bit embarrassing, but I have crafted a bit of code that does
EXACTLY what I want, but I am now a bit baffled as to precisely why.  I have
written a function to do a bit of webscraping by following links for a
project at work.  If I leave the code as is, it behaves as if it is
recursively passing through the data tree, which is what I want.  However,
if I change it only slightly, it makes only one pass through the top level
data.  What I do not understand is why it ever behaves as if it is recursive,
as the function is only called once.

If I comment out out_list=[] and let out_list=part_list be used, the following
parses through the whole tree of data as if recursive.  If I use out_list=[]
and comment out out_list=part_list, it only processes the top level of the
data tree.
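
For reference, here is a minimal sketch of what the two assignments do on
their own, assuming ordinary Python list semantics.  The part numbers and the
names with a trailing 2 are made up for illustration and are not from my
function.

```python
part_list = ["A100"]

# Variant 1: plain assignment binds a second name to the SAME list object.
out_list = part_list
out_list.append("B200")
same_object = out_list is part_list   # both names refer to one list
aliased_view = list(part_list)        # part_list has grown as well

# Variant 2: a fresh empty list is a separate object.
part_list2 = ["A100"]
out_list2 = []
out_list2.append("B200")
separate_view = list(part_list2)      # part_list2 is untouched

print(same_object, aliased_view, separate_view)
```

In the first variant anything appended through out_list is also visible
through part_list, which would fit with part_list gaining values while the
function runs.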

The function is called only once as:  Exploded_BOM_List =
get_BOM(first_num)  in which I pass it a single part number to start with.
The webscraping bit goes to a particular webpage about that part where it
then picks up more part numbers and repeats the process.

So can anyone help me understand why this actually works?  Certainly no
complaints here about it, but I would like to better understand why that
change alters the behavior so profoundly.  All the print statements are just
so I could follow the data flow while working on this.  By following the data flow,
I am finding that part_list is actually having values added to it during the
time the function is running.   Problem is, I don't see clearly why that
should be so.

import re
import urllib

def get_BOM(part_list):
    x=re.compile('part='+'.*?'+'>')
    BOM_List = []

#    out_list = []
    out_list = part_list
    print("called get_BOM")
    pass_num = 0
    for part_num in part_list:
        mypath = ("http://xxx.xxx.xxx.xxx/cgi-bin/search/part-url.cgi?part="
                  + part_num)
        mylines = urllib.urlopen(mypath).readlines()
        print("pass number ", pass_num)
        print(mypath)
        print("PL:",part_list)
        for item in mylines:
            if "http://" in item:
                if "part=" in item:
                    xstring=str(x.findall(item)).strip('"[\'part=>\']"')
                    BOM_List.append(xstring)
                    print("BL:",BOM_List)
        for bom_item in BOM_List:
            if bom_item not in out_list:
                out_list.append(bom_item)
                print("OL:",out_list)
        pass_num += 1
    return(out_list)
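
To make the behavior reproducible without the web server, here is a
stripped-down sketch of the same loop.  The FAKE_BOM dictionary, the
get_bom_sketch name and the alias flag are hypothetical stand-ins for the
real page lookups, and I am assuming the function is called with a
one-element list.

```python
# Hypothetical part tree standing in for the web pages (not real data).
FAKE_BOM = {
    "TOP": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": [], "A2": [], "B1": [],
}

def get_bom_sketch(part_list, alias=True):
    # alias=True mirrors out_list = part_list; alias=False mirrors out_list = []
    out_list = part_list if alias else []
    bom_list = []
    for part_num in part_list:
        # Stand-in for fetching the page and extracting its part numbers.
        bom_list.extend(FAKE_BOM.get(part_num, []))
        for bom_item in bom_list:
            if bom_item not in out_list:
                # With alias=True this also lengthens part_list, so the
                # outer for loop keeps finding new items to visit.
                out_list.append(bom_item)
    return out_list

aliased = get_bom_sketch(["TOP"], alias=True)
fresh = get_bom_sketch(["TOP"], alias=False)
print(aliased)
print(fresh)
```

With the alias in place, the for loop walks a list that grows while it is
being iterated, so every newly found part gets visited in turn even though
the function is only called once.  With a fresh out_list, the loop sees only
the original single entry and stops after the top level.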