which datastructure for fast sorted insert?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sun May 25 22:25:33 EDT 2008
On Sun, 25 May 2008 22:42:06 -0300, <notnorwegian at yahoo.se> wrote:
> def joinSets(set1, set2):
>     for i in set2:
>         set1.add(i)
>     return set1
Use the | operator, or |=
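For instance, a quick sketch of the union operators on throwaway sets:

```python
a = set([1, 2, 3])
b = set([3, 4, 5])

c = a | b   # new set with the elements of both
a |= b      # in-place union: adds b's elements to a
# a.update(b) is equivalent to a |= b
```

Either form replaces the manual loop in joinSets above.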
> Traceback (most recent call last):
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 47, in <module>
>     x = scrapeSites("http://www.yahoo.com")
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 31, in scrapeSites
>     site = iterator.next()
> RuntimeError: Set changed size during iteration
You will need two sets: the one you're iterating over, and another collecting the new urls. Once you finish iterating over the first, continue with the new ones; stop when the set of new urls is empty.
> def scrapeSites(startAddress):
>     site = startAddress
>     sites = set()
>     iterator = iter(sites)
>     pos = 0
>     while pos < 10:  # len(sites):
>         newsites = scrapeSite(site)
>         joinSets(sites, newsites)
>         pos += 1
>         site = iterator.next()
>     return sites
Try this (untested):
def scrapeSites(startAddress):
    allsites = set()               # all links found so far
    pending = set([startAddress])  # pending sites to examine
    while pending:
        newsites = set()           # new links
        for site in pending:
            newsites |= scrapeSite(site)
        pending = newsites - allsites
        allsites |= newsites
    return allsites
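To see that loop in action without a network, here is a runnable sketch in which scrapeSite is replaced by a stub returning a canned link graph (the urls are made up for illustration only):

```python
def scrapeSite(url):
    # Stub standing in for the real scraper: a fixed link graph.
    links = {
        'http://a': set(['http://b', 'http://c']),
        'http://b': set(['http://a', 'http://c']),
        'http://c': set(['http://d']),
        'http://d': set(),
    }
    return links[url]

def scrapeSites(startAddress):
    allsites = set()               # all links found so far
    pending = set([startAddress])  # pending sites to examine
    while pending:
        newsites = set()           # new links found this pass
        for site in pending:
            newsites |= scrapeSite(site)
        pending = newsites - allsites  # only genuinely new urls
        allsites |= newsites
    return allsites

print(scrapeSites('http://a'))
```

Because pending is always a fresh set computed after the inner loop, nothing is ever mutated while being iterated over.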
> wtf? im not multithreading or anything so how can the size change here?
You modified the set you were iterating over. Another example of the same problem:
d = {'a': 1, 'b': 2, 'c': 3}
for key in d:
    d[key+key] = 0
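CPython detects the size change and raises the same kind of RuntimeError; iterating over a snapshot of the keys avoids it. A small sketch:

```python
d = {'a': 1, 'b': 2, 'c': 3}
try:
    for key in d:
        d[key + key] = 0   # grows the dict mid-iteration
except RuntimeError as e:
    print(e)               # same kind of error as the traceback above

d = {'a': 1, 'b': 2, 'c': 3}
for key in list(d):        # list(d) copies the keys first
    d[key + key] = 0       # now safe: we iterate over the copy
```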
--
Gabriel Genellina