Recursion limit of pickle?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Mon Feb 11 02:13:01 EST 2008
On Sun, 10 Feb 2008 02:09:12 -0200, Victor Lin <Bornstub at gmail.com>
wrote:
> On Feb 10, 11:42 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>> On Sat, 09 Feb 2008 09:49:46 -0200, Victor Lin <Borns... at gmail.com>
>> wrote:
>>
>> > I have run into a problem with pickle.
>> > I downloaded an HTML page from:
>>
>> >http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B...
>>
>> > and parsed it with BeautifulSoup.
>> > The page is huge.
>> > When I use pickle to dump it, a "RuntimeError: maximum recursion depth
>> > exceeded" occurs.
Yes, I could reproduce the error. Worse, using cPickle instead of pickle,
Python just aborts (no exception trace, no error printed, no Application
Error popup...). This is with Python 2.5.1 on Windows XP.
<code>
import urllib
import BeautifulSoup
import cPickle
doc = urllib.urlopen('http://www.amazon.com/Magellan-Maestro-4040-Widescreen-Navigator/dp/B000NMKHW6/ref=sr_1_2?ie=UTF8&s=electronics&qid=1202541889&sr=1-2')
soup = BeautifulSoup.BeautifulSoup(doc)
#print len(cPickle.dumps(soup,-1))
</code>
That page has an insane SELECT containing 1000 OPTIONs. Removing some of
them makes cPickle happy:
<code>
div=soup.find("div", id="buyboxDivId")
select=div.find("select", attrs={"name":"quantity"})
for i in range(200): # remove 200 options out of 1000
    select.contents[5].extract()
print len(cPickle.dumps(soup,-1))
</code>
I don't know whether this is an error in BeautifulSoup or in pickle. That
SELECT with many OPTIONs is big, but not recursive (and I think that
BeautifulSoup uses weak references to build its links); anyway pickle is
supposed to handle recursion well. The longest chain of nested tags has
length=32; in principle I would expect that BS has a similar nesting
complexity, and the "recursion limit exceeded" error isn't expected.
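One possible explanation, which is an assumption and not something
established in this thread, is that each parsed node also holds a reference
to its neighbour, so pickle follows the OPTIONs one link at a time and the
depth grows with the number of siblings instead of with the tag nesting.
The sketch below only illustrates that general effect and does not use
BeautifulSoup at all: plain dicts stand in for tags, build_chain and
pickled_size are made-up helper names, and it is written for Python 3. It
uses pickle._Pickler, the pure-Python pickler in CPython (an underscore
name, so treat it as an implementation detail), because that one is bounded
by sys.getrecursionlimit().

```python
import io
import pickle
import sys


def build_chain(n):
    """Return the head of a chain of n dicts linked through a 'next' key.

    The dicts are a stand-in for sibling nodes; this is not how
    BeautifulSoup is known to store its tree.
    """
    head = None
    for _ in range(n):
        head = {"tag": "option", "next": head}
    return head


def pickled_size(obj, limit):
    """Pickle obj with the recursion limit temporarily set to `limit`.

    Returns the pickle size in bytes, or None if the limit was hit.
    """
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        buf = io.BytesIO()
        # Pure-Python pickler: it recurses once per link in the chain.
        pickle._Pickler(buf, protocol=2).dump(obj)
        return len(buf.getvalue())
    except RuntimeError:  # RecursionError subclasses RuntimeError (3.5+)
        return None
    finally:
        sys.setrecursionlimit(old_limit)


chain = build_chain(1000)          # flat structure, but 1000 links long
print(pickled_size(chain, 1000))   # low limit: expected to hit the limit
print(pickled_size(chain, 10000))  # higher limit: expected to succeed
```

If this is what happens with the soup object, raising the limit with
sys.setrecursionlimit() before dumping would be a workaround, at the cost
of a deeper interpreter stack. Whether that is safe with cPickle on
Python 2.5 is not something this sketch shows.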
>> BeautifulSoup objects usually aren't pickleable, independently of your
>> recursion error.
> But I pickle and unpickle other soup objects successfully.
> Only this object seems too deep to pickle.
Yes, sorry, I was using an older version of BeautifulSoup.
--
Gabriel Genellina