[ python-Bugs-1757062 ] Pickle fails on BeautifulSoup's navigableString instances

SourceForge.net noreply at sourceforge.net
Thu Aug 23 16:02:02 CEST 2007


Bugs item #1757062, was opened at 2007-07-19 20:23
Message generated for change (Comment added) made by altherac
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1757062&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tal Einat (taleinat)
Assigned to: Nobody/Anonymous (nobody)
Summary: Pickle fails on BeautifulSoup's navigableString instances

Initial Comment:
Trying to pickle an instance of BeautifulSoup's NavigableString class fails with:
"RuntimeError: maximum recursion depth exceeded"


Diagnosis: When pickling such an instance, pickle recurses endlessly until it reaches the maximum recursion limit. This happens regardless of the protocol used.

Possibly related to SF bug #1581183: "pickle protocol 2 failure on int subclass"
http://sourceforge.net/tracker/index.php?func=detail&aid=1581183&group_id=5470&atid=105470


See http://mail.python.org/pipermail/idle-dev/2007-July/002600.html (originally a bug report for IDLE on the IDLE-dev list) for details (including how to recreate the error).

Related IDLE bug report: #1757057
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1757057&group_id=5470


----------------------------------------------------------------------

Comment By: Christophe Michel (altherac)
Date: 2007-08-23 16:02

Message:
Logged In: YES 
user_id=562686
Originator: NO

Let's use the following sample code. It is minimal and isolates the cause
of the bug.

----------8<----------8<----------

#!/usr/bin/env python

import pickle, sys

class EvilString(unicode):
    def __unicode__(self):
        return self

n = EvilString("")
pickle.dump(n, sys.stdout)

----------8<----------8<----------

The evil recursion proceeds as follows:

>>  File "C:\Python25\lib\pickle.py", line 1364, in dump
>>    Pickler(file, protocol).dump(obj)

Initial call to dump(), as intended.
    
>>  File "C:\Python25\lib\pickle.py", line 224, in dump
>>    self.save(obj)

save() calls obj.__reduce_ex__(), obj being our EvilString instance.

For this object the call ends up in _reduce_ex(), defined in copy_reg.py,
line 58, which in my example returns a tuple containing three elements:
1) the _reconstructor function, as defined in copy_reg.py, line 46
2) a tuple: (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)
   The first element is the actual class of obj, the second is the base
class, and the third is the current instance (known as the state).

3) an empty dict {}
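
For comparison, here is a rough sketch of what the same tuple looks like for
a well-behaved subclass. It is written for Python 3 so that it runs on a
current interpreter: str stands in for unicode, and the module is named
copyreg there. PlainString is a hypothetical class, not part of the report.
Note that Python 3 leaves out the trailing dict when the instance has no
attributes, so only the first two elements are inspected.

```python
import copyreg


class PlainString(str):
    """Hypothetical str subclass that does not override __str__."""


# Protocols 0 and 1 go through copyreg._reduce_ex in Python 3.
rv = PlainString("spam").__reduce_ex__(0)
func, args = rv[0], rv[1]
cls, base, state = args

print(func is copyreg._reconstructor)
print(cls.__name__, base.__name__, type(state).__name__)
```

The point to notice is the type of the third element: here the state is a
plain str, so saving it does not lead back to the subclass instance.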

>>  File "C:\Python25\lib\pickle.py", line 331, in save
>>    self.save_reduce(obj=obj, *rv)

save_reduce() calls self.save() twice:
- first on the func argument, which is the _reconstructor function. This
call works as intended
- next on the tuple (<class '__main__.EvilString'>, <type 'unicode'>,
<'__main__.EvilString' instance at 0xXXXXXXXX>)

>>  File "C:\Python25\lib\pickle.py", line 403, in save_reduce
>>    save(args)
>>  File "C:\Python25\lib\pickle.py", line 286, in save
>>    f(self, obj) # Call unbound method with explicit self

save() finds out that its argument is a tuple, and dispatches to
save_tuple() accordingly.

>>  File "C:\Python25\lib\pickle.py", line 564, in save_tuple
>>    save(element)

... and save_tuple() calls save() on each element of the tuple.
See what's wrong?
This means calling save() again on the EvilString instance, which in
turn calls save_reduce() on it, and so on.
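
The call chain above can be mimicked with a toy model. This is only a
sketch under stated assumptions: ToyPickler, Reducible, GoodState and
SelfState are hypothetical names invented for illustration, and the model
leaves out everything the real Pickler does besides the
save / save_reduce / save_tuple dispatch.

```python
class Reducible:
    """Hypothetical marker base class for objects that provide toy_reduce()."""


class ToyPickler:
    """Hypothetical, heavily simplified stand-in for pickle.Pickler."""

    def __init__(self, max_depth=50):
        self.max_depth = max_depth

    def save(self, obj, depth=0):
        if depth > self.max_depth:
            raise RecursionError("toy recursion limit exceeded")
        if type(obj) is tuple:
            self.save_tuple(obj, depth)
        elif isinstance(obj, Reducible):
            self.save_reduce(obj, depth)
        # Anything else is treated as a leaf and needs no further work.

    def save_tuple(self, obj, depth):
        for element in obj:
            self.save(element, depth + 1)

    def save_reduce(self, obj, depth):
        func, args = obj.toy_reduce()
        self.save(func, depth + 1)
        self.save(args, depth + 1)


class GoodState(Reducible):
    def toy_reduce(self):
        # The state is a plain value, so the recursion bottoms out.
        return ("reconstructor", (GoodState, object, "plain state"))


class SelfState(Reducible):
    def toy_reduce(self):
        # The state is the instance itself, as in the EvilString case.
        return ("reconstructor", (SelfState, object, self))


def try_save(obj):
    try:
        ToyPickler().save(obj)
        return "ok"
    except RecursionError:
        return "recursion"


print(try_save(GoodState()))
print(try_save(SelfState()))
```

Saving the state of SelfState leads straight back to save_reduce() on the
same object, which is the loop described in the traceback.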

The problem lies in _reduce_ex(), in the definition of the state of the
object:

copy_reg.py, lines 65 to 70:
    if base is object:
        state = None
    else:
        if base is self.__class__:
            raise TypeError, "can't pickle %s objects" % base.__name__
        state = base(self)

When this code gets executed on an EvilString instance, base is the type
'unicode'.
Since base is not object, and is not the EvilString class itself either,
the following line gets executed:
state = base(self)

This amounts to unicode(self), which calls self.__unicode__() and so
returns the EvilString instance itself, not a plain unicode object.
And there starts the recursion.
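
One possible way around this on the subclass side is to make the conversion
method return an exact base-type object. The sketch below is a Python 3
translation and rests on assumptions: str replaces unicode, __str__ replaces
__unicode__, and SafeString is a hypothetical class. It exercises the
reduce step directly instead of going through pickle.dump().

```python
class SafeString(str):
    """Hypothetical variant of EvilString whose __str__ returns a plain str."""

    def __str__(self):
        # str.__str__ yields an exact str copy rather than handing back self.
        return str.__str__(self)


original = SafeString("spam")

# Same path as "state = base(self)": protocols 0 and 1 use copyreg._reduce_ex.
func, args = original.__reduce_ex__(0)[:2]
cls, base, state = args

# Rebuild the object the way unpickling would, via the returned function.
restored = func(*args)

print(type(state).__name__)
print(type(restored).__name__, restored == original)
```

Because the state is now a plain str, saving it does not hand the pickler
the subclass instance again.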

I don't know if this is a flaw in the design of _reduce_ex, or a flaw
inherent to having __unicode__(self) return self.


----------------------------------------------------------------------
