[Expat-bugs] [ expat-Bugs-2855609 ] dangling positionPtr after error

SourceForge.net noreply at sourceforge.net
Thu Sep 10 01:09:04 CEST 2009


Bugs item #2855609, was opened at 2009-09-09 19:09
Message generated for change (Tracker Item Submitted) made by dvitek
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=110127&aid=2855609&group_id=10127

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Dave Vitek (dvitek)
Assigned to: Nobody/Anonymous (nobody)
Summary: dangling positionPtr after error

Initial Comment:
I am getting a segv when asking pyexpat to parse some bad xml.  I'm not sure whether this is a problem with pyexpat or with expat, but I'm leaning towards expat after finding a (partial?) fix.

The stack of the segv is:
(gdb) bt
#0  0x010245f8 in normal_updatePosition (enc=0x1035fc0, ptr=0x73b000 <Address 0x73b000 out of bounds>, end=0x186cc50 "</<die", 'x' <repeats 76 times>, "</<die", 'x' <repeats 76 times>, "</<die", 'x' \
<repeats 30 times>..., pos=0x1317668) at /Volumes/extra/dvitek/trunk/third-party/python/Modules/expat/xmltok_impl.c:1748
#1  0x01011545 in XML_GetCurrentLineNumber (parser=0x13174d0) at /Volumes/extra/dvitek/trunk/third-party/python/Modules/expat/xmlparse.c:1793
#2  0x010093ca in set_error (self=0x725540, code=XML_ERROR_INVALID_TOKEN) at /Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:126
#3  0x0100c633 in get_parse_result (self=0x725540, rv=0) at /Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:918
#4  0x0100c6e0 in xmlparse_Parse (self=0x725540, args=0x7701e8) at /Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:940
#5  0x00068067 in PyCFunction_Call (func=0x770a30, arg=0x7701e8, kw=0x0) at Objects/methodobject.c:81
...

What appears to be going on is something like this:

There is an existing parser object whose m_positionPtr and m_eventPtr properly point somewhere inside m_buffer.  Everything is fine at this point.

Someone calls XML_Parse on this parser.  It proceeds to call XML_GetBuffer.  XML_GetBuffer allocates a new buffer and frees the old one, but doesn't touch m_positionPtr or m_eventPtr.  So these two pointers now point into freed memory (the old buffer).

After XML_GetBuffer returns, there is a call to XML_ParseBuffer.  This function has a failure case just before it sets m_positionPtr and m_eventPtr to point somewhere better.  It returns XML_STATUS_ERROR before touching m_positionPtr.

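pyexpat sees this error code and proceeds to call XML_GetErrorLineNumber, which expat.h defines as an alias for XML_GetCurrentLineNumber.  XML_GetCurrentLineNumber passes m_positionPtr to normal_updatePosition, which dereferences it in the stack we see above, causing the crash.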

I'm using pyexpat (part of Python 2.6) as the interface to expat.  Python 2.6 embeds a copy of expat's lib subdirectory at Modules/expat.  To check whether expat 2.0.1 behaved any better, I copied the 2.0.1 sources over the embedded ones.  I was pleasantly surprised that it built fine without any tinkering.  Anyway, the problem still happened with 2.0.1.

My fix was to add these three lines near the end of XML_GetBuffer, under the assumption that nothing good can come of the dangling pointers:

     }
+     positionPtr = NULL;
+     eventPtr = NULL;
+     eventEndPtr = NULL;
   }
   return bufferEnd;
 }
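
As a rough illustration of the sequence described above and of what the three added lines change, here is a small Python model.  It is not expat code: ToyBuffer, ToyParser, simulate, and every method name are hypothetical stand-ins, and the `freed` flag only simulates memory that C would have released.  The model also assumes that the line-number query skips its position update when the event pointer is NULL.

```python
class ToyBuffer:
    """Stand-in for a heap block; `freed` simulates a released allocation."""

    def __init__(self, size):
        self.size = size
        self.freed = False


class ToyParser:
    """Hypothetical model of the pointer state described in the report."""

    def __init__(self, reset_on_realloc):
        # True models the proposed three-line fix in XML_GetBuffer.
        self.reset_on_realloc = reset_on_realloc
        self.buffer = ToyBuffer(16)
        # Model m_positionPtr / m_eventPtr pointing into the current buffer.
        self.position_target = self.buffer
        self.event_target = self.buffer

    def get_buffer(self, size):
        """Model XML_GetBuffer replacing the buffer with a larger one."""
        old = self.buffer
        self.buffer = ToyBuffer(size)
        old.freed = True
        if self.reset_on_realloc:
            self.position_target = None
            self.event_target = None

    def parse_buffer(self):
        """Model the early failure path: report an error, update nothing."""
        return False

    def line_number_query_is_safe(self):
        """Return False if the query would read through a freed block."""
        if self.event_target is None:
            # Assumption: a NULL event pointer skips the position update.
            return True
        return not self.position_target.freed


def simulate(reset_on_realloc):
    """Run the reported sequence; return True if the final query is safe."""
    parser = ToyParser(reset_on_realloc)
    parser.get_buffer(64)
    ok = parser.parse_buffer()
    assert not ok  # the parse reports an error, as in the report
    return parser.line_number_query_is_safe()


if __name__ == "__main__":
    print("without reset:", simulate(False))
    print("with reset:   ", simulate(True))
```

Under these assumptions, simulate(False) reports an unsafe query and simulate(True) a safe one, which matches the behavior described for the unpatched and patched XML_GetBuffer.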


My test case was:

from xml.parsers.expat import ParserCreate

for abc in range(1,1000):
    print abc
    p = ParserCreate()
    for i in xrange(100):
        try:
            p.Parse('<die%s</' % ('x' * abc), False)
        except:
            pass
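
For anyone trying this on Python 3, the test case above might be adapted roughly as follows.  This is a sketch, not the reporter's script: run_repro and its parameters are hypothetical names, the loop bounds are configurable, and it catches only ExpatError rather than every exception.  On an expat build that handles the pointers safely it should simply count parse errors; on an affected build it may crash instead.

```python
from xml.parsers.expat import ExpatError, ParserCreate


def run_repro(max_len=1000, calls_per_parser=100):
    """Feed malformed chunks to fresh parsers; return the ExpatError count."""
    errors = 0
    for n in range(1, max_len):
        parser = ParserCreate()
        for _ in range(calls_per_parser):
            try:
                # Same malformed input as the original test, not marked final.
                parser.Parse('<die%s</' % ('x' * n), False)
            except ExpatError:
                errors += 1
    return errors


if __name__ == "__main__":
    print("ExpatErrors raised:", run_repro())
```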


It only crashed on Mac OS X; on Linux it ran without crashing.  That's probably just because glibc's free doesn't give pages back to the OS as aggressively as the BSD one does.
