[ python-Bugs-914148 ] xml.sax segfault on error

SourceForge.net noreply at sourceforge.net
Mon Mar 15 05:26:51 EST 2004


Bugs item #914148, was opened at 2004-03-11 06:14
Message generated for change (Comment added) made by moraes
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=914148&group_id=5470

Category: XML
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Adam Sampson (adamsampson)
Assigned to: Nobody/Anonymous (nobody)
Summary: xml.sax segfault on error

Initial Comment:
While (mistakenly) using Mark Pilgrim's feedparser
module to parse data from
<http://www.gothamist.com/archives/news_nyc/index.php>,
Python segfaults when it should invoke an error handler
for invalid XML. The attached code demonstrates the
problem; it occurs with Python 2.2.3 and 2.3.3 on my
system. I've tried to chop the example data down as far
as possible, but reducing it any further doesn't
exhibit the problem (it's currently just above 64k,
which might be a coincidence).

The gdb traceback I get from the example is as follows:

#0  normal_updatePosition (enc=0x404a4fc0, 
    ptr=0x40682000 <Address 0x40682000 out of bounds>, 
    end=0x81e87e0 "a></div>\n\n<div
id=\"content\">\n\n<div class=\"blog\">\n<!--\n<rdf:RDF
xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n
       
xmlns:trackback=\"http://madskills.com/public/xml/rss/module/trackback/\"\n"...,
pos=0x81e7dac)
    at
/120g/gar/python/python23/work/Python-2.3.3/Modules/expat/xmltok_impl.c:1745
#1  0x40484288 in XML_GetCurrentLineNumber
(parser=0x81e7c18)
    at
/120g/gar/python/python23/work/Python-2.3.3/Modules/expat/xmlparse.c:1605
#2  0x40481fc5 in set_error (self=0x0,
code=XML_ERROR_TAG_MISMATCH)
    at
/120g/gar/python/python23/work/Python-2.3.3/Modules/pyexpat.c:124
#3  0x40480ae7 in xmlparse_Parse (self=0x402fddac,
args=0x0)
    at
/120g/gar/python/python23/work/Python-2.3.3/Modules/pyexpat.c:888
#4  0x080fc25a in PyCFunction_Call (func=0x402faa0c,
arg=0x402f338c, 
    kw=0xfffffffb) at Objects/methodobject.c:108
#5  0x080aa674 in call_function (pp_stack=0xbffff03c,
oparg=0)
    at Python/ceval.c:3439
#6  0x080a8a2e in eval_frame (f=0x816e45c) at
Python/ceval.c:2116
#7  0x080a95bc in PyEval_EvalCodeEx (co=0x40303de0,
globals=0xfffffffb, 
    locals=0x0, args=0x816e5a8, argcount=2,
kws=0x816a9fc, kwcount=0, 
    defs=0x40321678, defcount=1, closure=0x0) at
Python/ceval.c:2663
#8  0x080aa729 in fast_function (func=0xfffffffb,
pp_stack=0xbffff1bc, n=2, 
    na=0, nk=135703028) at Python/ceval.c:3529
#9  0x080aa56c in call_function (pp_stack=0xbffff1bc,
oparg=0)
    at Python/ceval.c:3458
#10 0x080a8a2e in eval_frame (f=0x816a894) at
Python/ceval.c:2116
#11 0x080a95bc in PyEval_EvalCodeEx (co=0x402fd2a0,
globals=0xfffffffb, 
    locals=0x0, args=0x402f3318, argcount=2, kws=0x0,
kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:2663
#12 0x080fbda7 in function_call (func=0x4030617c,
arg=0x402f330c, kw=0x0)
    at Objects/funcobject.c:504
#13 0x0805b899 in PyObject_Call (func=0x40682000,
arg=0x0, kw=0x0)
    at Objects/abstract.c:1755
#14 0x08062288 in instancemethod_call (func=0x4030617c,
arg=0x402f330c, kw=0x0)
    at Objects/classobject.c:2433
#15 0x0805b899 in PyObject_Call (func=0x40682000,
arg=0x0, kw=0x0)
    at Objects/abstract.c:1755
#16 0x080aa892 in do_call (func=0x4032025c,
pp_stack=0x402f330c, na=0, nk=0)
    at Python/ceval.c:3644
#17 0x080aa4f9 in call_function (pp_stack=0xbffff5fc,
oparg=0)
    at Python/ceval.c:3460
#18 0x080a8a2e in eval_frame (f=0x818b414) at
Python/ceval.c:2116
#19 0x080aa7ad in fast_function (func=0xfffffffb,
pp_stack=0xbffff71c, n=2, 
    na=0, nk=1076865996) at Python/ceval.c:3518
#20 0x080aa56c in call_function (pp_stack=0xbffff71c,
oparg=0)
    at Python/ceval.c:3458
#21 0x080a8a2e in eval_frame (f=0x8183814) at
Python/ceval.c:2116
#22 0x080a95bc in PyEval_EvalCodeEx (co=0x402ed2a0,
globals=0xfffffffb, 
    locals=0x0, args=0x0, argcount=0, kws=0x0,
kwcount=0, defs=0x0, 
    defcount=0, closure=0x0) at Python/ceval.c:2663
#23 0x080abdb9 in PyEval_EvalCode (co=0x0, globals=0x0,
locals=0x0)
    at Python/ceval.c:537
#24 0x080d7d2b in run_node (n=0x402bb79c, filename=0x0,
globals=0x0, 
    locals=0x0, flags=0x0) at Python/pythonrun.c:1265
#25 0x080d74df in PyRun_SimpleFileExFlags (fp=0x8139050, 
    filename=0xbffffa4d "testexpat.py",
closeit=-1073743283, flags=0xbffff878)
    at Python/pythonrun.c:862
#26 0x08054dd5 in Py_Main (argc=1, argv=0xbffff8f4) at
Modules/main.c:415
#27 0x0805492b in main (argc=0, argv=0x0) at
Modules/python.c:23


----------------------------------------------------------------------

Comment By: Mark Moraes (moraes)
Date: 2004-03-15 02:26

Message:
Logged In: YES 
user_id=390363

#! /usr/bin/env python

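# The character reference "&#187" below is deliberately left unterminated
# (no trailing ';'); it is what triggers the fatal error.
dhead = """<?xml version="1.0" encoding="ISO-8859-1" ?>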
<item><title>&#187</title></item>
<item><title>
"""
dtail = """</title></item>
"""

import xml.sax
from cStringIO import StringIO as _StringIO

class _StrictFeedParser:
    def _err(self, errtype, exc):
        print errtype, exc.getMessage(), \
            'line', exc.getLineNumber(), \
            'column', exc.getColumnNumber()
    def fatalError(self, exc):
        self._err('fatalError', exc)
        # raise exc # avoids the problem
    def error(self, exc):
        self._err('error', exc)
    def warning(self, exc):
        self._err('warning', exc)

def parse(data):
    feedparser = _StrictFeedParser()
    saxparser = xml.sax.make_parser(["drv_libxml2"])
    saxparser.setErrorHandler(feedparser)
    source = xml.sax.xmlreader.InputSource()
    source.setByteStream(_StringIO(data))
    saxparser.parse(source)

if __name__ == '__main__':
    for i in xrange(65427,66000,1):
        print i
        parse(dhead + 'x'*i + dtail)

----------------------------------------------------------------------

Comment By: Mark Moraes (moraes)
Date: 2004-03-15 02:22

Message:
Logged In: YES 
user_id=390363

I ran into this as well, and it turns out that the 64k size is
relevant. I have a simpler script that reproduces the problem: create
an unterminated character reference such as "&#171" (without the
trailing semicolon) and add roughly 64k of data after it.
The crash occurs if the SAX parser has an ErrorHandler set
whose fatalError() method returns normally instead of
terminating or raising the exception.

As a defensive measure, I suggest that any call to the
fatalError method be followed by a raise of the exception if
fatalError returns.
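
Until the library does this itself, the same defensive measure can be
applied on the caller's side. The following is only a sketch:
RaisingErrorHandler, QuietHandler and parse_strict are hypothetical
names invented for illustration and are not part of xml.sax. The
wrapper forwards every callback to the user's handler and then raises
the exception itself whenever fatalError() returns normally, so
parsing never continues after a fatal error.

```python
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RaisingErrorHandler(ErrorHandler):
    """Hypothetical wrapper: delegate to `inner`, but never let
    fatalError() return normally."""

    def __init__(self, inner):
        self.inner = inner

    def warning(self, exc):
        self.inner.warning(exc)

    def error(self, exc):
        self.inner.error(exc)

    def fatalError(self, exc):
        self.inner.fatalError(exc)
        # The wrapped handler returned instead of raising: stop here.
        raise exc


class QuietHandler(ErrorHandler):
    """Example handler that only records errors and returns normally,
    like the fatalError() method in the reproduction script."""

    def __init__(self):
        self.seen = []

    def warning(self, exc):
        self.seen.append(('warning', exc.getMessage()))

    def error(self, exc):
        self.seen.append(('error', exc.getMessage()))

    def fatalError(self, exc):
        self.seen.append(('fatalError', exc.getMessage()))


def parse_strict(data, handler):
    """Parse `data` (a byte string), wrapping `handler` so that a
    fatal error always propagates as an exception."""
    xml.sax.parseString(data, ContentHandler(),
                        RaisingErrorHandler(handler))
```

With this wrapper, a call such as parse_strict(dhead + 'x'*i + dtail,
handler) should end with a SAXParseException at the unterminated
character reference instead of returning control to the parser.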


----------------------------------------------------------------------



