[pypy-issue] [issue1149] BeautifulSoup aborts parsing on quotation errors

chrysn tracker at bugs.pypy.org
Wed May 23 00:19:51 CEST 2012


New submission from chrysn <chrysn at fsfe.org>:

the BeautifulSoup module behaves differently under pypy than under cpython.

when a quoted attribute value itself contains quotes (think javascript), 
cpython splits the affected attribute into several attributes, but the damage 
stays local to that element. under pypy, the element and all elements after it 
are simply dropped.

complete examples:

$ python -c 'from bs4 import BeautifulSoup; s = """<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>"""; print BeautifulSoup(s)'
<html><body><div><em>a</em><a b="" onclick="a(">b</a><em>c</em></div></body></html>
$ PYTHONPATH='/usr/lib/python2.7/dist-packages/' pypy -c 'from bs4 import BeautifulSoup; s = """<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>"""; print BeautifulSoup(s)'
<div><em>a</em></div>
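
the cpython output above is wrapped in <html><body> while the pypy output is 
not, which would be consistent with bs4 picking a different underlying parser 
in each run (it chooses one on its own when none is named). that is only a 
guess. a minimal sketch to check it, naming the parser explicitly; 
`render_with`, `MARKUP` and `PARSERS` are hypothetical names made up for this 
example, and any parser that is not installed is simply reported as failed:

```python
# sketch only: render the markup from the report with each parser bs4 can use.
MARKUP = '<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>'
PARSERS = ("lxml", "html5lib", "html.parser")  # parser names documented by bs4


def render_with(parser_name, markup=MARKUP):
    """Hypothetical helper: return the parsed tree as a string,
    or a short note if this parser could not be used."""
    try:
        from bs4 import BeautifulSoup
        return str(BeautifulSoup(markup, parser_name))
    except Exception as exc:  # bs4 or the parser is missing, or parsing failed
        return "failed: %r" % (exc,)


if __name__ == "__main__":
    for name in PARSERS:
        print("%-12s %s" % (name, render_with(name)))
```

running this under both interpreters and comparing the lines for the same 
parser name would show whether the difference follows the interpreter or the 
parser.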

i have not investigated the internals of bs4 any further so far; reporting 
this issue to the pypy bug tracker as suggested on irc.

versions and other details:

$ pypy --version
Python 2.7.2 (1.8+dfsg-2, Feb 18 2012, 07:30:46)
[PyPy 1.8.0 with GCC 4.6.2]
$ python --version
Python 2.7.3rc2

beautiful soup version: 4.0.5-1
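
the two interpreters above report different python versions (2.7.2 for pypy, 
2.7.3rc2 for cpython), so their stdlib html parsers may differ as well. a 
minimal sketch, assuming the stdlib parser is involved at all (not verified 
here), that lists which start tags it reports for the markup from the report; 
`TagRecorder` and `record_tags` are hypothetical names for this example only:

```python
# sketch only: feed the markup to the stdlib HTML parser, bypassing bs4.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

MARKUP = '<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>'


class TagRecorder(HTMLParser):
    """Hypothetical helper: remembers every start tag and its attributes."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def record_tags(markup):
    """Return (start tags seen so far, exception raised or None)."""
    recorder = TagRecorder()
    error = None
    try:
        recorder.feed(markup)
        recorder.close()
    except Exception as exc:  # a strict parser may give up on the bad attribute
        error = exc
    return recorder.seen, error


if __name__ == "__main__":
    seen, error = record_tags(MARKUP)
    for tag, attrs in seen:
        print("%s %r" % (tag, attrs))
    print("error: %r" % (error,))
```

if one interpreter stops reporting tags at the <a> element (or reports an 
error) while the other continues, the difference lies in the stdlib parser 
rather than in bs4.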

running on linux 3.1.0-1-amd64; the rest of the system is debian sid. all 
involved software runs as native 64-bit.

----------
messages: 4326
nosy: chrysn, pypy-issue
priority: bug
status: unread
title: BeautifulSoup aborts parsing on quotation errors

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1149>
________________________________________

