[pypy-issue] [issue1149] BeautifulSoup aborts parsing on quotation errors
chrysn
tracker at bugs.pypy.org
Wed May 23 00:19:51 CEST 2012
New submission from chrysn <chrysn at fsfe.org>:
the BeautifulSoup module behaves differently under pypy than under cpython.
when an attribute value contains unescaped double quotes (think inline
javascript), cpython splits the attribute into several pieces, but the damage
stays local to that element. under pypy, the affected element and all elements
after it are dropped from the output.
complete examples:
$ python -c 'from bs4 import BeautifulSoup; s = """<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>"""; print BeautifulSoup(s)'
<html><body><div><em>a</em><a b="" onclick="a(">b</a><em>c</em></div></body></html>
$ PYTHONPATH='/usr/lib/python2.7/dist-packages/' pypy -c 'from bs4 import BeautifulSoup; s = """<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>"""; print BeautifulSoup(s)'
<div><em>a</em></div>
i have not investigated the internals of bs4 any further so far; i am reporting
this issue to the pypy bug tracker as suggested on irc.
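one way to narrow this down, sketched here under assumptions and not tested as
part of this report: bs4 picks a parser backend depending on what is installed,
so the two interpreters may not be using the same one. the snippet below feeds
the same markup straight to the standard library's HTMLParser, which is one of
the backends bs4 can use, and logs which start tags it reports and whether it
raises. TagLogger and log_start_tags are hypothetical helper names, not part of
bs4 or the stdlib; whether this tokenizer is actually responsible for the
difference is an open question.

```python
from __future__ import print_function

try:
    from html.parser import HTMLParser   # python 3
except ImportError:
    from HTMLParser import HTMLParser    # python 2


class TagLogger(HTMLParser):
    """hypothetical helper: records every start tag the tokenizer reports."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def log_start_tags(markup):
    """return (start_tags, error); error is None if tokenizing finished."""
    logger = TagLogger()
    error = None
    try:
        logger.feed(markup)
        logger.close()
    except Exception as exc:  # some stdlib versions may raise on bad markup
        error = exc
    return logger.seen, error


# same markup as in the examples above
markup = '<div><em>a</em><a onClick="a("b")">b</a><em>c</em></div>'
tags, error = log_start_tags(markup)
for tag, attrs in tags:
    print(tag, attrs)
print("tokenizer error:", error)
```

running this with both `python` and `pypy` and comparing the output would show
whether the stdlib tokenizer already behaves differently, independent of bs4.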
versions and other details:
$ pypy --version
Python 2.7.2 (1.8+dfsg-2, Feb 18 2012, 07:30:46)
[PyPy 1.8.0 with GCC 4.6.2]
$ python --version
Python 2.7.3rc2
beautiful soup version: 4.0.5-1
running on linux 3.1.0-1-amd64; the rest of the system is debian sid. all
involved software runs as native 64-bit.
----------
messages: 4326
nosy: chrysn, pypy-issue
priority: bug
status: unread
title: BeautifulSoup aborts parsing on quotation errors
________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1149>
________________________________________