[Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

ssteinerX@gmail.com ssteinerx at gmail.com
Tue Nov 3 14:40:48 CET 2009


On Nov 3, 2009, at 12:06 AM, Guido van Rossum wrote:

> On Mon, Nov 2, 2009 at 9:51 PM, ssteinerX at gmail.com
> <ssteinerx at gmail.com> wrote:
>> BeautifulSoup, which I use every day, is one such product.  Since the
>> crappy old SGML parser's gone, BeautifulSoup uses the one that's left
>> in Python 3 and it makes BeautifulSoup completely useless for my daily
>> work.
>
> This sounds like an area where some help might be useful. Perhaps the
> quickest solution would simply be to copy the old crappy "sgml" based
> html parser into a new version of BeautifulSoup.

That is what we're discussing doing on the old-soup branch at
http://github.com/adevore/old-beautiful-soup .  I'm not exactly sure
why the old SGML parser was dropped, but it seems that porting it to
Python 3 would have been enough of an effort that the standard library
dropped it, and the current developer of the mainline of Beautiful Soup
decided to just use what was available natively in Python 3.

> Though I imagine what it really needs is a "quirks mode" parser that
> is compatible with the HTML dialect accepted by, say, IE6. Maybe a
> summer of code project?

I think it just relies on the old SGML parser's not blowing up on  
completely bogus HTML (like most of the web) and does the best it can  
with the 'chunks' that come back; nothing to do with quirks mode per se.
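
To make "not blowing up" concrete, here is a minimal sketch.  It uses
Python 3's html.parser.HTMLParser purely as a stand-in, not the old
sgmllib parser and not Beautiful Soup's actual code; ChunkCollector and
collect_chunks are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class ChunkCollector(HTMLParser):
    """Hypothetical helper: records each piece the parser hands back."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.chunks.append(("start", tag))

    def handle_endtag(self, tag):
        self.chunks.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only text so the chunk list stays readable.
        if data.strip():
            self.chunks.append(("data", data.strip()))


def collect_chunks(markup):
    """Feed markup to the parser and return whatever chunks came back."""
    parser = ChunkCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# Deliberately bogus HTML: unclosed <p> and <b>, and a stray </i>.
bogus = "<p>unclosed <b>bold<p>second</i> para"
chunks = collect_chunks(bogus)
for kind, value in chunks:
    print(kind, value)
```

The point is only that the parser reports tags and text in the order it
sees them without raising on mismatched markup; building a sensible
tree out of those chunks is the part Beautiful Soup adds on top.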

As for a Summer of Code project, I have no idea what would be
involved.  I know there are lots of users of Beautiful Soup.  As far
as I know, it is the best scraper of HTML code, valid or not, that's
out there.  It's been around a long time, and I see it in projects in
the "html scraping" realm all the time.

At any rate, it's just one example of where the developer has taken
the easy way out with a 3.0 port and has produced a product that's
"Python 3" but, instead of getting better with Python's new features,
has actually become useless for the majority of use cases where it
formerly shone.

S


