[Python-Dev] Integrate BeautifulSoup into stdlib?

Guido van Rossum guido at python.org
Thu Mar 5 18:32:29 CET 2009


On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan_ml at behnel.de> wrote:
> Ivan Krstić wrote:
>> On Mar 4, 2009, at 12:32 PM, James Y Knight wrote:
>>> I think html5lib would be a better candidate for an improved HTML
>>> parser in the stdlib than BeautifulSoup.
>>
>> While we're talking about alternatives, Ian Bicking appears to swear by
>> lxml:
>>
>> <http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/>
>
> I second that. ;)
>
> And, BTW, I wouldn't mind getting lxml into the stdlib either.

No matter how beautiful and fast lxml is, it has one downside when it
comes to including it in the stdlib: it is based on large, complex
3rd party libraries, libxml2 and libxslt.

Based on the sad example of BerkeleyDB, which was initially welcomed
into the stdlib but more recently booted out for reasons having to do
with the release cycle of the external dependency and other issues
typical for large external dependencies, I think we should be very
careful about including it in the standard library.

Instead, let's hope Linux distros pick it up (and if anyone knows how
to encourage that, let us know).

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

