lxml 2.1 beta3 released

Stefan Behnel stefan_ml at behnel.de
Fri Jun 20 10:20:10 CEST 2008


Hi all,

I'm proud to release lxml 2.1beta3 to PyPI. This is the first lxml release
that builds and works on Python 2.3, 2.4, 2.5, 2.6 (beta) and 3.0 (beta).

http://codespeak.net/lxml/dev/
http://pypi.python.org/pypi/lxml/2.1beta3

Install with

    easy_install lxml==2.1beta3

What is lxml?

"""
In short: lxml is the most feature-rich and easy-to-use library for working
with XML and HTML in the Python language.

lxml is a Pythonic binding for the libxml2 and libxslt libraries. It is unique
in that it combines the speed and feature completeness of these libraries with
the simplicity of a native Python API.
"""

Unusually for a beta release, the third beta contains more new features than
bug fixes, largely (but not only) because of the adaptations needed for
Python 3. The changelog follows below.

I expect this to be the last beta release before 2.1 final. Feedback is very
much appreciated, especially on the "experimental" features like the namespace
cleanup function and on Python 2.6/3.0 support. Your feedback will help in
making the final release the best lxml ever.

Have fun,
Stefan


2.1beta3 (2008-06-19)

Features added

    * Major overhaul of tools/xpathgrep.py script.
    * Pickling ElementTree objects in lxml.objectify.
    * Support for parsing from file-like objects that return unicode strings.
    * New function etree.cleanup_namespaces(el) that removes unused namespace
      declarations from a (sub)tree (experimental).
    * XSLT results support the buffer protocol in Python 3.
    * Polymorphic functions in lxml.html that accept either a tree or a
      parsable string will return either a UTF-8 encoded byte string, a
      unicode string or a tree, based on the type of the input. Previously,
      the result was always a byte string or a tree.
    * Support for Python 2.6 and 3.0 beta.
    * File name handling now uses a heuristic to convert between byte strings
      (usually filenames) and unicode strings (usually URLs).
    * Parsing from a plain file object frees the GIL under Python 2.x.
    * Running iterparse() on a plain file (or filename) frees the GIL on
      reading under Python 2.x.
    * Conversion functions html_to_xhtml() and xhtml_to_html() in lxml.html
      (experimental).
    * Most features in lxml.html work for XHTML namespaced tag names
      (experimental).
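
To make two of the features above more concrete, here is a minimal sketch. It
parses from a file-like object that returns unicode strings and then calls
etree.cleanup_namespaces() on the result. The document text and the
example.com namespace URIs are made up for illustration, and the snippet is
written for a current Python 3 install of lxml rather than for 2.1beta3
specifically.

```python
import io

from lxml import etree

# Illustrative document: prefix "b" is declared on the root but never used.
xml_text = (
    '<root xmlns:a="http://example.com/a" xmlns:b="http://example.com/b">'
    '<a:child/>'
    '</root>'
)

# io.StringIO is a file-like object whose read() returns unicode strings.
tree = etree.parse(io.StringIO(xml_text))
root = tree.getroot()

before = etree.tostring(root)

# Drop namespace declarations that nothing in the (sub)tree refers to.
etree.cleanup_namespaces(root)

after = etree.tostring(root)

print(before.decode())
print(after.decode())
```

After the call, the unused "b" declaration should be gone, while "a" stays
because the child element still uses it.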

Bugs fixed

    * ElementTree.parse() didn't handle the result returned by a target
      parser.
    * Crash in Element class lookup classes when the __init__() method of the
      super class is not called from Python subclasses.
    * A number of problems related to unicode/byte string conversion of
      filenames and error messages were fixed.
    * Building on MacOS-X now passes the "flat_namespace" option to the C
      compiler, which reportedly prevents build quirks and crashes on this
      platform.
    * Windows build was broken.
    * Rare crash when serialising to a file object with certain encodings.

Other changes

    * Non-ASCII characters in attribute values are no longer escaped on
      serialisation.
    * Passing non-ASCII byte strings or invalid unicode strings as .tag,
      namespaces, etc. will result in a ValueError instead of an
      AssertionError (just like the tag well-formedness check).
    * Up to several times faster attribute access (i.e. tree traversal) in
      lxml.objectify.
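
The change from AssertionError to ValueError can be checked with a small
sketch like the one below. The helper try_tag() is hypothetical and exists
only for this illustration; the tag values are arbitrary examples, and the
snippet assumes a current Python 3 install of lxml.

```python
from lxml import etree


def try_tag(tag):
    """Return the name of the exception raised for `tag`, or None if accepted."""
    try:
        etree.Element(tag)
    except Exception as exc:
        return type(exc).__name__
    return None


results = {
    # A plain ASCII name is accepted.
    "valid": try_tag("item"),
    # A space is rejected by the tag well-formedness check.
    "space": try_tag("bad tag"),
    # Byte strings must be plain ASCII; this one contains a Latin-1 byte.
    "non_ascii_bytes": try_tag(b"caf\xe9"),
}

for name, outcome in results.items():
    print(name, outcome)
```

Code that used to catch AssertionError around such calls would need to catch
ValueError instead.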


