[ANN] pyparsing 1.4.5 released

Paul McGuire ptmcg at austin.rr.com
Fri Dec 22 22:59:53 EST 2006


This latest version of pyparsing has a few minor bug-fixes and
enhancements, and a performance improvement that can as much as
double parsing speed.

This release also includes some new examples:
- parsePythonValue.py - parses strings representing lists, dicts,
  and tuples, with nesting support
- sql2dot.py - SQL diagram generator, parsed from schema table
  definitions
- htmlStripper.py - strips HTML tags from HTML pages, leaving only
  body text

Download pyparsing 1.4.5 at http://pyparsing.sourceforge.net.  The
pyparsing Wiki is at http://pyparsing.wikispaces.com

-- Paul

========================================
Pyparsing is a pure-Python class library for quickly developing
recursive-descent parsers.  Parser grammars are assembled directly in
the calling Python code, using classes such as Literal, Word,
OneOrMore, Optional, etc., combined with operators '+', '|', and '^'
for And, MatchFirst, and Or.  No separate code-generation or external
files are required.  Pyparsing comes with a number of parsing examples,
including:
- "Hello, World!" (English, Korean, and Greek)
- chemical formulas
- configuration file parser
- web page URL extractor
- 5-function arithmetic expression parser
- subset of CORBA IDL
- chess portable game notation
- simple SQL parser
- Mozilla calendar file parser
- EBNF parser/compiler
- Python value string parser (lists, dicts, tuples, with nesting) (new)
- HTML tag stripper (new)
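
As a rough illustration of the grammar-building style described above,
here is a minimal sketch.  It is not one of the distributed examples;
the names salutation and greeting and the input string are made up for
illustration.

```python
from pyparsing import Literal, Optional, Word, alphas

# Hypothetical grammar: a salutation, a comma, a name made of letters,
# and an optional exclamation mark.
salutation = Literal("Hello") | Literal("Hi")        # '|' builds a MatchFirst
greeting = (salutation + Literal(",") + Word(alphas)  # '+' builds an And
            + Optional(Literal("!")))

# The grammar is an ordinary Python object; no code-generation step is run.
tokens = greeting.parseString("Hello, World!").asList()
print(tokens)  # ['Hello', ',', 'World', '!']
```

Whitespace between tokens is skipped by default, which is why the
space after the comma does not appear in the results.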


Version 1.4.5 - December, 2006
------------------------------
- Removed debugging print statement from QuotedString class.  Sorry
  for not stripping this out before the 1.4.4 release!

- A significant performance improvement, the first one in a while!
  For my Verilog parser, this version of pyparsing is about double the
  speed - YMMV.

- Added support for pickling of ParseResults objects.  (Reported by
  Jeff Poole, thanks Jeff!)

- Fixed minor bug in makeHTMLTags, which failed to recognize tag
  attributes with embedded '-' or '_' characters.  Also, added support for
  passing expressions to makeHTMLTags and makeXMLTags, and used this
  feature to define the globals anyOpenTag and anyCloseTag.

- Fixed error in alphas8bit; I had omitted the y-with-umlaut character.

- Added punc8bit string to complement alphas8bit - it contains all the
  non-alphabetic, non-blank 8-bit characters.

- Added commonHTMLEntity expression, to match common HTML "ampersand"
  codes, such as "&lt;", "&gt;", "&amp;", "&nbsp;", and "&quot;".  This
  expression also defines a results name 'entity', which can be used
  to extract the entity field (that is, "lt", "gt", etc.).  Also added
  built-in parse action replaceHTMLEntity, which can be attached to
  commonHTMLEntity to translate "&lt;", "&gt;", "&amp;", "&nbsp;", and
  "&quot;" to "<", ">", "&", " ", and "'".

- Added example, htmlStripper.py, that strips HTML tags and scripts
  from HTML pages.  It also translates common HTML entities to their
  respective characters.
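
Two of the additions listed above, entity replacement and pickling of
ParseResults, might be exercised roughly as follows.  This is a sketch
under assumptions, not code from the release: the variable names and
sample strings are hypothetical, and only the identifiers named in the
change notes (commonHTMLEntity, replaceHTMLEntity, ParseResults) come
from the announcement.

```python
import pickle

from pyparsing import Word, alphas, commonHTMLEntity, replaceHTMLEntity

# Work on a copy so the shared module-level expression is left untouched
# (a precaution chosen for this sketch, not a documented requirement).
entity = commonHTMLEntity.copy().setParseAction(replaceHTMLEntity)

# transformString rewrites each match with the parse action's result.
text = entity.transformString("1 &lt; 2 &amp;&amp; 3 &gt; 2")
print(text)  # 1 < 2 && 3 > 2

# Pickling round trip for a ParseResults object with named results.
word_pair = (Word(alphas).setResultsName("first")
             + Word(alphas).setResultsName("second"))
result = word_pair.parseString("hello world")
restored = pickle.loads(pickle.dumps(result))
print(restored.asList())  # ['hello', 'world']
print(restored["first"])  # hello
```

The sample avoids "&nbsp;" and "&quot;" on purpose, since the character
each one maps to may differ between pyparsing versions.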



