re vs. sgmllib (was: Moving from Perl to Python)

Jon Fernquest ferni at loxinfo.co.th
Sun Sep 26 10:12:42 EDT 1999


>>Regexps are not Python's forte, and all of speed, clarity and
>>maintainability will increase in proportion to your zeal in purging them.
>
>That sounds like a Call To Action for rewriting sgmllib to use something
>other than re.  Has anyone started any work in this area?

Regular expressions are just a shorthand for specifying regular
languages (in practice, only *some* of them), whereas finite state
machines can specify any regular language. The next logical step
would therefore be a set of finite state tools like those that Xerox
sells (for several thousand dollars, I might add).
Perl's adoption of regular expressions was something of a revolution,
I guess, but another revolution is looming on the horizon for the
language that incorporates generalized finite state technology.
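
To make the comparison concrete, here is a minimal sketch of the same
small regular language written both ways: once as an explicit finite
state machine (a transition table) and once as a regular expression.
The language (unsigned decimal numbers such as "42" or "3.14"), the
state names, and the helper functions are all made up for illustration;
none of this comes from sgmllib or from the Xerox tools.

```python
import re

# Hypothetical example language: unsigned decimals such as "42" or "3.14".
# State names are arbitrary labels chosen for this sketch.
START, INT, DOT, FRAC = "start", "int", "dot", "frac"
ACCEPTING = {INT, FRAC}


def char_class(ch):
    """Map a single character to the input symbol the machine reads."""
    if ch in "0123456789":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# The machine itself: (current state, input symbol) -> next state.
# Any pair missing from the table means "reject".
TRANSITIONS = {
    (START, "digit"): INT,
    (INT, "digit"): INT,
    (INT, "dot"): DOT,
    (DOT, "digit"): FRAC,
    (FRAC, "digit"): FRAC,
}


def dfa_accepts(text):
    """Run the table-driven machine over the whole string."""
    state = START
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return False
    return state in ACCEPTING


# The same language as a regular expression, for comparison.
NUMBER_RE = re.compile(r"[0-9]+(\.[0-9]+)?")


def re_accepts(text):
    """Accept only if the whole string matches the pattern."""
    return NUMBER_RE.fullmatch(text) is not None
```

The table is longer than the pattern, but it is plain data: it can be
drawn as a graph, generated by a tool, or combined with other machines,
which is the kind of thing a finite state toolkit would automate.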

Finite state technology is really great for dealing with non-Roman
character sets (example: word wrap on HTML pages in a language that
doesn't put spaces between words). This issue will become increasingly
important as the whole world comes online, and it is particularly
important because Microsoft has its own proprietary approach to the
languages of the world, as is very apparent in its recent release of
Office 2000 for Indian languages. According to CNN you still can't type
Indian scripts directly into web pages. You have to type the text into
an application and cut and paste. The GNU Mule project's goal is
multilingual software, and it is getting a lot of support among Thai
computer scientists who want to break the Microsoft stranglehold.
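
As a rough illustration of the word-wrap example above, the sketch
below stores a word list in a trie (which is a finite state acceptor
for that list), splits unspaced text by greedy longest match, and then
breaks lines only at the word boundaries it found. Everything here is
an assumption made for the example: the toy English lexicon stands in
for a real dictionary, the function names are invented, and greedy
longest match is only a simple heuristic, not how any particular
product does it.

```python
END = "$end"  # marker key meaning "a word ends at this node"


def build_trie(words):
    """Build a nested-dict trie; each path from the root spells a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def segment(text, trie):
    """Split text into tokens by greedy longest match against the trie."""
    tokens = []
    i = 0
    while i < len(text):
        node = trie
        longest = 0
        j = i
        # Walk the trie as far as the text allows, remembering the
        # longest prefix that is a complete word.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i
        step = longest or 1  # unknown character: emit it on its own
        tokens.append(text[i:i + step])
        i += step
    return tokens


def wrap(text, trie, width):
    """Break text into lines of at most `width` characters where
    possible, splitting only between tokens."""
    lines = []
    current = ""
    for token in segment(text, trie):
        if current and len(current) + len(token) > width:
            lines.append(current)
            current = token
        else:
            current += token
    if current:
        lines.append(current)
    return lines


# Toy lexicon standing in for a real dictionary (hypothetical data).
LEXICON = build_trie(["the", "cat", "sat", "on", "mat"])
```

For example, wrap("thecatsatonthemat", LEXICON, 8) breaks the text
between words rather than in the middle of one. A token longer than
the width is left on a line of its own rather than split.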

See the great paper:
Regular Expressions For Language Engineering.
http://www.xrce.xerox.com/publis/mltt/mltt-96-15.ps
http://www.xrce.xerox.com/publis/mltt/mlttart.html
http://www.xrce.xerox.com/research/mltt/fst/

Also, a graph editor is a much more readable (visual programming) way
of specifying a regular language (via a finite state machine)
than a regular expression is:
http://www.ladl.jussieu.fr/Intex/new_intex.html
http://www.ladl.jussieu.fr/Intex/index.html
http://www.fmi.uni-passau.de/Graphlet/

The little language Gema has language "acceptor" objects as well as some
recursive pattern matching capability, which can be used for parsing.
http://www.telerama.com/~mundie/index.html

Cheers,

Jon Fernquest
ferni at loxinfo.co.th
bayinnaung at hotmail.com
http://www.geocities.com/SoHo/Square/3472/nounphrase.html