Ideas for a project? (an MSc project, sort of)

Andrew Dalke dalke at acm.org
Wed May 2 23:20:58 EDT 2001


Eloy:
> I would like to ask if there is any idea about projects that
> could fit that description. Of course the results would be
> available to everybody.

Well, there's a project I've been working on called Martel,
http://www.biopython.org/~dalke/Martel/ .  It lets people
work with many existing flat-file formats as if the data were
already in XML.  This is done with a regular-expression-based
parser generator that uses mxTextTools to do the actual
parsing.  From my limited search of the literature, this
style of parsing is novel, and I think there is more that
can be done with it which would be interesting for a master's
level project.
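
To give a rough idea of the style, here is a toy sketch in plain
Python.  It is not Martel's actual API and does not use mxTextTools;
the two-letter line format, the LINE pattern, and the Collector and
parse names are all made up for illustration.  Named groups in the
regular expression play the role of XML elements, and the parser
reports them as SAX events:

```python
import re
from xml.sax.handler import ContentHandler

# Hypothetical flat-file line: two-letter code, three spaces, free text.
LINE = re.compile(r"(?P<key>[A-Z]{2})   (?P<value>[^\n]*)\n")

class Collector(ContentHandler):
    """Records the SAX events it receives so they can be inspected."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))

def parse(text, handler):
    """Match LINE repeatedly and report each named group as an element."""
    handler.startDocument()
    handler.startElement("record", {})
    pos = 0
    while pos < len(text):
        m = LINE.match(text, pos)
        if m is None:
            raise ValueError(f"unparsable input at offset {pos}")
        handler.startElement("line", {})
        for name in ("key", "value"):
            handler.startElement(name, {})
            handler.characters(m.group(name))
            handler.endElement(name)
        handler.endElement("line")
        pos = m.end()
    handler.endElement("record")
    handler.endDocument()
```

Any SAX ContentHandler can then consume the flat file as if it were
an XML document, e.g. parse("ID   100K_RAT\n", Collector()).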

Some possibilities are:
  - optimizing the generated mxTextTools state tables
  - building very fast parsers which assume the input
     stream is in the correct format and skip unrequested fields
  - eliminating the need for keeping all the input text in
     memory by recognizing when backtracking is not possible
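
As a hand-written illustration of where the last two ideas could
lead (again a sketch, not something Martel generates today), the
reader below trusts the input format, keeps only the requested line
codes, and holds a single line in memory at a time.  The "//" record
terminator, the two-letter codes, and the iter_fields name are
assumptions for the example:

```python
import io

def iter_fields(stream, wanted):
    """Yield one dict per record, keeping only the requested line codes.

    No validation is done: the input is assumed to be well formed.
    """
    record = {}
    for line in stream:               # only the current line is held
        if line.startswith("//"):     # assumed record terminator
            yield record              # nothing before this point is needed again
            record = {}
            continue
        code = line[:2]
        if code in wanted:            # unrequested fields are skipped unparsed
            record.setdefault(code, []).append(line[5:].rstrip("\n"))

data = io.StringIO("ID   A1\nDE   first\n//\nID   B2\nDE   second\n//\n")
records = list(iter_fields(data, {"ID"}))
```

A generated parser would need to work out for itself that a record
boundary is a point it can never backtrack past; here that knowledge
is simply hard-coded.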

                    Andrew
                    dalke at acm.org






More information about the Python-list mailing list