Ideas for a project? (a MSc project, sort of)
Andrew Dalke
dalke at acm.org
Wed May 2 23:20:58 EDT 2001
Eloy:
> I would like to ask if there is any idea about projects that
> could fit that description. Of course the results would be
> available to everybody.
Well, there's a project I've been working on called Martel,
http://www.biopython.org/~dalke/Martel/ . It lets people
work with many existing flat-file formats as if the data were
already in XML. This is done with a regular-expression-based
parser generator that uses mxTextTools to do the actual
parsing. From my limited search of the literature, this
style of parsing is novel, and I think there is more that
can be done with it which would be interesting for a master's
level project.
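
To give a feel for the idea, here is a minimal sketch using only the
standard library. It is not Martel's API and does not use mxTextTools.
The record layout (a two-letter tag, three spaces, then a value), the
RECORD pattern and the function name flatfile_to_xml are all made up
for illustration. The named groups of the regular expression become
the XML element names.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical line format: two-letter tag, three spaces, value, newline.
# Illustration only; this is not how Martel itself is implemented.
RECORD = re.compile(r"(?P<tag>[A-Z]{2})   (?P<value>[^\n]*)\n")

def flatfile_to_xml(text):
    """Render each matching line as XML elements named after the regex groups."""
    parts = ["<record>"]
    pos = 0
    while pos < len(text):
        m = RECORD.match(text, pos)
        if m is None:
            raise ValueError("unparseable input at offset %d" % pos)
        for name in ("tag", "value"):
            # Escape &, < and > so the output stays well-formed.
            parts.append("<%s>%s</%s>" % (name, escape(m.group(name)), name))
        pos = m.end()
    parts.append("</record>")
    return "".join(parts)

if __name__ == "__main__":
    print(flatfile_to_xml("ID   P12345\nDE   A & B\n"))
```

A real parser generator would emit events rather than build a string,
but the mapping from named groups to elements is the same.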
Some possibilities are:
- optimizing the generated mxTextTools state tables
- building very fast parsers which assume the input
stream is in the correct format and skip unrequested fields
- eliminating the need to keep all the input text in
memory by recognizing when backtracking is not possible
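
As a rough sketch of the second idea, assuming a made-up line format
(two-letter tag, three spaces, then the value) and a hypothetical
helper named iter_fields, a reader that trusts its input can slice at
fixed offsets and never look at the fields nobody asked for. Nothing
here comes from Martel or mxTextTools.

```python
def iter_fields(lines, wanted):
    """Yield (tag, value) for requested tags only.

    Assumes every line is well-formed: a two-letter tag, three spaces,
    then the value. No validation is done, which is the point.
    """
    for line in lines:
        tag = line[:2]
        if tag in wanted:
            # Only requested lines pay for slicing out the value.
            yield tag, line[5:].rstrip("\n")

if __name__ == "__main__":
    sample = ["ID   P12345\n", "DE   Something\n", "SQ   ACGT\n"]
    for tag, value in iter_fields(sample, {"ID", "SQ"}):
        print(tag, value)
```

Because it takes any iterable of lines, it can be fed an open file and
will hold only one line at a time.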
Andrew
dalke at acm.org