[Tutor] string parsing

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Wed Jan 29 12:49:01 2003


On Wed, 29 Jan 2003, Justin Ezequiel wrote:

> I guess this is not the right place for this question but I
> do not know where else to ask.
>
> You've all been very helpful.
>
> Can anyone point me to a more appropriate forum?
>
> I need ideas on how to (begin to) programmatically identify
> elements of a bibliographic reference.
>
> By elements, I mean authors/editors, their names (givenname, surname,
> suffix), article/chapter titles, book/journal titles, date of
> publication, volume number, start and end page numbers, publisher name
> and publisher location, etc.

Hi Justin,

There's a project called Febrl (Freely Extensible Biomedical Record
Linkage) that uses Hidden Markov Models to parse address records:

    http://sourceforge.net/projects/febrl
    http://datamining.anu.edu.au/software/febrl/febrldoc/manual.html

It uses statistical models to match pieces of free text to the fields of
a record. I can imagine the same approach being adapted to bibliographic
entries as well!  I haven't played with Febrl yet, but I'm planning to when
I have more time.
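
To give a flavor of the idea, here's a toy sketch in Python. It is not
Febrl's code or API: the field names, the token classes, and every
probability below are numbers I made up for illustration, and a real system
would estimate them from hand-labeled references. Each token is reduced to a
coarse class, and the Viterbi algorithm then picks the most likely sequence
of fields for the whole reference.

```python
import math
import re

# Hypothetical fields (hidden states) for a very simplified reference.
STATES = ["author", "year", "title"]

# Hand-picked probabilities, purely illustrative (not trained, not from Febrl).
START = {"author": 0.8, "year": 0.1, "title": 0.1}
TRANS = {
    "author": {"author": 0.6, "year": 0.3, "title": 0.1},
    "year": {"author": 0.05, "year": 0.05, "title": 0.9},
    "title": {"author": 0.05, "year": 0.05, "title": 0.9},
}
EMIT = {
    "author": {"initial": 0.4, "capword": 0.5, "number": 0.01, "word": 0.09},
    "year": {"initial": 0.01, "capword": 0.01, "number": 0.97, "word": 0.01},
    "title": {"initial": 0.02, "capword": 0.38, "number": 0.05, "word": 0.55},
}


def classify(token):
    """Map a raw token to a coarse observation class."""
    if re.fullmatch(r"\d{4}", token):
        return "number"
    if re.fullmatch(r"[A-Z]\.?", token):
        return "initial"
    if token[0].isupper():
        return "capword"
    return "word"


def viterbi(tokens):
    """Return (token, field) pairs for the most likely field sequence."""
    if not tokens:
        return []
    obs = [classify(t) for t in tokens]
    # best[i][s] = (log-probability of the best path ending in state s
    #               at position i, previous state on that path)
    best = [{s: (math.log(START[s]) + math.log(EMIT[s][obs[0]]), None)
             for s in STATES}]
    for o in obs[1:]:
        row = {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda p: best[-1][p][0] + math.log(TRANS[p][s]))
            score = (best[-1][prev][0] + math.log(TRANS[prev][s])
                     + math.log(EMIT[s][o]))
            row[s] = (score, prev)
        best.append(row)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    path.reverse()
    return list(zip(tokens, path))
```

With these made-up numbers, calling

    viterbi("Knuth D 1997 The art of computer programming".split())

labels "Knuth" and "D" as author, "1997" as year, and the remaining tokens
as title. Real references are much messier than this, of course, which is
why the probabilities need to be learned from data rather than guessed.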


I hope this helps!