[Tutor] Parsing Bible verses

Eduardo Vieira eduardo.susan at gmail.com
Fri May 22 04:49:22 CEST 2009


On Thu, May 21, 2009 at 7:03 PM, John Fouhy <john at fouhy.net> wrote:
> 2009/5/22 Eduardo Vieira <eduardo.susan at gmail.com>:
>> I will be looking for lines like these:
>> Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23
>>
>> So, references in different chapters are separated by a semicolon. My
>> main challenge would be to make the program guess that 10:12 refers to
>> the previous book. 15-20 means verses 15 thru 20 inclusive. I'm afraid
>> that will take more than Regex and I never studied anything about
>> parser tools, really.
>
> Well, pyparsing is one of the popular Python parsing modules (it's a
> third-party package, not part of the standard library).  It's not that
> bad, really :-)
>
> Here's some code I knocked out:
>
> from pyparsing import *
>
> SingleVerse = Word(nums)
> VerseRange = SingleVerse + '-' + SingleVerse
> Verse = VerseRange | SingleVerse
> Verse = Verse.setResultsName('Verse').setName('Verse')
> Verses = Verse + ZeroOrMore(Suppress(',') + Verse)
> Verses = Verses.setResultsName('Verses').setName('Verses')
>
> ChapterNum = Word(nums)
> ChapterNum = ChapterNum.setResultsName('Chapter').setName('Chapter')
> ChapVerses = ChapterNum + ':' + Verses
> SingleChapter = Group(ChapVerses | ChapterNum)
>
> Chapters = SingleChapter + ZeroOrMore(Suppress(';') + SingleChapter)
> Chapters = Chapters.setResultsName('Chapters').setName('Chapters')
>
> BookName = (CaselessLiteral('Acts') | CaselessLiteral('Psalm') |
>             CaselessLiteral('John'))
> BookName = BookName.setResultsName('Book').setName('Book')
>
> Book = Group(BookName + Chapters)
> Books = Book + ZeroOrMore(Suppress(';') + Book)
> Books = Books.setResultsName('Books').setName('Books')
>
> All = CaselessLiteral('Lesson Text:') + Books + LineEnd()
>
> s = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
> res = All.parseString(s)
>
> for b in res.Books:
>     for c in b.Chapters:
>         if c.Verses:
>             for v in c.Verses:
>                 print('Book', b[0], 'Chapter', c[0], 'Verse', v)
>         else:
>             print('Book', b[0], 'Chapter', c[0])
>
> ######
>
> Hopefully you can get the idea of most of it from looking at the code.
>
> Suppress() means "parse this token, but don't include it in the results".
>
> Group() is necessary for getting access to a list of things -- you can
> experiment by taking it out and seeing what you get.
>
> Obviously you'll need to add more names to the BookName element.
>
> Obviously also, there is a bit more work to be done on Verses.  You
> might want to look into the concept of "parse actions".  A really
> simple parse action might be this:
>
> def convertToNumber(string_, location, tokens):
>    """ Used in setParseAction to make numeric parsers return numbers. """
>
>    return [int(tokens[0])]
>
> SingleVerse.setParseAction(convertToNumber)
> ChapterNum.setParseAction(convertToNumber)
>
> That should get you python integers instead of strings.  You can
> probably do more with parseActions to, for instance, turn something
> like '15-20' into [15,16,17,18,19,20].
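
A minimal sketch of that last idea, assuming the action is attached to
the VerseRange element from the code above and receives tokens such as
['15', '-', '20']. The name expand_verse_range is made up for
illustration, and this has not been run against the full grammar:

```python
def expand_verse_range(string_, location, tokens):
    """Parse action sketch: turn tokens like ['15', '-', '20'] into 15..20."""
    # int() accepts both strings and ints, so this works whether or not
    # convertToNumber has already run on the individual verse numbers.
    first = int(tokens[0])
    last = int(tokens[-1])
    return list(range(first, last + 1))

# Hypothetical wiring, assuming the VerseRange element defined earlier:
# VerseRange.setParseAction(expand_verse_range)

print(expand_verse_range('', 0, ['15', '-', '20']))
```

When a parse action returns a list, pyparsing uses that list in place of
the matched tokens, so the '-' would no longer show up in the results.
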
>
> HTH!
>
> --
> John.
>
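
For comparison, the rule from the original question (a bare "10:12"
belongs to the most recently named book) can also be sketched with only
the standard library. This is a rough alternative to the pyparsing
grammar quoted above, not a replacement for it. The regular expression
and the name split_references are assumptions for illustration; the
pattern only handles one-word book names with an optional leading
number such as "1 John":

```python
import re

# Assumed shape of one reference: optional book, chapter, optional ":verses".
REF = re.compile(
    r'^(?:(?P<book>(?:[1-3]\s+)?[A-Za-z]+)\s+)?'
    r'(?P<chapter>\d+)'
    r'(?::(?P<verses>.+))?$'
)

def split_references(line):
    """Return (book, chapter, verses) tuples, reusing the last book seen."""
    text = line.split(':', 1)[1]      # drop the "Lesson Text" label
    current_book = None
    results = []
    for part in text.split(';'):      # semicolons separate chapter references
        match = REF.match(part.strip())
        if match is None:
            continue                  # skip anything the pattern cannot read
        if match.group('book'):
            current_book = match.group('book')   # remember for later parts
        results.append(
            (current_book, int(match.group('chapter')), match.group('verses'))
        )
    return results

line = 'Lesson Text: Acts 5:15-20, 25; 10:12; John 3:16; Psalm 23'
for ref in split_references(line):
    print(ref)
```

The verses part is left as a raw string here (for example '15-20, 25'),
so splitting it on commas and expanding ranges is still to be done.
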
Thanks for the thorough example, I guess I really should get into this
thing of parsing somehow.
To W W: I guess that approach can work too. I will study both things
and if I get stumped, I'll try the list again. It will take a while
for me to really delve into the task, but I want to do it for a good
friend of mine.

Eduardo

