[Tutor] Picking up citations

Dinesh B Vadhia dineshbvadhia at hotmail.com
Sun Feb 8 00:15:47 CET 2009


Kent

I've just realized that, as an initial attempt, the last name of the party before the "v." is sufficient, i.e. "Turner v. Fouche, 396 U.S. 346 (1970)" instead of "Lathe Turner v. Fouche, 396 U.S. 346 (1970)", as we are only using the citations internally and not displaying them publicly.  That solves the first-name problem.
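
As a rough illustration of the last-name idea, here is a minimal sketch using the standard `re` module rather than pyparsing. The helper name `shorten_first_party` is hypothetical, and the sketch assumes the first party's last name is simply the final word before "v.":

```python
import re

# Assumption: the last word before " v. " is the name we want to keep.
_CASE = re.compile(r"(\w[\w'-]*)\s+v\.\s+(.*)", re.S)

def shorten_first_party(citation):
    """Return the citation with the first party reduced to its last word."""
    match = _CASE.search(citation)
    if match is None:
        return citation  # no "v." found, so leave the text unchanged
    return "%s v. %s" % (match.group(1), match.group(2))
```

Called on "Lathe Turner v. Fouche, 396 U.S. 346 (1970)", this keeps only "Turner" as the first party. It would also cut a corporate name such as "John Doe Agency" down to "Agency", which may or may not be acceptable for internal use.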

The remaining problem is picking up multiple pages in a citation, i.e.

"John Doggone Williams v. Florida, 399 U.S. 78, 90 S.Ct. 1893, 234, 26 L.Ed.2d 446 (1970)"

... and a variation of this is:

"John Doe Agency v. John Doe Corp., 493 U.S. 146, 159-60 (1934)"
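
One possible way to handle both variations is sketched below with the standard `re` module (not pyparsing). It assumes the text between the case name and the date has already been isolated, that each reporter abbreviation contains no spaces, and that a bare number or range such as "234" or "159-60" belongs to the citation just before it. The function name `split_citations` is hypothetical:

```python
import re

VOLUME_CITE = re.compile(r"^(\d+)\s+(\S+)\s+(\d+)$")  # e.g. "399 U.S. 78"
EXTRA_PAGE = re.compile(r"^\d+(?:-\d+)?$")            # e.g. "234" or "159-60"

def split_citations(body):
    """Group 'volume reporter page' items with any extra pages after them."""
    results = []
    for piece in (p.strip() for p in body.split(",")):
        m = VOLUME_CITE.match(piece)
        if m:
            results.append({"volume": m.group(1), "reporter": m.group(2),
                            "page": m.group(3), "extra_pages": []})
        elif EXTRA_PAGE.match(piece) and results:
            # A bare page or page range attaches to the previous citation.
            results[-1]["extra_pages"].append(piece)
        else:
            raise ValueError("unrecognised piece: %r" % piece)
    return results
```

For "399 U.S. 78, 90 S.Ct. 1893, 234, 26 L.Ed.2d 446" this yields three citations, with "234" attached to the S.Ct. entry; for "493 U.S. 146, 159-60" it yields one citation with "159-60" as an extra page.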

I didn't know about pyparsing, which appears to be very powerful, and I have joined their list.  Thank you for your help.

Dinesh




From: Kent Johnson 
Sent: Saturday, February 07, 2009 1:19 PM
To: Dinesh B Vadhia 
Cc: tutor at python.org 
Subject: Re: [Tutor] Picking up citations


It turns out you can use Or expressions to cause a kind of
backtracking in Pyparsing. This is very close to what you want:

from pyparsing import (CharsNotIn, Combine, Forward, OneOrMore, ParseResults,
                       Suppress, Word, alphanums, alphas, nums)

Name1 = Forward()
Name1 << Combine(Word(alphas) + Name1 | Word(alphas) + Suppress('v.'),
                 joinString=' ', adjacent=False).setResultsName('name1')
Name2 = Combine(OneOrMore(Word(alphas)), joinString=' ',
                adjacent=False).setResultsName('name2')

Volume = Word(nums).setResultsName('volume')
Reporter = Word(alphas, alphanums+".").setResultsName('reporter')
Page = Word(nums).setResultsName('page')
Page2 = (',' + Word(nums)).setResultsName('page2')

VolumeCitation = (Volume + Reporter +
                  Page).setResultsName('volume_citation', listAllMatches=True)
VolumeCitations = Forward()
VolumeCitations << (
      Combine(VolumeCitation + Page2, joinString=' ',
              adjacent=False).setResultsName('volume_citation2')
        + Suppress(',') + VolumeCitations
    | VolumeCitation + Suppress(',') + VolumeCitations
    | Combine(VolumeCitation + Page2, joinString=' ',
              adjacent=False).setResultsName('volume_citation2')
    | VolumeCitation
)

Date = (Suppress('(') +
        Combine(CharsNotIn(')')).setResultsName('date') + Suppress(')'))

FullCitation = Name1 + Name2 + Suppress(',') + VolumeCitations + Date

for item in FullCitation.scanString(text):
    fc = item[0]
    # Uncomment the following to see the raw parse results
    # pp(fc)
    # print
    # print fc.name1
    # print fc.name2
    # for vc in fc.volume_citation:
    #     pp(vc)

    # If name1 is multiple words it is enclosed in a ParseResults
    name1 = fc.name1
    if isinstance(name1, ParseResults):
        name1 = name1[0]

    for vc in fc.volume_citation:
        print '%s v. %s, %s %s %s (%s)' % (name1, fc.name2, vc.volume,
                                           vc.reporter, vc.page, fc.date)

    for vc2 in fc.volume_citation2:
        print '%s v. %s, %s (%s)' % (name1, fc.name2, vc2, fc.date)
    print


Output:

Carter v. Jury Commission of Greene County, 396 U.S. 320 (1970)
Carter v. Jury Commission of Greene County, 90 S.Ct. 518 (1970)
Carter v. Jury Commission of Greene County, 24 L.Ed.2d 549 (1970)

Lathe Turner v. Fouche, 396 U.S. 346 (1970)
Lathe Turner v. Fouche, 90 S.Ct. 532 (1970)
Lathe Turner v. Fouche, 24 L.Ed.2d 567 (1970)

White v. Crook, 251 F.Supp. 401 (DCMD Ala.1966)

In John Doggone Williams v. Florida, 399 U.S. 78 (1970)
In John Doggone Williams v. Florida, 26 L.Ed.2d 446 (1970)
In John Doggone Williams v. Florida, 90 S.Ct. 1893 , 234 (1970)


It is correct except for the inclusion of "In" in the name and the
extra space before the comma separating the page numbers in the last
citation.
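
Both leftovers could also be cleaned up after parsing rather than in the grammar. The following is only a sketch using the standard `re` module; the helper name `tidy` is hypothetical, and it assumes a leading "In" is stray sentence text unless it starts an "In re" case name:

```python
import re

def tidy(citation):
    """Drop a stray leading 'In' and any whitespace before a comma."""
    # Keep genuine "In re ..." case names intact.
    citation = re.sub(r"^In\s+(?!re\b)", "", citation)
    return re.sub(r"\s+,", ",", citation)
```

Applied to the last output line above, this gives "John Doggone Williams v. Florida, 90 S.Ct. 1893, 234 (1970)".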

Don't ask me why I did this :-)
Kent