[Tutor] Picking up citations

Kent Johnson kent37 at tds.net
Mon Feb 9 04:07:22 CET 2009


On Sun, Feb 8, 2009 at 5:53 PM, Emmanuel Ruellan
<emmanuel.ruellan at gmail.com> wrote:
> Dinesh B Vadhia <dineshbvadhia at hotmail.com> wrote:
>> Hi!  I want to process text that contains citations, in this case in legal
>> documents, and pull out each individual citation.
>
>
> Here is my stab at it, using regular expressions. Any comments welcome.

It's a lot shorter than my parser versions, but it doesn't handle
multiple page numbers correctly (the second Doggone Williams
citation). You could probably handle this by processing the list of
split references.
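
A rough sketch of that post-processing step, not taken from either
posted version. It assumes each split reference is a dict whose "page"
field may hold several comma-separated page numbers; the field names,
the helper expand_pages() and the sample data are all made up for
illustration.

```python
def expand_pages(citation):
    """Return one citation dict per page number listed in citation['page'].

    Assumes (hypothetically) that multiple pages arrive as a single
    comma-separated string such as "101, 150".
    """
    pages = [p.strip() for p in citation["page"].split(",") if p.strip()]
    # dict(citation, page=p) copies the citation and overrides only 'page'.
    return [dict(citation, page=p) for p in pages]


# Made-up sample data standing in for the list of split references.
split_refs = [
    {"title": "Doe v. Roe", "volume": "12", "page": "345"},
    {"title": "Doe v. Roe", "volume": "14", "page": "101, 150"},
]

# Flatten: every reference contributes one entry per page number.
expanded = [c for ref in split_refs for c in expand_pages(ref)]

for c in expanded:
    print(c["title"], c["volume"], c["page"])
```

If the real data separates pages some other way, only the split() call
should need to change.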

> I had to use two regexes, one to find all citations, and the other one to
> split up citations into their components. They are basically the same, the
> former without grouping, and the latter with named groups.

Why not use the grouped regex alone? findall() only gives you strings
(or tuples of strings), but if you use finditer() instead you get an
iterator of match objects, so you can read the named groups directly.
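
Something along these lines, as a minimal sketch. The pattern here is a
deliberately simplified stand-in of the form "volume reporter page
(year)", not the regex from the quoted message, and the group names and
sample text are assumptions for the example.

```python
import re

# Hypothetical, simplified citation pattern with named groups.
CITATION = re.compile(
    r"(?P<volume>\d+)\s+"
    r"(?P<reporter>[A-Z][A-Za-z.0-9]*)\s+"
    r"(?P<page>\d+)\s+"
    r"\((?P<year>\d{4})\)"
)

text = "See 347 U.S. 483 (1954) and also 410 U.S. 113 (1973)."

# finditer() yields one match object per citation; groupdict() turns the
# named groups of each match into a plain dict.
citations = [m.groupdict() for m in CITATION.finditer(text)]

for c in citations:
    print(c["volume"], c["reporter"], c["page"], c["year"])
```

That way one regex both finds the citations and splits them up, and
m.group("page") (or m.groupdict()) gives you each component by name.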

Kent

