question about nasty regex

Peter facetious_nickname at hotmail.com
Mon Apr 3 11:59:00 EDT 2006


I'm wondering if someone can tell me whether the following set of 
regex substitutions is possible. I want to convert parallel legal 
citations into single citations. What I mean is, I want to change, e.g.:

"Doremus v. Board of Education of Hawthorne, 342 U.S. 429, 434, 72 
S. Ct. 394, 397, 96 L.Ed. 475 (1952)."

into:

"Doremus v. Board of Education of Hawthorne, 342 U.S. 429, 434 (1952)."

Generally, the beginning pattern would consist of:

1. Two names, consisting of one or more words, always separated by a 
"v."

2. One, two, or three citations, each of which always has a volume 
number ("342"), followed by a name consisting of one or two word 
units, each ending with "." ("U.S.", "S. Ct."), followed by a page 
number ("429")

3. Each citation may contain a comma and a second page number (", 434")

4. Optionally, a parenthesized year ("(1952)")

5. A final "."

I suspect this is impossible with regexes alone, but if it can be 
expressed in Python code, I hope someone here can put me on the 
right track.

Thanks.
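
For illustration only, here is a rough sketch of how the five rules 
listed above might be written as one Python regex. It is not a tested 
solution for real-world citations. The names REPORTER, PIN, CITE, 
PARALLEL and to_single_citation are made up for this example. It 
assumes Python 3.6 or later (for f-strings), that every reporter 
name unit ends with ".", and that the extra citations directly follow 
the first one, separated by commas. It keeps the first citation and 
drops the one or two parallel citations after it; the case names and 
the year are left untouched because the pattern never matches them.

```python
import re

# One or two "word units", each ending with "." ("U.S.", "S. Ct.", "L.Ed.").
REPORTER = r"[A-Za-z][A-Za-z.]*\.(?:\s+[A-Za-z][A-Za-z.]*\.)?"

# Optional second page number (", 434"). The lookahead rejects a number
# that is really the volume of the next citation (", 72 S. Ct. ...").
PIN = r"(?:,\s+\d+\b(?!\s+[A-Za-z]))?"

# Volume number, reporter name, page number, optional second page number.
CITE = rf"\d+\s+{REPORTER}\s+\d+{PIN}"

# A first citation followed by one or two parallel citations.
PARALLEL = re.compile(rf"(?P<first>{CITE})(?:,\s+{CITE}){{1,2}}")


def to_single_citation(text):
    """Keep only the first citation of each parallel citation group."""
    return PARALLEL.sub(lambda m: m.group("first"), text)


if __name__ == "__main__":
    sample = ("Doremus v. Board of Education of Hawthorne, 342 U.S. 429, "
              "434, 72 S. Ct. 394, 397, 96 L.Ed. 475 (1952).")
    print(to_single_citation(sample))
```

Because whitespace is matched with \s+, a citation that wraps across 
a line break (as "72" and "S. Ct." do in the example above) should 
still be recognised. Reporter names that do not end with "." fall 
outside the stated rules and are not handled by this sketch.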


