[Tutor] Should I use python for parsing text

Jay Mutter III jmutter at uakron.edu
Sat Mar 10 17:10:30 CET 2007


I am using an Intel iMac with OS X 10.4.8.
It has Python 2.3.5.

My issue is that I have a lot of text (about 500 pages at the
moment) that I need to parse so that I can eliminate info I don't
need, break the remainder into fields, and put it in a database or
spreadsheet. See the example below:

A.-C. Manufacturing Company. (See Sebastian, A. A.,
and Capes, assignors.)
A. G. A. Railway Light & Signal Co. (See Meden, Elof
H„ assignor.)
A-N Company, The. (See Alexander and Nasb, as-
signors.;
AN Company, The. (See Nash, It. J., and Alexander, as-
signors.)
A/S. Arendal Smelteverk.    (See Kaaten, Einar, assignor.)
A/S. Bjorgums Gevaei'kompani. (See Bjorguni, Nils, as-
signor.)
A/S  Mekano.     (Sec   Schepeler,   Herman  A.,  assignor.)
A/S Myrens Verkstad.    (See Klling, Jens W. A., assignor.)
A/S Stordo Kisgruber. (See Nielsen, C., and Ilelleland,
assignors.)
A-Z Company, The.    'See llanmer, Laurence G., assignor.)
Aagaard, Carl L., Rockford, 111. Hand scraping tool. No.
1,345,058 ; July 6; v. 276 ; p. 05.
Aalborg, Christian, Wllkinsburg, Pa., assignor to Wcst-
inghouse Electric and Manufacturing Company. Trol-
ley.    No. 1,334,943 ; Mar. 30 ; v. 272 ; p. 741.
Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
Jan. 27 ; v. 270 ; p. 554.

For instance, I would like to go to the end of each line and, if the
last character is a comma, semicolon, or hyphen, remove the line
break (CR) so the line is joined to the next one.
Then I would move line by line through the file and delete everything
after a numerical sequence.
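
A minimal sketch of those two steps in Python, under some loud
assumptions: it targets a current Python 3 rather than the 2.3.5
mentioned above; a trailing hyphen is treated as a word split across
lines, so the hyphen is dropped when joining; and a "numerical
sequence" is taken to mean a comma-grouped number such as 1,345,058.
The names join_wrapped_lines and cut_after_number are hypothetical
helpers, not part of any library.

```python
import re

# Assumption: a "numerical sequence" is a comma-grouped number like 1,345,058.
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+")


def join_wrapped_lines(lines):
    """Merge a line into the next one when it ends with ',', ';' or '-'."""
    joined = []
    carry = ""  # text held over from a line that has to be continued
    for raw in lines:
        line = carry + raw.strip()
        if line.endswith("-"):
            # Assumed to be a word split across lines: drop the hyphen.
            carry = line[:-1]
        elif line.endswith((",", ";")):
            # Keep the punctuation and separate the next line with a space.
            carry = line + " "
        else:
            joined.append(line.rstrip())
            carry = ""
    if carry:
        # The input ended on a line that expected a continuation.
        joined.append(carry.rstrip())
    return joined


def cut_after_number(line):
    """Drop everything after the first comma-grouped number, if there is one."""
    match = NUMBER.search(line)
    return line[:match.end()] if match else line


sample = [
    "Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;",
    "Jan. 27 ; v. 270 ; p. 554.",
]
cleaned = [cut_after_number(line) for line in join_wrapped_lines(sample)]
```

Note that this follows the rule exactly as stated, so a line ending in
"No." (as in the Aagaard entry) is not joined to the line that follows
it; that case would need an extra rule.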

I am wondering whether Python would be a good tool for this and, if
so, where I can find information on how to accomplish it. Or would I
be better off using something like the Unix tool awk, or something else?

Thanks

Jay

