[Tutor] Should I use python for parsing text
Alan Gauld
alan.gauld at btinternet.com
Sun Mar 11 16:32:35 CET 2007
"Jay Mutter III" <jmutter at uakron.edu> wrote
> See example next:
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
>...
>Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
>Jan. 27 ; v. 270 ; p. 554.
>
> For instance, I would like to go to end of line and if last
> character is a comma or semicolon or hyphen then
> remove the CR.
It would look something like:
with open('example.txt') as infile, open('example.fixed', 'w') as output:
    for line in infile:
        # each line still carries its C/R, so strip that before testing
        stripped = line.rstrip('\r\n')
        if stripped.endswith((',', ';', '-')):   # check last real character
            output.write(stripped)   # write without the C/R so the next line joins on
        else:
            output.write(line)       # write the line complete with C/R
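
If you want to try the idea without touching any files, here is a minimal sketch of the same test as a function that works on a list of lines. The name join_continued and its enders parameter are made up for illustration. The one thing to watch is that a line read from a file still ends in its newline, so the test has to look at the text before the line ending.

```python
def join_continued(lines, enders=(',', ';', '-')):
    """Merge each line ending in one of `enders` with the line that follows it."""
    pieces = []
    for line in lines:
        body = line.rstrip('\r\n')       # test the text, not the line ending
        if body.endswith(enders):
            pieces.append(body)          # drop the C/R so the next line joins on
        else:
            pieces.append(line)          # keep the C/R
    return ''.join(pieces)

# Sample lines taken from the example quoted above
sample = [
    'Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;\n',
    'Jan. 27 ; v. 270 ; p. 554.\n',
]
print(join_continued(sample), end='')
```

Note that the two halves are joined with nothing between them. If you want a space after a comma or semicolon you would have to add it yourself when you drop the C/R.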
> Then move line by line through the file and delete everything
> after a numerical sequence
Slightly more tricky because you need to use a regular expression.
But if you know regex then only slightly.
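
As a rough sketch of what that might look like, I am assuming here that the "numerical sequence" means a comma-grouped number such as the 1,329,155 in your example. That is a guess, so adjust the pattern to whatever your data really contains. The names NUMBER and truncate_after_number are made up for illustration.

```python
import re

# Assumption: the "numerical sequence" is a comma-grouped number like 1,329,155
NUMBER = re.compile(r'\d{1,3}(?:,\d{3})+')

def truncate_after_number(line):
    """Cut the line off right after the first comma-grouped number, if there is one."""
    match = NUMBER.search(line)
    if match is None:
        return line                      # no number found, leave the line alone
    ending = '\n' if line.endswith('\n') else ''
    return line[:match.end()] + ending   # keep the C/R if the line had one

text = 'Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ; Jan. 27 ; v. 270 ; p. 554.\n'
print(truncate_after_number(text), end='')
```

The pattern insists on at least one comma group, so plain numbers like the 27 in "Jan. 27" are not treated as the sequence to cut after.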
> I am wondering if Python would be a good tool
Absolutely, it's one of the areas where Python excels.
> find information on how to accomplish this
You could check my tutorial on the three topics:
Handling text
Handling files
Regular Expressions.
Also see the standard Python documentation: the general tutorial
(assuming you've done basic programming in some other language
before) plus the documentation for the re module.
> using something like the unix tool awk or something else??
awk or sed could both be used, but Python is more generally
useful so unless you already know awk I'd take the time to
learn the basics of Python (a few hours maybe) and use that.
--
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld