[Tutor] Should I use python for parsing text

Alan Gauld alan.gauld at btinternet.com
Sun Mar 11 16:32:35 CET 2007


"Jay Mutter III" <jmutter at uakron.edu> wrote

> See example  next:
> A.-C. Manufacturing Company. (See Sebastian, A. A.,
> and Capes, assignors.)
>...
>Aaron, Solomon E., Boston, Mass. Pliers. No. 1,329,155 ;
>Jan. 27 ; v. 270 ; p. 554.
>
> For instance, I would like to go to end of line and if last
> character  is a comma or semicolon or hyphen then
> remove the CR.

It would look something like:

output = open('example.fixed','w')
for line in file('example.txt'):
    if line[-1] in ',;-':            # check last character
      line = line.strip()         # lose the C/R
      output.write(line)        # write to output
    else: output.write(line)  # append the next line complete with C/R
output.close()

> Then move line by line through the file and delete everything
> after a  numerical sequence

Slightly more tricky because you need to use a regular expression.
But if you know regex then only slightly.

>  I am wondering if Python would be a good tool

Absolutely, its one of the areas where Python excels.

> find information on how to accomplish this

You could check  my tutorial on the three topics:

Handling text
Handling files
Regular Expressions.

Also the standard python documentation for the general tutorial
(assuming you've done basic programming in some other language
before) plus the re module

> using something like the unix tool awk or something else??

awk or sed could both be used, but Python is more generally
useful so unless you already know awk I'd take the time to
learn the basics of Python (a few hours maybe) and use that.

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld




More information about the Tutor mailing list