Simple Text Processing Help

Sun Oct 14 13:49:38 EDT 2007

On Sun, 14 Oct 2007 16:57:06 +0000, patrick.waldo wrote:

> Thank you both for helping me out.  I am still rather new to Python
> and so I'm probably trying to reinvent the wheel here.
> 
> When I try to do Paul's response, I get
> >>> tokens = line.strip().split()
> []

What is in `line`?  Paul wrote this in the body of the ``for`` loop over
all the lines in the file.

> So I am not quite sure how to read line by line.

That's what the ``for`` loop over a file or file-like object is doing. 
Maybe you should develop your script in smaller steps and do some printing
to see what you get at each step.  For example after opening the input
file:

for line in input:
    print line     # prints the whole line.
    tokens = line.split()
    print tokens   # prints a list with the split line.

> tokens = input.read().split() gets me all the information from the
> file.

Right, it reads *all* of the file, not just one line.
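
To see the difference, here is a small sketch.  It uses an in-memory file
(``io.StringIO``) with made-up sample data as a stand-in for your real input
file, so the names and contents are only assumptions:

```python
import io

# Hypothetical two-line sample standing in for the real input file.
sample = u"1 2 foo bar 9\n3 4 baz 7\n"

# read() returns the whole file as one string, so split() gives one
# flat list with the tokens of every line mixed together.
all_tokens = io.StringIO(sample).read().split()

# Iterating over the file object yields one line at a time, so each
# split() only sees the tokens of that single line.
tokens_per_line = [line.split() for line in io.StringIO(sample)]
```

With the flat list you can no longer tell where one line ended and the next
began, which is why slicing it only works for the first record.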

>  tokens[2:-1] = [u' '.join(tokens[2:-1])] works just fine, like
> in the example; however, how can I loop this for the entire document?

Don't read the whole file at once; read it line by line, just like Paul
showed you.
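
Roughly like this, as a sketch only.  The sample data is invented and
``io.StringIO`` stands in for your opened input file; I am assuming each line
has two leading fields, a name that may contain spaces, and one trailing field:

```python
import io

# Hypothetical sample; the real script would loop over the opened file.
lines = io.StringIO(u"1 2 some long name 9\n3 4 short 7\n")

merged = []
for line in lines:
    tokens = line.strip().split()
    # Collapse everything between the first two tokens and the last
    # token into a single item, once per line.
    tokens[2:-1] = [u' '.join(tokens[2:-1])]
    merged.append(tokens)
```

The slice assignment is exactly the one you already have; it just moves
inside the loop so it is applied to every line instead of to the whole file.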

> Also, when I try output.write(tokens), I get "TypeError: coercing to
> Unicode: need string or buffer, list found".

`tokens` is a list, but you need to write a unicode string.  So you have to
reassemble the parts with '|' characters in between, as Paul also showed.
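
A minimal sketch of that step, with a made-up token list and an in-memory
``io.StringIO`` object standing in for your real output file:

```python
import io

# Hypothetical token list, as it might look after merging the middle part.
tokens = [u'1', u'2', u'some long name', u'9']

output = io.StringIO()  # stand-in for the real output file

# write() wants one unicode string, so join the list items first and
# add the newline that split() threw away.
output.write(u'|'.join(tokens) + u'\n')

written = output.getvalue()
```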

Ciao,
	Marc 'BlackJack' Rintsch