[Tutor] removing line ends from Word text files

Lloyd Kvam pythonTutor at venix.com
Sat Jul 17 21:25:08 CEST 2004


On Sat, 2004-07-17 at 12:54, David Rock wrote:
> * Michael Janssen <Janssen at rz.uni-frankfurt.de> [2004-07-17 15:55]:
> > On Fri, 9 Jul 2004, Christian Meesters wrote:
> > 
> > > Right now I have the problem that I want to remove the MS Word line end
> > > token from text files: when a file is saved as 'text only', its line ends
> > > are displayed as '^M' in a shell (SGI IRIX (tcsh) and Mac (tcsh or
> > > bash)). I want to get rid of these characters for further processing of
> > > the file and have no idea how to access them in a Python script. Any
> > > idea how to replace the '^M' with a simple '\n'? (I already tried
> > > '\r\n' and various other combinations of characters, but apparently none
> > > of them is '^M'.) '^M' is one character.
> > 
> > You can always ask Python how it represents this character: read
> > one line with "readline" and print its repr string:
> > 
> > fo = open("filename")
> > line = fo.readline()
> > print repr(line)
> > 
> > repr gives you an alternative string representation of any object. For
> > strings, repr shows control characters as backslash escapes like \n or
> > \r instead of printing them raw. As you are on a Mac, I would guess your
> > newline character is a simple "\r".
> > 
> > You can also ask Python for the character's ordinal:
> > print ord(line[-2]) # just in case one newline consists of two chars
> > print ord(line[-1])
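The repr and ord checks above can be combined into one small sketch. It is written for Python 3, and this matters here: Python 3's open() translates '\r' and '\r\n' to '\n' by default, so the sketch passes newline="" to get the line end exactly as stored. The helper read_first_line_raw and the sample file contents are hypothetical.

```python
import os
import tempfile

def read_first_line_raw(path):
    # newline="" keeps line endings exactly as stored; with the default
    # (newline=None) Python 3 would translate '\r' and '\r\n' to '\n'.
    with open(path, newline="") as fo:
        return fo.readline()

# Hypothetical sample file with classic-Mac '\r' line ends.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w", newline="") as f:
    f.write("first line\rsecond line\r")

line = read_first_line_raw(path)
os.remove(path)
print(repr(line))                    # 'first line\r'
print([ord(c) for c in line[-2:]])   # [101, 13]
```

Slicing with line[-2:] instead of indexing line[-2] avoids an IndexError on a line shorter than two characters.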
> > 
> > It's probably best to do such investigations in an interactive Python
> > session. But now that I've realized the readline module (which provides
> > command-line editing and history) is Unix-only, I don't think interactive
> > mode is that much fun on Mac/Windows: without readline you can't recall
> > previous commands without retyping them, and you can't use the cursor
> > keys. Perhaps IDLE offers more elaborate line editing even on those
> > systems.
> 
> OK, a couple of things...
> readline is NOT a Unix-only thing. I just tried it on my XP box and it's
> fine. Also, open is an older way of opening files; as of 2.2, file is
> probably what you want.

I too was shifting from open(...) to file(...). However, Guido is
recommending a change to the documentation and continued use of open:
http://mail.python.org/pipermail/python-dev/2004-July/045931.html


> 
> http://www.python.org/doc/current/lib/built-in-funcs.html#l2h-25
> 
> And, for the sake of completeness, here is the info about built-in file
> objects:
> http://www.python.org/doc/current/lib/bltin-file-objects.html
> 
> So this:
> fo = open("filename")
> line = fo.readline()
> print repr(line)
> 
> becomes this:
> fo = file("filename")
> line = fo.readline()
> print repr(line)
> 
> As for interactive Python, I was recently introduced to IPython
> and it's great. It has a LOT of features that aren't in the normal
> shell:
> http://ipython.scipy.org/
> 
> And finally, ^M is decimal 13 (hex 0D), \n is 10, and \r is 13 ...
> hmm, I guess that means ^M == \r
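The character codes above are easy to confirm from Python. This short sketch is written for Python 3. It also spells out the caret notation: for the letters A-Z, ^X names the control character whose code is ord(X) - 64. 'M' is 77, so ^M is 13, the same as '\r'.

```python
CR, LF = "\r", "\n"

# Caret notation: for the letters A-Z, ^X is the control character whose
# code is ord(X) - 64, so ^M is 77 - 64 = 13.
caret_m = chr(ord("M") - 64)

print(ord(CR), hex(ord(CR)))   # 13 0xd
print(ord(LF), hex(ord(LF)))   # 10 0xa
print(caret_m == CR)           # True
```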
> 
> One thing I have used over the years to strip newline chars off
> lines is this; it's not the prettiest, but you'll get the idea:
> 
> 	if '\n' in line:
> 		line = line[:-1]
> 	if '\r' in line:
> 		line = line[:-1]
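I think this is safer, since endswith() only looks at the end of the
line, while the "in" test matches a '\r' or '\n' anywhere in it:

	for c in "\n\r":
		if line.endswith(c):
			line = line[:-1]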
> 
> Basically, it assumes (in the case of Windows) that each line ends
> with '\r\n', and strips the two characters off one at a time.
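The endswith() approach discussed here can be wrapped in a reusable function. This is a sketch only, written for Python 3, and the name strip_line_end is hypothetical. It checks '\n' before '\r' so that a Windows '\r\n' ending loses both characters. A '\r' in the middle of a line is left alone.

```python
def strip_line_end(line):
    """Remove one trailing LF, one trailing CR, or a CR LF pair."""
    # Check '\n' before '\r' so that 'text\r\n' loses both characters;
    # endswith() only looks at the end, unlike the `in` test, which
    # matches a '\r' or '\n' anywhere in the line.
    for c in "\n\r":
        if line.endswith(c):
            line = line[:-1]
    return line

for sample in ["Mac\r", "Unix\n", "Windows\r\n", "no ending", "a\rb"]:
    print(repr(strip_line_end(sample)))
```

A shorter alternative is line.rstrip("\r\n"). It behaves differently in one way: it removes every trailing CR and LF, so a line such as "text\n\n" loses both newlines rather than just one.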
-- 

Lloyd Kvam
Venix Corp.
1 Court Street, Suite 378
Lebanon, NH 03766-1358

voice:	603-653-8139
fax:	801-459-9582


