[Tutor] why is unicode converted file double spaced?

Tue Apr 7 19:26:36 CEST 2009

Thanks Marc,

But I think that got rid of all of my carriage returns. Everything is on
just one line now.

Matthew Pirritano, Ph.D.

Research Analyst IV

Medical Services Initiative (MSI)

Orange County Health Care Agency

(714) 568-5648

________________________________

From: tutor-bounces+mpirritano=ochca.com at python.org
[mailto:tutor-bounces+mpirritano=ochca.com at python.org] On Behalf Of Marc
Tompkins
Sent: Tuesday, April 07, 2009 10:12 AM
To: tutor at python.org
Subject: Re: [Tutor] why is unicode converted file double spaced?

On Tue, Apr 7, 2009 at 9:52 AM, Pirritano, Matthew
<MPirritano at ochca.com> wrote:

So Kent's syntax worked to convert my Unicode file to plain text. But
now my data is double spaced. How can I fix this? Here is the code I'm
using.

Sounds like you're being stung by the difference in newline handling
between operating systems - to recap, MS-DOS and Windows terminate a
line with a carriage return and linefeed (aka CRLF or '\r\n'); *nixes
use just LF ('\n'); Mac OS up to version 9 uses just CR ('\r').  You
will have noticed this, on Windows, if you ever open a text file in
Notepad that was created on a different OS - instead of breaking into
separate lines, everything appears on one long line with funky
characters where the breaks should be.  If you use a more sophisticated
text editor such as Notepad++ or Textpad, everything looks normal.
Python has automatic newline conversion; generally, you can read a text
file from any OS and write to it correctly regardless of the OS that you
happen to be running yourself.
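
To make the three conventions concrete, here is a small sketch. The
sample strings and the helper name split_lines are made up for
illustration; str.splitlines() is the only library call involved, and it
treats '\r\n', '\n' and '\r' alike as line boundaries.

```python
# The same two-line text as it would be stored under each convention.
samples = {
    "Windows (CRLF)": "first\r\nsecond\r\n",
    "Unix (LF)": "first\nsecond\n",
    "Classic Mac OS (CR)": "first\rsecond\r",
}


def split_lines(text):
    # splitlines() drops the terminators, whichever convention was used
    return text.splitlines()


for name, text in samples.items():
    print(name, split_lines(text))
```

All three samples should split into the same two lines, which is roughly
what the automatic newline handling does for you when it works.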

However, the automatic newline handling (from my perfunctory Googling)
appears to break down when you're also converting between Unicode and
ASCII; or it could be because you're essentially doing a read() from one
file and a writelines() to the other; or something else entirely.
Anyway, try this - 

	import codecs

	inp = codecs.open('g:\\data\\amm\\text files\\test20090320.txt', 'r',
	    'utf-16')
	outp = open('g:\\data\\amm\\text files\\new_text_file.txt', 'w')

	for outLine in inp:
	    outp.write(outLine.strip())

	inp.close()
	outp.close()

strip() will remove any leading or trailing whitespace - which should
include any leftover CR or LF characters.
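
One plausible explanation for the double spacing: codecs.open() always
opens the underlying file in binary mode, so the '\r\n' pairs in the
UTF-16 file survive decoding, and a text-mode write on Windows then
expands each '\n' to '\r\n' again, leaving '\r\r\n' at the end of every
line. Note also that a bare strip() removes the line terminator itself,
so the loop above writes every line back to back, and it removes leading
spaces too. A minimal sketch of an alternative, assuming the input really
is UTF-16 and keeping codecs.open() from the snippet above (the function
name convert_utf16_to_text is hypothetical): strip only the trailing CR
and LF characters, then write back a single '\n'.

```python
import codecs


def convert_utf16_to_text(src_path, dst_path):
    # Hypothetical helper: copy a UTF-16 file to a plain text file,
    # writing exactly one line terminator per input line.
    # codecs.open() does no newline translation, so each decoded line
    # may still end in CR, LF, or both.
    with codecs.open(src_path, 'r', 'utf-16') as inp:
        with open(dst_path, 'w') as outp:
            for line in inp:
                # Remove only trailing CR/LF, keeping other whitespace,
                # then add a single newline; text mode turns it into
                # the platform's own line ending.
                outp.write(line.rstrip('\r\n') + '\n')
```

It would be called with the same two paths as above, for example
convert_utf16_to_text('g:\\data\\amm\\text files\\test20090320.txt',
'g:\\data\\amm\\text files\\new_text_file.txt').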

HTH -

-- 
www.fsrtechnologies.com
