Python dos2unix one liner

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Feb 27 06:44:30 EST 2010


On Sat, 27 Feb 2010 10:36:41 +0100, @ Rocteur CC wrote:

> cat file.dos | python -c "import sys,re;
> [sys.stdout.write(re.compile('\r\n').sub('\n', line)) for line in
> sys.stdin]" >file.unix

Holy cow!!!!!!! Calling a regex just for a straight literal-to-literal 
string replacement! You've been infected by too much Perl coding!

*wink*

Regexes are expensive, even in Perl, but more so in Python. When you 
don't need the 30 pound sledgehammer of regexes, use lightweight string 
methods.

import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))

ought to do it. It's not particularly short, but Python doesn't value 
extreme brevity -- code golf isn't terribly exciting in Python.
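
For anyone who wants to check the regex-versus-string-method point on their
own machine, here is a minimal sketch. The sample text, the function names
and the repeat count are all made up for illustration, and no particular
timings are claimed: it only checks that the two approaches agree and then
prints whatever timeit measures locally.

```python
import re
import timeit

# Made-up sample data: three DOS-terminated lines repeated many times.
sample = "one\r\ntwo\r\nthree\r\n" * 1000


def with_regex(text):
    # Regex-based replacement, as in the quoted one-liner.
    return re.sub('\r\n', '\n', text)


def with_str_replace(text):
    # Plain string method, no regex machinery involved.
    return text.replace('\r\n', '\n')


# Both approaches must produce identical output.
assert with_regex(sample) == with_str_replace(sample)

if __name__ == "__main__":
    for func in (with_regex, with_str_replace):
        seconds = timeit.timeit(lambda: func(sample), number=200)
        print("%-18s %.4f s" % (func.__name__, seconds))
```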

[steve at sylar ~]$ cat -vet file.dos
one^M$
two^M$
three^M$
[steve at sylar ~]$ cat file.dos | python -c "import sys; sys.stdout.write(sys.stdin.read().replace('\r\n', '\n'))" > file.unix
[steve at sylar ~]$ cat -vet file.unix
one$
two$
three$
[steve at sylar ~]$

Works fine. Unfortunately it still doesn't work in-place, and that is a 
side-effect of the shell, not Python: with "> file" the shell truncates 
the output file before the command ever gets to read it. To do it in 
place, I would pass the file name:

# Tested and working in the interactive interpreter.
import sys
filename = sys.argv[1]
text = open(filename, 'rb').read().replace('\r\n', '\n')
open(filename, 'wb').write(text)
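
The snippet above mixes binary-mode files with plain string literals, which
relies on Python 2 behaviour. A rough equivalent for Python 3 might look
like the sketch below. It assumes Python 3 and uses bytes literals
throughout; the function name dos2unix_in_place is made up for
illustration.

```python
# Sketch for Python 3: files opened in binary mode yield bytes,
# so the search and replacement strings must be bytes literals too.

def dos2unix_in_place(filename):
    # Read everything first, so the file is closed before it is
    # reopened (and truncated) for writing.
    with open(filename, 'rb') as f:
        data = f.read()
    with open(filename, 'wb') as f:
        f.write(data.replace(b'\r\n', b'\n'))
```

Calling dos2unix_in_place(sys.argv[1]) from a small script gives the same
behaviour as the version above.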


Turning that into a one-liner isn't terribly useful or interesting, but 
here we go:

python -c "import sys;open(sys.argv[1], 'wb').write(open(sys.argv[1], 
'rb').read().replace('\r\n', '\n'))" file

Unfortunately, this does NOT work: Python evaluates open(sys.argv[1], 
'wb') before it evaluates the argument to write(), so the file gets 
opened for writing (and hence emptied) before it gets opened for reading. 
Here's another attempt:

python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n', 
'\n');open(sys.argv[1], 'wb').write(t)" file


[steve at sylar ~]$ cp file.dos file.txt
[steve at sylar ~]$ python -c "import sys;t=open(sys.argv[1], 'rb').read().replace('\r\n', '\n');open(sys.argv[1], 'wb').write(t)" file.txt
[steve at sylar ~]$ cat -vet file.txt
one$
two$
three$
[steve at sylar ~]$ 


Success!
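
To see why the first in-place one-liner failed while the second succeeded,
the two orderings can be compared on a scratch file. This is only a sketch,
assuming Python 3 (hence the bytes literals); the helper names
broken_rewrite, working_rewrite and size_after are made up for
illustration.

```python
import os
import tempfile


def broken_rewrite(path):
    # Same shape as the failing one-liner: open(path, 'wb') is evaluated
    # first and truncates the file, so the later read sees no data.
    open(path, 'wb').write(open(path, 'rb').read().replace(b'\r\n', b'\n'))


def working_rewrite(path):
    # Same idea as the second attempt: read and convert first, write after.
    with open(path, 'rb') as f:
        data = f.read().replace(b'\r\n', b'\n')
    with open(path, 'wb') as f:
        f.write(data)


def size_after(rewrite):
    # Create a scratch DOS file, apply the rewrite, report the final size.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"one\r\ntwo\r\nthree\r\n")
        os.close(fd)
        rewrite(path)
        return os.path.getsize(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("broken: ", size_after(broken_rewrite))   # 0 bytes, file emptied
    print("working:", size_after(working_rewrite))  # 14 bytes
```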

Of course, none of these one-liners are good practice. The best thing is 
to use a dedicated utility, or to write a proper script with proper 
error handling.
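
As a rough idea of what such a script could look like, here is one possible
sketch, assuming Python 3.3 or later (for os.replace). The names dos2unix
and main, the exit codes and the temporary-file strategy are my own choices
for illustration, not an established tool. One known limitation: the
replacement file is created by tempfile.mkstemp, so the original file's
permissions are not preserved.

```python
import os
import sys
import tempfile


def dos2unix(path):
    """Convert CRLF to LF in path, replacing the file only on success."""
    with open(path, 'rb') as src:
        data = src.read()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it over
    # the original, so a failure part-way through leaves the input intact.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as dst:
            dst.write(data.replace(b'\r\n', b'\n'))
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # clean up the leftover temporary file
        raise


def main(argv):
    if len(argv) < 2:
        sys.stderr.write("usage: %s FILE...\n" % argv[0])
        return 2
    status = 0
    for path in argv[1:]:
        try:
            dos2unix(path)
        except OSError as err:
            # Report the problem and carry on with the remaining files.
            sys.stderr.write("%s: %s\n" % (path, err))
            status = 1
    return status
```

Saved as a script, it would be driven by sys.exit(main(sys.argv)) under the
usual "if __name__ == '__main__':" guard.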


> Is there a better way in Python or is this kind of thing best done in
> Perl ?

If by "this kind of thing" you mean text processing, then no, Python is 
perfectly capable of doing text processing. Regexes aren't as highly 
optimized as in Perl, but they're more than good enough for when you 
actually need a regex.

If you mean "code golf" and one-liners, then, yes, this is best done in 
Perl :)


-- 
Steven


