A Mountain of Perl Books + Python Advocacy
Remco Gerlich
scarblac-spamtrap at pino.selwerd.nl
Tue May 9 13:11:13 EDT 2000
tony summerfelt wrote in comp.lang.python:
> correct me if i'm wrong, but a duplicate of the last line is what's checked?
>
> in the code i posted (typos and all), the duplicate items could have been
> anywhere in the file:
>
>
> one
> one
> two
> three
>
> or:
>
> one
> two
> three
> one
>
> the perl's hashes make that possible
Python's dictionaries are similar to Perl's hashes. What he wrote is what
Unix uniq does, I think. I couldn't read your original version, but if you
want to put every line into a hash so that you can find the uniques, that's
easy. (Oh wait, I think you put them into a list as well - to preserve
order? Let's do that too.)
#!/usr/local/bin/python
import sys

try:
    file1 = open(sys.argv[1], 'r')
    file2 = open(sys.argv[2], 'w')
except:
    print "Usage: unique.py file1 file2"
    print
    print "An error occurred. You didn't give the right arguments, or the"
    print "files couldn't be opened."
    sys.exit(1)

dict = {}      # lines we've already seen
uniques = []   # first occurrences, in original order

for line in file1.readlines():
    if not dict.has_key(line):
        dict[line] = 1
        uniques.append(line)

file2.writelines(uniques)
This does slurp in the whole file at once. If you don't like that, replace
the 'for line in...' line with something like
while 1:
    line = file1.readline()
    if not line:
        break
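(For the archive: in later Pythons you can iterate over the file object directly, so neither readlines() nor the while loop is needed. A minimal sketch of the same order-preserving dedup, as a standalone function - `unique_lines` is a name made up here, not from the original post:)

```python
def unique_lines(lines):
    """Return the lines with duplicates removed, keeping first-seen order."""
    seen = {}       # maps line -> 1, like the dict in the script above
    uniques = []
    for line in lines:
        if line not in seen:   # modern spelling of dict.has_key(line)
            seen[line] = 1
            uniques.append(line)
    return uniques
```

Called as unique_lines(open("file1")), it processes one line at a time instead of slurping the whole file.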
--
Remco Gerlich, scarblac at pino.selwerd.nl
-rwxr-xr-x 1 root 5.89824e37 Oct 22 1990 /usr/bin/emacs [STR]