Delete duplicate rows in text file - except rows containing a "{" or "}"

Joon Ki Choi joon.ch at gmail.com
Wed Oct 10 04:51:16 EDT 2012


Hello Pythonistas,

I have a very large text file with contents like:

@INBOOK{Ackermann1999-b,
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
  year = {1980},
  timestamp = {1995-12-02}
}	

I want to delete the duplicate rows, except for rows that contain a curly bracket, { or }.
The result should look like:

@INBOOK{Ackermann1999-b,
  author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
  year = {1980},
  timestamp = {1995-12-02}
}

I came across this Python script:

lines_seen = set()  # holds lines already seen
with open("literatur_dupl.txt", "r") as infile, open("literatur_clean.txt", "w") as outfile:
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)

But it also deletes the lines with a closing bracket } and the lines with the same author data.
Therefore I need a condition that checks for the brackets.

Could someone show me how to add this condition?
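
For illustration, here is a rough sketch of how such a condition might look. It assumes a line counts as protected if it contains { or } anywhere, and that every other line is compared as an exact string, whitespace included. The names dedupe_lines and clean_file are placeholders made up for this sketch, not part of the script above.

```python
def dedupe_lines(lines):
    """Yield lines in order, skipping exact repeats unless the line has a brace."""
    lines_seen = set()  # holds brace-free lines already written
    for line in lines:
        if "{" in line or "}" in line:
            # Assumption: any line containing a brace is always kept,
            # even if an identical line appeared earlier.
            yield line
        elif line not in lines_seen:
            lines_seen.add(line)
            yield line


def clean_file(src="literatur_dupl.txt", dst="literatur_clean.txt"):
    """Copy src to dst, applying dedupe_lines line by line."""
    with open(src, "r") as infile, open(dst, "w") as outfile:
        outfile.writelines(dedupe_lines(infile))
```

Calling clean_file() would read literatur_dupl.txt and write literatur_clean.txt. Note that in this sketch lines_seen covers the whole file, so a brace-free line repeated in a later entry is also dropped, and blank lines are treated like any other line.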

Thanks in advance,
Joon






