Delete duplicate rows in textfile - except it contains a "{" or "}"

Mark Lawrence breamoreboy at yahoo.co.uk
Wed Oct 10 05:28:24 EDT 2012


On 10/10/2012 09:51, Joon Ki Choi wrote:
>
> Hello Pythonistas,
>
> I have a very large text file with contents like:
>
> @INBOOK{Ackermann1999-b,
>    author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
> 	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
> 	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
> 	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
> 	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
> 	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
> 	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
> 	K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F.
> 	and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and
> 	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
>    year = {1980},
>    timestamp = {1995-12-02}
> }	
>
> And I want to delete the duplicate rows, except those rows containing the brackets { or }.
> The result should look like:
>
> @INBOOK{Ackermann1999-b,
>    author = {Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann,
> 	Ackermann, K.-F. and Ackermann, K.-F. and Ackermann, K.-F. and Ackermann},
>    year = {1980},
>    timestamp = {1995-12-02}
> }
>
> I came across this Python script:
>
> lines_seen = set() # holds lines already seen
> outfile = open("literatur_clean.txt", "w")

Slight aside: you could open the files with a with statement, so there's
no need to close them explicitly.

with open("literatur_dupl.txt", "r") as infile:

> for line in infile:
>      if line not in lines_seen: # not a duplicate
>          outfile.write(line)
>          lines_seen.add(line)

To always keep lines that contain a brace, change the test to something like:

if "{" in line or "}" in line or line not in lines_seen:

> outfile.close()
>
> But it also deletes the lines with a closing bracket } and the lines with the same author data.
> Therefore I need the condition on the brackets.
>
> Could someone point me to how to add this condition?
>
> Thanks in advance,
> Joon
>
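
Putting the two suggestions together, the whole script might look roughly
like the sketch below. The function names keep_line and dedupe_file are
made up for illustration; the file names are the ones from your script.
I have not run it against your real data, so treat it as a starting point.

```python
def keep_line(line, lines_seen):
    """Return True if this line should be written to the output."""
    # Lines containing a brace (entry header, field, closing brace) are
    # always kept; any other line is kept only the first time it is seen.
    return "{" in line or "}" in line or line not in lines_seen


def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated brace-free lines."""
    lines_seen = set()  # holds lines already written
    # Both files are closed automatically when the with block ends.
    with open(src_path, "r") as infile, open(dst_path, "w") as outfile:
        for line in infile:
            if keep_line(line, lines_seen):
                outfile.write(line)
                lines_seen.add(line)
```

You would call it as dedupe_file("literatur_dupl.txt", "literatur_clean.txt").
Note that the first occurrence of every distinct line is still written, so
the output can keep a few more continuation lines than your sample result
shows. Also, lines_seen covers the whole file, so a brace-free line that
repeats in a later entry is dropped there as well.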

-- 
Cheers.

Mark Lawrence.



