extracting duplicates from CSV file by specific fields

MRAB google at mrabarnett.plus.com
Tue Apr 28 21:12:12 EDT 2009


VP wrote:
> Hi,
> I have a csv file:
> 
> 'aaa.111', 'T100', 'pn123', 'sn111'
> 'aaa.111', 'T200', 'pn123', 'sn222'
> 'bbb.333', 'T300', 'pn123', 'sn333'
> 'ccc.444', 'T400', 'pn123', 'sn444'
> 'ddd', 'T500', 'pn123', 'sn555'
> 'eee.666', 'T600', 'pn123', 'sn444'
> 'fff.777', 'T700', 'pn123', 'sn777'
> 
> How can I extract duplicates, checking each row by field 1 and field 4?
> 
> I should get something like this:
> 
> 'aaa.111', 'T100', 'pn123', 'sn111'
> 'bbb.333', 'T300', 'pn123', 'sn333'
> 'ccc.444', 'T400', 'pn123', 'sn444'
> 'ddd', 'T500', 'pn123', 'sn555'
> 'fff.777', 'T700', 'pn123', 'sn777'
> 
> and
> 
> 'aaa.111', 'T200', 'pn123', 'sn222'
> 'eee.666', 'T600', 'pn123', 'sn444'
> 
> Any help will be extremely appreciated.
> 
Use the csv module, and when you're reading build a set of the values
you've already seen in field 1 and a set of the values you've already
seen in field 4 so you can check whether you've seen a row before.
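
A minimal sketch of that approach follows. It assumes the file is quoted
with single quotes and has a space after each comma, as in the sample
above, and that a row counts as a duplicate if either its field 1 or its
field 4 has appeared in any earlier row. The function name
split_duplicates and the SAMPLE string are made up for illustration.

```python
import csv
import io


def split_duplicates(lines):
    """Split rows into (unique, duplicates), keyed on field 1 and field 4."""
    seen_field1 = set()  # values already seen in field 1
    seen_field4 = set()  # values already seen in field 4
    unique = []
    duplicates = []

    # Assumption: fields are wrapped in single quotes and separated by ", ".
    reader = csv.reader(lines, quotechar="'", skipinitialspace=True)
    for row in reader:
        if not row:
            continue  # skip blank lines
        first, fourth = row[0], row[3]
        if first in seen_field1 or fourth in seen_field4:
            duplicates.append(row)
        else:
            unique.append(row)
        # Assumption: values from every row are remembered, duplicate or not.
        seen_field1.add(first)
        seen_field4.add(fourth)
    return unique, duplicates


# The sample data from the question, used in place of a real file.
SAMPLE = """\
'aaa.111', 'T100', 'pn123', 'sn111'
'aaa.111', 'T200', 'pn123', 'sn222'
'bbb.333', 'T300', 'pn123', 'sn333'
'ccc.444', 'T400', 'pn123', 'sn444'
'ddd', 'T500', 'pn123', 'sn555'
'eee.666', 'T600', 'pn123', 'sn444'
'fff.777', 'T700', 'pn123', 'sn777'
"""

unique, duplicates = split_duplicates(io.StringIO(SAMPLE))

for row in unique:
    print(row)
print("and")
for row in duplicates:
    print(row)
```

To read from a real file, pass an open file object instead of the
StringIO, opened with newline='' as the csv module documentation
recommends. If duplicate rows should not add their values to the sets,
move the two add() calls into the else branch.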
