I am trying to delete duplicates but the job just finishes with an exit code 0

Peter Otten __peter__ at web.de
Tue Nov 7 08:17:53 EST 2017


tysondogerz at gmail.com wrote:

> I am trying to delete duplicates but the job just finishes with an exit
> code 0 and does not delete any duplicates.
> 
> The duplicates for the data always exist in Column F and I am desiring to
> delete the entire row B-I
> 
> Any ideas?
> 
> 
> import openpyxl
> wb1 = openpyxl.load_workbook('C:/dwad/SWWA.xlsx')
> ws1 = wb1.active # keep naming convention consistent
>  
> values = []
> for i in range(2,ws1.max_row+1):
>   if ws1.cell(row=i,column=1).value in values:
>     #pass
>   #else:
>     values.append(ws1.cell(row=i,column=1).value)
>  
> for value in values:
>   ws1.append([value])

append() will add even more duplicates to the sheet. If you do not care 
about cell styles you can create a new sheet and copy only unique values. A 
complete example:


import openpyxl

SOURCE_FILE = "duplicates.xlsx"
DEST_FILE = "unique.xlsx"

HEADER_COUNT = 1
KEY_COLUMNS = [1]  # zero-based A=0, B=1, ...

workbook = openpyxl.load_workbook(SOURCE_FILE)

source_sheet = workbook.active
dest_sheet = workbook.create_sheet()

seen = set()
for i, row in enumerate(source_sheet.values):
    if i < HEADER_COUNT:
        dest_sheet.append(row)
    else:
        key = tuple(row[i] for i in KEY_COLUMNS)
        print("row = %r, key = %r" %(row, key))
        if key not in seen:
            print("adding row", row)
            seen.add(key)
            dest_sheet.append(row)

workbook.save(DEST_FILE)

> I have attempted to do this with openpyxl for an excel as well as other
> methods (including csv though this deleted rows excessively). 

I find that hard to believe. If anything it should keep more rows as you 
compare whole lines, not just the columns you are interested in.

> CSV:
> with open('1.csv','r') as in_file, open('2.csv','w') as out_file:
>     seen = set() # set for fast O(1) amortized lookup
>     for line in in_file:
>         if line not in seen:
>             seen.add(line)
>             out_file.write(line)

General remark: use the csv module in the standard library rather than 
trying to parse the records manually.




More information about the Python-list mailing list