How to copy paragraphs (with number formatting) and images from Words (.docx) and paste into Excel (.xlsx) using Python

A S aishan0403 at gmail.com
Sun Mar 22 10:47:38 EDT 2020


I have contract clauses in Words (.docx) format that needs to be frequently copy and pasted into Excel (.xlsx) to be sent to the third party. The clauses are often updated hence there's always a need to copy and paste these clauses over. I only need to copy and paste all the paragraphs and images after the contents page. Here is a sample of the Clause document (https://drive.google.com/open?id=1ZzV29R6y2q0oU3HAVrqsFa158OhvpxEK).

I have tried doing up a code using Python to achieve this outcome. Here is the code that I have done so far:

!pip install python-docx
import docx
import xlsxwriter

document = docx.Document("Clauses Sample.docx")
wb = xlsxwriter.Workbook('C:/xxxx/xxxxxx/xxxx/clauses sample.xlsx')

docText = []
index_row = 0
Sheet1 = wb.add_worksheet("Sheetttt")

for paragraph in document.paragraphs:
    if paragraph.text:
        docText.append(paragraph.text)
        xx = '\n'.join(docText)

        Sheet1.write(index_row,0, xx)

        index_row = index_row+1

wb.close()        
#print(xx) 
However, my Excel file output looks like this:

I can't seem to paste pictures into this discussion so please see both my current and desired Excel output here:

https://stackoverflow.com/questions/60800494/how-to-copy-paragraphs-with-number-formatting-and-images-from-words-docx-an


More information about the Python-list mailing list