Simple Text Processing Help

patrick.waldo at gmail.com
Tue Oct 16 08:47:51 EDT 2007


And now for something completely different...

I've been reading up a bit on Python and Excel, and I quickly got the
program to write its output to Excel. However, what if the input file
were a Word document? I can't seem to find much information about
parsing Word files. What could I add to make the same program work
with a Word file?
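One possible answer, as a minimal sketch written for Python 3: it assumes the input is a .docx file (the Office Open XML format introduced with Word 2007, a zip archive whose main text is stored in `word/document.xml`). It uses only the standard `zipfile` and `xml.etree.ElementTree` modules, emits one line per paragraph, and turns in-run tabs and line breaks into `\t` and `\n` so adjacent fields do not run together. The function name `docx_text` is hypothetical. It does not handle legacy binary .doc files; for those, one option (untested here) is to drive Word through COM the same way the script drives Excel, e.g. `Dispatch("Word.Application").Documents.Open(path).Content.Text`, and quit Word afterwards.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the XML parts inside a .docx file
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(source):
    """Return the plain text of a .docx file, one line per paragraph.

    `source` may be a file name or a binary file-like object.
    Assumption: .docx (zip + XML) only; legacy binary .doc is not supported.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(W + "p"):
        parts = []
        # Text lives in runs (w:r); only look at run children so that
        # tab-stop definitions in the paragraph properties are ignored.
        for run in para.iter(W + "r"):
            for el in run:
                if el.tag == W + "t":
                    parts.append(el.text or "")
                elif el.tag == W + "tab":
                    parts.append("\t")
                elif el.tag in (W + "br", W + "cr"):
                    parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)
```

The rest of the program can stay the same: build `tokens` from `docx_text(path).split()` instead of reading the text file, with `path` pointing at the .docx.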

Again, thanks a lot.

And here is the Excel add-on:

import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"

NR_RE = re.compile(r'^\d+-\d+-\d+$')    # pattern for an EINECS number

# Read the whole input file and split it into whitespace-separated tokens
infile = codecs.open(path, 'r', 'utf8')
tokens = infile.read().split()
infile.close()

# Left over from the text-file version; nothing is written to it here
output = codecs.open(path2, 'w', 'utf8')

def iter_elements(tokens):
    """Yield one record per EINECS number: the first two tokens, the
    middle tokens joined into one name, and the last token."""
    product = []
    for tok in tokens:
        # A new EINECS number closes the current record
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # Join the last record the same way before yielding it
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
sheet = xlApp.ActiveSheet

# One row per record, one column per field
for row, element in enumerate(iter_elements(tokens), 1):
    for col, value in enumerate(element, 1):
        sheet.Cells(row, col).Value = value

# The new workbook has no file name yet, so Excel asks for one when saving
xlApp.ActiveWorkbook.Close(SaveChanges=1)
xlApp.Quit()
del xlApp

output.close()
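The Excel part only runs on a machine with Excel and pywin32 installed. As an alternative (not part of the original program), here is a stdlib-only sketch for Python 3 that exercises the same grouping logic, including joining the last record like the others, and writes the records with the `csv` module, a plain-text format Excel can open, instead of going through COM. `write_csv` is a hypothetical helper name, and the field layout (two leading tokens, a multi-word name, one trailing token) is inferred from the slicing in `iter_elements`.

```python
import csv
import re

# Same EINECS-number pattern as in the script
NR_RE = re.compile(r'^\d+-\d+-\d+$')

def iter_elements(tokens):
    """Group tokens into records that each start with an EINECS number.

    Records with 4+ tokens become 4 fields: the first two tokens,
    the middle tokens joined by spaces, and the last token.
    """
    product = []
    for tok in tokens:
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # Treat the final record like the others instead of yielding it raw
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product

def write_csv(records, stream):
    """Write one CSV row per record to an already-open text stream."""
    writer = csv.writer(stream)
    for record in records:
        writer.writerow(record)
```

For real files, open a .csv output file with `open(..., "w", encoding="utf-8-sig", newline="")` and call `write_csv(iter_elements(tokens), f)`. The `csv` documentation recommends `newline=""`, and `utf-8-sig` writes a byte-order mark, which Excel generally relies on to recognize a CSV file as UTF-8.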