Simple Text Processing Help

patrick.waldo at gmail.com
Tue Oct 16 08:45:01 EDT 2007


And now for something completely different...

I see a lot of COM stuff with Python for Excel, and I quickly made
the same program output to Excel.  What if the input file were a Word
document?  Where is there information about manipulating Word
documents, or what could I add to make the same program work for Word?

Again thanks a lot.  I'll start hitting some books about this sort of
text manipulation.

The Excel add-on:

import codecs
import re
from win32com.client import Dispatch

path = "c:\\text_samples\\chem_1_utf8.txt"
path2 = "c:\\text_samples\\chem_2.txt"
input = codecs.open(path, 'r','utf8')
output = codecs.open(path2, 'w', 'utf8')

NR_RE = re.compile(r'^\d+-\d+-\d+$')           # pattern for EINECS number

tokens = input.read().split()
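def iter_elements(tokens):
    """Group tokens into records, joining the middle tokens into one field."""
    product = []
    for tok in tokens:
        # A number-like token after at least four collected tokens
        # marks the start of the next record.
        if NR_RE.match(tok) and len(product) >= 4:
            product[2:-1] = [' '.join(product[2:-1])]
            yield product
            product = []
        product.append(tok)
    # The last record needs the same join before it is yielded.
    if len(product) >= 4:
        product[2:-1] = [' '.join(product[2:-1])]
    if product:
        yield product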

xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlApp.Workbooks.Add()
c = 1

for element in iter_elements(tokens):
    # Write the first four fields of the record into columns 1-4 of row c.
    # Slicing avoids an IndexError if a record has fewer than four fields.
    for col, value in enumerate(element[:4]):
        xlApp.ActiveSheet.Cells(c, col + 1).Value = value
    c = c + 1

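# Closing a never-saved workbook with SaveChanges=1 makes Excel ask for a
# file name.  Setting Visible after Quit() serves no purpose, so it is dropped.
xlApp.ActiveWorkbook.Close(SaveChanges=1)
xlApp.Quit()
del xlApp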

input.close()
output.close()
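
On the Word question, here is a rough sketch of how the input side might
be swapped out, assuming Word is installed and pywin32 is available.  The
function names read_word_tokens and tokens_from_word_text and the .doc
path are made up for illustration.  The COM calls used (Documents.Open,
Content.Text, Close, Quit) are from the Word object model, but I have not
run this against a real document, so treat it as a starting point only.

```python
def tokens_from_word_text(text):
    # Word separates paragraphs with '\r'.  str.split() with no arguments
    # treats that like any other whitespace, so the result has the same
    # shape as input.read().split() in the program above.
    return text.split()


def read_word_tokens(doc_path):
    # Imported here so the helper above can be used without pywin32.
    from win32com.client import Dispatch

    wdApp = Dispatch("Word.Application")
    wdApp.Visible = 0
    try:
        doc = wdApp.Documents.Open(doc_path)
        try:
            text = doc.Content.Text   # whole document body as one string
        finally:
            doc.Close(0)              # 0 = close without saving changes
    finally:
        wdApp.Quit()
    return tokens_from_word_text(text)


# Hypothetical usage; the path is invented for this example:
# tokens = read_word_tokens("c:\\text_samples\\chem_1.doc")
```

The resulting tokens list could then be passed to iter_elements(tokens)
in place of the tokens read from the text file.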



