MS Word -- finding text

Mike Brenner mikeb at mitre.org
Sun Jun 16 13:03:35 EDT 2002


COM objects (such as Project, Word, and Excel) sometimes return text as Unicode strings. When they do, the Python str() function raises an exception on any non-ASCII Unicode character.

To avoid this problem, I use the following conversion routine. After checking for None, it first attempts a quick conversion with str(). Only when that fails does it go slowly through each character, handling the exceptions that are raised.

The default replacement is a prime-like character (the backquote "`" in the code) because a prime is the character that most often hits me in Word and Excel documents. Instead of coding it as an ASCII single-quote character, these applications code it as a more "beautiful" typographic character, which kills the Python str() function.
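As a sketch of handling that one common case directly, the typographic quote characters can be mapped back to their ASCII equivalents before any conversion. The table SMART_QUOTES and the function ascii_quotes are names made up for this illustration, not part of the routine in this post, and the code assumes Python 3, where translate() accepts a dict keyed by code point.

```python
# Hypothetical helper: map common "smart" punctuation to ASCII equivalents.
SMART_QUOTES = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark (the usual apostrophe)
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}

def ascii_quotes(text):
    """Return text with typographic quotes replaced by ASCII quotes."""
    return text.translate(SMART_QUOTES)

print(ascii_quotes("It\u2019s a \u201ctest\u201d"))
```

Characters outside the table pass through unchanged, so this only complements the per-character fallback; it does not replace it.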

Depending on your needs, you may wish to change the return statement to drop the string.strip call, which removes leading and trailing whitespace.

You could also move just the try/except into a separate function, which would let you use map() instead of the for loop.
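A minimal sketch of that idea follows. The names char_or_default and unicode2string_map are made up for illustration. The code assumes Python 3, where str() no longer raises on non-ASCII text, so the helper tests each character with encode("ascii") instead.

```python
def char_or_default(ch, default="`"):
    """Return ch if it is plain ASCII, otherwise the default character."""
    try:
        ch.encode("ascii")  # raises UnicodeEncodeError for non-ASCII
        return ch
    except UnicodeEncodeError:
        return default

def unicode2string_map(message):
    """Same idea as the loop version, with map() doing the iteration."""
    if message is None:
        return ""
    # str() turns non-string values into text before the per-character pass.
    return "".join(map(char_or_default, str(message))).strip()

print(unicode2string_map("It\u2019s here "))
```

Because the helper returns a default instead of raising, map() can call it on every character without any exception handling in the caller.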

Mike Brenner


import string

def phrase_unicode2string(message):
    """
    phrase_unicode2string works around the built-in function str(message),
    which raises an exception when non-ASCII Unicode characters are given to it.
    """
    if message is None:
        return ""
    try:
        st = str(message)
    except UnicodeError:  # untranslatable Unicode character
        chars = []  # named chars so the built-in list is not shadowed
        for uc in message:
            try:
                c = str(uc)
            except UnicodeError:
                c = "`"  # default replacement character
            chars.append(c)
        # Note: because str() raises an exception instead of returning
        # a default character, we cannot pass it directly to map() here.
        st = string.join(chars, "")
    return string.strip(st)
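
For comparison, here is a sketch of a shortcut that uses the built-in "replace" error handler of encode(), which substitutes "?" for every character ASCII cannot represent. The function name to_ascii is an assumption for this example. The code is written for Python 3 and, unlike the routine above, does not let you choose the replacement character.

```python
def to_ascii(message):
    """Convert message to an ASCII-only str, replacing other characters with '?'."""
    if message is None:
        return ""
    # encode() returns bytes; each non-ASCII character becomes b"?"
    encoded = str(message).encode("ascii", "replace")
    return encoded.decode("ascii").strip()

print(to_ascii("caf\u00e9 \u2019quoted\u2019 "))
```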

------------------------

Mike Prema wrote: 

#######
from win32com.client import Dispatch
W=Dispatch('Word.Application')
D=W.Documents.Open('c:\\windows\\Desktop\\TOR.doc') ## Test Doc
FindRange=D.Content
F=FindRange.Find.Execute('Conman','True')
print FindRange.Text
#######
str() doesn't seem to work in this case
I tried using the codecs library but I think I am missing something





