Why isn't my re.sub replacing the contents of my MS Word file?

Rustom Mody rustompmody at gmail.com
Mon May 12 23:00:49 EDT 2014


On Monday, May 12, 2014 11:05:53 PM UTC+5:30, scott... at gmail.com wrote:
> On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:
> >     fStr = fStr.replace(b'&#x2012', b'-')
> 
>    Still doesn't work
> 
> 
> > Best:
> > 
> > 
> >     # Untested
> > 
> >     fStr = re.sub(b'&#x(201[2-5])|(2E3[AB])|(00[2A]D)', b'-', fStr)
> 
>   Still doesn't work.
> 
>   Guess whatever the code is for endash and mdash are not the ones I am using....

What happens if you divide two strings?

>>> 'a' / 'b'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'str' and 'str'

Or multiply two lists?

>>> [1,2]*[3,3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'list'

Trying to do a text operation like re.sub on a NON-text object like a .doc file
is the same kind of mistake.

Yes, Python may not be intelligent enough to give you such a useful error message
outside its territory, i.e. on the contents of arbitrary files. Logically, however,
it's the same: an impossible operation.
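
To make that concrete, here is a minimal sketch (my own illustration, not code
from earlier in the thread). It shows the same en dash turning into completely
different bytes depending on how it is stored, none of which is the ASCII text
`&#x2013`. The three encodings are assumptions picked for illustration; what a
particular Word file actually contains depends on the file.

```python
import re

EN_DASH = "\u2013"

# The same character as raw bytes under three common encodings
# (illustrative choices, not a claim about any specific .doc file).
encodings = ("utf-8", "utf-16-le", "cp1252")
encoded = {name: EN_DASH.encode(name) for name in encodings}
for name, raw in encoded.items():
    print(name, raw)

# A bytes pattern for the HTML-style entity finds nothing in any of them,
# and re.sub raises no error: it silently replaces zero occurrences.
entity = re.compile(rb"&#x2013")
matches = [name for name, raw in encoded.items() if entity.search(raw)]
print(matches)
```

So a substitution that "still doesn't work" is what you should expect here: the
pattern is looking for bytes that were never in the file.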


The options you have:
1. Use doc-specific tools, e.g. MS Office or LibreOffice, to work on doc files,
i.e. don't use Python.
2. Follow Tim Golden's suggestion, i.e. use win32com, which is a doc-talking
Python API [BTW thanks, Tim, for showing how easy it is].
3. Get out of the doc format into txt (export as plain text) and then try what
you are trying on the txt.
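
For option 3, a rough sketch of what the substitution could look like once you
really have text in hand. The function name `normalize_dashes` and the sample
string are made up for illustration, and the set of dash code points is taken
from the ranges quoted above; adjust it to whatever your document actually uses.

```python
import re

# Dash-like code points from the ranges quoted earlier in the thread:
# U+2012..U+2015 plus the two-em and three-em dashes (U+2E3A, U+2E3B).
DASHES = re.compile("[\u2012-\u2015\u2e3a\u2e3b]")


def normalize_dashes(text):
    """Replace each Unicode dash in a str with a plain ASCII hyphen."""
    return DASHES.sub("-", text)


# Hypothetical sample standing in for the exported plain text.
sample = "pages 10\u201320 \u2014 see notes"
print(normalize_dashes(sample))  # pages 10-20 - see notes
```

This works on str, not bytes, so when you read the exported file you need to
open it with the encoding it was saved in (something you have to check for
yourself at export time).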


