Why isn't my re.sub replacing the contents of my MS Word file?

scottcabit at gmail.com scottcabit at gmail.com
Mon May 12 13:35:53 EDT 2014


On Friday, May 9, 2014 8:12:57 PM UTC-4, Steven D'Aprano wrote:

> Good:
>
>     fStr = re.sub(b'&#x2012', b'-', fStr)

  That doesn't work. I have verified that the document contains en dash and em dash characters, but this does NOT replace them.

> Better:
>
>     fStr = fStr.replace(b'&#x2012', b'-')

   That still doesn't work.

> But having said that, you actually can make use of the nuclear-powered
> bulldozer, and do all the replacements in one go:
>
> Best:
>
>     # Untested
>     fStr = re.sub(b'&#x(?:201[2-5]|2E3[AB]|00[2A]D)', b'-', fStr)

  Still doesn't work.

  I guess the codes I am using are not the ones the document actually uses for the en dash and em dash....
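
  One way to test that guess is to look for the dash characters themselves instead of the text '&#x2012'. That pattern is a literal run of ASCII characters (an HTML-style character reference), so it can only match if the file really stores its dashes that way. A minimal sketch, assuming the file has already been read into a bytes object: the helper name count_dash_encodings and the list of candidate encodings are illustrative choices, not something taken from the document or from Word itself.

```python
# U+2013 is EN DASH and U+2014 is EM DASH (U+2012 is FIGURE DASH).
DASHES = {"EN DASH": "\u2013", "EM DASH": "\u2014"}

# Assumption: these are merely plausible encodings to try; which one
# (if any) the file actually uses is exactly what is being checked.
ENCODINGS = ("utf-8", "utf-16-le", "cp1252")


def count_dash_encodings(data):
    """Return {(dash name, encoding): count} for each candidate found in data."""
    found = {}
    for name, char in DASHES.items():
        for enc in ENCODINGS:
            n = data.count(char.encode(enc))
            if n:
                found[(name, enc)] = n
    return found


# Small self-contained demonstration on made-up data.
sample = "a \u2013 b \u2014 c".encode("utf-8")
print(count_dash_encodings(sample))
```

  On a real file, data would come from open(path, 'rb').read(). A hit for a single-byte encoding such as cp1252 may be a coincidence in binary data, so treat the counts as hints rather than proof. If the file is a compressed container, the text will not be visible in the raw bytes at all.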



