Why isn't my re.sub replacing the contents of my MS Word file?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri May 9 20:12:57 EDT 2014


On Fri, 09 May 2014 12:51:04 -0700, scottcabit wrote:

> Hi,
> 
>  here is a snippet of code that opens a file (fn contains the path\name)
>  and first tried to replace all endash, emdash etc characters with
>  simple dash characters, before doing a search.
>   But the replaces are not having any effect. Obviously a syntax
>   problem....wwhat silly thing am I doing wrong?

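You're making the substitution, then throwing the result away. re.sub 
doesn't modify fStr in place; it returns a new object, and you have to 
assign that back to a name.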

And you're using a nuclear-powered bulldozer to crack a peanut. This is 
not a job for regexes, this is a job for normal string replacement.

> fn = 'z:\Documentation\Software'
> def processdoc(fn,outfile):
>     fStr = open(fn, 'rb').read()
>     re.sub(b'&#x2012','-',fStr)

Good:

    fStr = re.sub(b'&#x2012', b'-', fStr)

Better:

    fStr = fStr.replace(b'&#x2012', b'-')
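
To make the difference concrete, here is a small sketch. The sample bytes 
are made up for illustration and are not taken from the original file; 
only the pattern comes from the code above.

```python
import re

# Made-up input for illustration, not from the original file.
sample = b'A&#x2012B'

# The return value is discarded, so sample is left untouched.
re.sub(b'&#x2012', b'-', sample)
unchanged = sample

# Assigning the result keeps the substitution.
via_regex = re.sub(b'&#x2012', b'-', sample)

# Plain bytes replacement gives the same result without a regex.
via_replace = sample.replace(b'&#x2012', b'-')
```

Note that the pattern, the replacement and the data are all bytes here, 
since the file was opened in binary mode.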


But having said that, you actually can make use of the nuclear-powered 
bulldozer, and do all the replacements in one go:

Best:

    # Untested. The (?:...) group makes the '&#x' prefix apply to
    # every alternative, not just the first one.
    fStr = re.sub(b'&#x(?:201[2-5]|2E3[AB]|00[2A]D)', b'-', fStr)


If you're going to unload the power of regexes, unload them on something 
that makes it worthwhile. Replacing a constant, fixed string with another 
constant, fixed string does not require a regex.
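
One thing to watch when combining alternatives: "|" binds more loosely 
than concatenation, so a shared prefix such as '&#x' has to sit outside a 
group that contains all the alternatives. The sketch below compares a 
grouped and an ungrouped pattern. The helper name normalize_dashes and the 
sample bytes are assumptions made up for illustration.

```python
import re

# Grouped: '&#x' followed by any one of the listed hex codes.
GROUPED = re.compile(b'&#x(?:201[2-5]|2E3[AB]|00[2A]D)')

# Ungrouped: '&#x' belongs to the first alternative only, so the other
# two alternatives match bare hex digits anywhere in the data.
UNGROUPED = re.compile(b'&#x(201[2-5])|(2E3[AB])|(00[2A]D)')


def normalize_dashes(data):
    """Replace the listed dash entities with a plain hyphen (hypothetical helper)."""
    return GROUPED.sub(b'-', data)


# Made-up input: three entities plus a bare '2E3A' that should survive.
sample = b'a&#x2013b&#x2E3Ac&#x00ADd 2E3A'

grouped_result = normalize_dashes(sample)
ungrouped_result = UNGROUPED.sub(b'-', sample)
```

With the grouped pattern the bare '2E3A' at the end is left alone; with 
the ungrouped one it is replaced, and the '&#x' in front of the second and 
third entities is left behind.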



-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
