Why isn't my re.sub replacing the contents of my MS Word file?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri May 9 20:12:57 EDT 2014
On Fri, 09 May 2014 12:51:04 -0700, scottcabit wrote:
> Hi,
>
> here is a snippet of code that opens a file (fn contains the path\name)
> and first tries to replace all endash, emdash etc characters with
> simple dash characters, before doing a search.
> But the replaces are not having any effect. Obviously a syntax
> problem... what silly thing am I doing wrong?
You're making the substitution, then throwing the result away.
And you're using a nuclear-powered bulldozer to crack a peanut. This is
not a job for regexes, this is a job for normal string replacement.
> fn = 'z:\Documentation\Software'
> def processdoc(fn,outfile):
>     fStr = open(fn, 'rb').read()
>     re.sub(b'&#x2012','-',fStr)
Good:
    fStr = re.sub(b'&#x2012', b'-', fStr)
Better:
    fStr = fStr.replace(b'&#x2012', b'-')
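
To see why the result has to be assigned back, here is a minimal sketch. The sample bytes are made up for illustration, and it assumes the file stores the dash as the literal text &#x2012, as the patterns in this thread suggest. Bytes objects are immutable, so neither re.sub nor bytes.replace can change them in place; both return a new object.

```python
import re

# Made-up sample data standing in for the contents of the file.
data = b"A&#x2012B"

# The return value is discarded here, so `data` is left exactly as it was.
re.sub(b"&#x2012", b"-", data)
unchanged = data

# Keeping the return value is what makes the substitution stick.
fixed_with_sub = re.sub(b"&#x2012", b"-", data)

# Plain replacement does the same job without involving the regex engine.
fixed_with_replace = data.replace(b"&#x2012", b"-")

print(unchanged)
print(fixed_with_sub)
print(fixed_with_replace)
```
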
But having said that, you actually can make use of the nuclear-powered
bulldozer, and do all the replacements in one go:
Best:
    # Untested. The (?:...) group makes the &#x prefix apply to every alternative.
    fStr = re.sub(b'&#x(?:201[2-5]|2E3[AB]|00[2A]D)', b'-', fStr)
If you're going to unload the power of regexes, unload them on something
that makes it worthwhile. Replacing a constant, fixed string with another
constant, fixed string does not require a regex.
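
As a rough sketch of the one-pass approach, the alternatives can be wrapped in a non-capturing group so that the &#x prefix guards all of them. The names DASH_ENTITY and normalise_dashes and the sample bytes are mine, invented for illustration; the hex codes are the ones from the pattern in this thread.

```python
import re

# Without the (?:...) group, the | would split the whole pattern, and the
# &#x prefix would only be required for the first alternative.
DASH_ENTITY = re.compile(rb"&#x(?:201[2-5]|2E3[AB]|00[2A]D)")


def normalise_dashes(data: bytes) -> bytes:
    """Replace every matching &#x.... dash code with a plain hyphen."""
    return DASH_ENTITY.sub(b"-", data)


# Made-up sample: three dash codes, plus a bare "2E3A" that must be left alone.
sample = b"a&#x2012b&#x2E3Ac&#x00ADd 2E3A"
result = normalise_dashes(sample)
print(result)
```

Compiling the pattern once is only a convenience here; calling re.sub directly with the same pattern should behave the same way.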
--
Steven D'Aprano
http://import-that.dreamwidth.org/