Replacing "illegal characters" in html

Robert Brewer fumanchu at amor.org
Sun May 9 16:12:22 EDT 2004


> BenO wrote:
> > I'm new to python and need to write a function to replace 
> > certain characters
> > in a string (html).
> > 
> > The characters I need to replace come from MS Word copy & 
> > paste and are:
> > 
> > ‘ (Left quote)
> > ’ (Right quote)
> > Double Left quotes
> > Double Right quotes
> > 
> > Can anyone help me or point me in the right direction on an 
> > efficient way of doing this?

And I answered:

> The two methods most often used are 1) the .replace method of strings,
> and 2) regular expressions.
> 
> 1) The .replace method:
> 
> >>> replacemap = {""": '"', """: '"', "'": "'", "'": "'"}
> >>> map(ord, replacemap.keys())
> [145, 147, 146, 148]
> >>> test = ""hl" 'oh'"
> >>> for k, v in replacemap.iteritems():
> ... 	test = test.replace(k, v)
> ... 	
> >>> test
> '"hl" \'oh\''
> 
> 2) Regular Expressions:
> 
> >>> import re
> >>> test = ""hl" 'oh'"
> >>> test = re.sub("[""]", '"', test)
> >>> test = re.sub("['']", "'", test)
> >>> test
> '"hl" \'oh\''

But of course, the email gateway munged all the curly quotes into plain
ones. So here's a better version, with the characters written as hex
escapes (their Windows-1252 byte values):

1) The .replace method:

>>> replacemap = {"\x91": "'", "\x92": "'", "\x93": '"', "\x94": '"'}
>>> map(ord, replacemap.keys())
[145, 147, 146, 148]
>>> test = "\x93hl\x94 \x91oh\x92"
>>> for k, v in replacemap.iteritems():
... 	test = test.replace(k, v)
... 	
>>> test
'"hl" \'oh\''
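
For anyone reading this on Python 3, here is a rough sketch of the same
.replace approach wrapped in a function. It assumes the pasted text
arrives as Windows-1252 bytes and is decoded to str first, so the four
characters show up as U+2018, U+2019, U+201C and U+201D. The names
REPLACEMAP and plain_quotes are made up for illustration.

```python
# Assumption: Python 3, input decoded from Windows-1252 (cp1252) bytes.
# REPLACEMAP and plain_quotes are illustrative names, not from the post above.
REPLACEMAP = {
    "\u2018": "'",   # left single quote  (byte 0x91 in cp1252)
    "\u2019": "'",   # right single quote (byte 0x92 in cp1252)
    "\u201c": '"',   # left double quote  (byte 0x93 in cp1252)
    "\u201d": '"',   # right double quote (byte 0x94 in cp1252)
}


def plain_quotes(text):
    """Swap each curly quote for its ASCII equivalent, one .replace per character."""
    for curly, plain in REPLACEMAP.items():
        text = text.replace(curly, plain)
    return text


raw = b"\x93hl\x94 \x91oh\x92"      # bytes as they might come from a Word paste
text = raw.decode("cp1252")         # now a str containing the curly quotes
print(plain_quotes(text))           # "hl" 'oh'
```

Since none of the keys overlap, the order in which the dict is walked
doesn't affect the result.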

2) Regular Expressions:

>>> import re
>>> test = "\x93hl\x94 \x91oh\x92"
>>> test = re.sub("[\x93\x94]", '"', test)
>>> test = re.sub("[\x91\x92]", "'", test)
>>> test
'"hl" \'oh\''
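
The two re.sub calls can also be folded into a single pass by handing
re.sub a function as the replacement. This is only a minimal sketch,
assuming Python 3 and text that has already been decoded to str; QUOTES,
CURLY and plain_quotes_re are hypothetical names.

```python
import re

# Assumption: Python 3 str input. All names here are illustrative only.
QUOTES = {
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
}

# One character class matching any of the four curly quotes.
CURLY = re.compile("[\u2018\u2019\u201c\u201d]")


def plain_quotes_re(text):
    """Replace every curly quote in one pass, looking up each match in QUOTES."""
    return CURLY.sub(lambda match: QUOTES[match.group(0)], text)


print(plain_quotes_re("\u201chl\u201d \u2018oh\u2019"))   # "hl" 'oh'
```

Compiling the pattern once helps if the function gets called on many
strings, and adding another character means touching the dict and the
character class rather than writing another re.sub line.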

Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org
 



