How to replace characters in a string?

Jon Ribbens jon+usenet at unequivocal.eu
Wed Jun 8 06:26:59 EDT 2022


On 2022-06-08, Dave <dave at looktowindward.com> wrote:
> I misunderstood how it worked. Basically, I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> This returns a new string with the common characters replaced.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line like so:
>
>     myNewString = myNewString.replace("\u2014", "]")  # just an example
>
> Which is what I was trying to achieve.
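
One thing to watch when extending it that way: each new line has to
call replace() on the result of the previous line, not on the original
string, or the earlier replacements are lost. A minimal sketch of the
same idea driven by a table (the names REPLACEMENTS and
filter_common_characters are made up for illustration):

```python
# Hypothetical table of replacements; add entries as new characters come up.
REPLACEMENTS = {
    "\u2019": "'",   # right single quotation mark
    "\u2014": "]",   # em dash, with the placeholder value from the example
}

def filter_common_characters(the_string):
    new_string = the_string
    for old, new in REPLACEMENTS.items():
        # Chain on new_string so that earlier replacements are kept.
        new_string = new_string.replace(old, new)
    return new_string

print(filter_common_characters("it\u2019s\u2014ok"))  # prints: it's]ok
```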

Here's a head-start on some characters you might want to translate,
mostly spaces, hyphens, quotation marks, and ligatures:

    def unicode_translate(s):
        # Keys are Unicode code points; values are ASCII replacements.
        return s.translate({
            # spaces (8203 is the zero-width space, 173 the soft hyphen)
            160: ' ', 8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ',
            8196: ' ', 8197: ' ', 8198: ' ', 8199: ' ', 8200: ' ',
            8201: ' ', 8202: ' ', 12288: ' ', 8203: '', 173: '',
            # hyphens, dashes, and the minus sign
            8208: '-', 8209: '-', 8210: '-', 8211: '-', 8212: '-',
            8722: '-',
            # quotation marks and the ellipsis
            8216: "'", 8217: "'", 8220: '"', 8221: '"', 8230: '...',
            # ligatures and digraphs
            198: 'AE', 230: 'ae', 497: 'DZ', 498: 'Dz', 499: 'dz',
            64256: 'ff', 64257: 'fi', 64258: 'fl', 64259: 'ffi',
            64260: 'ffl', 64262: 'st'})
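
str.translate() wants integer code points as keys, which is why the
table is full of numbers. If you would rather write the characters
themselves, str.maketrans() does the conversion for you. A cut-down
sketch (TABLE and translate_sample are made-up names, and the table is
only a small sample, not the full list):

```python
# Hypothetical, cut-down table. str.maketrans() turns one-character keys
# into the integer code points that str.translate() expects.
TABLE = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single quotation marks
    "\u201c": '"', "\u201d": '"',   # double quotation marks
    "\u2014": "-",                  # em dash
    "\u2026": "...",                # ellipsis: one character becomes three
    "\u00ad": "",                   # soft hyphen: mapped to '' so it is dropped
    "\ufb01": "fi",                 # fi ligature
})

def translate_sample(s):
    return s.translate(TABLE)

print(translate_sample("\u201cIt\u2019s \ufb01ne\u2026\u201d"))  # prints: "It's fine..."
```

Unlike a chain of replace() calls, this makes a single pass over the
string, and characters not in the table are left alone.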

If you want to go further then the Unidecode package might be helpful:

    https://pypi.org/project/Unidecode/
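
A minimal sketch of using it, assuming the package has been installed
with "pip install Unidecode" (to_ascii is a made-up wrapper name; the
package's own entry point is the unidecode() function):

```python
try:
    # Third-party package, not part of the standard library.
    from unidecode import unidecode
except ImportError:
    unidecode = None  # package not installed

def to_ascii(s):
    """Return an ASCII approximation of s using Unidecode."""
    if unidecode is None:
        raise RuntimeError("the Unidecode package is not installed")
    return unidecode(s)

if unidecode is not None:
    print(to_ascii("caf\u00e9"))  # prints: cafe
```

Note that it transliterates everything it can into ASCII, accented
letters and non-Latin scripts included, which may be more than you
want if you only need to tidy up punctuation.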


