How to replace characters in a string?

Barry Scott barry at barrys-emacs.org
Wed Jun 8 13:12:41 EDT 2022



> On 8 Jun 2022, at 18:01, Dave <dave at looktowindward.com> wrote:
> 
> Hi,
> 
> This is a tool I’m using on my own files to save me time. Basically, most of the tracks were imported with different versions of iTunes over the years. There are three problems:
> 
> 1.   File-system characters are replaced (you can’t have ‘/’ or ‘:’ in a file name).
ok
> 2.   Smart quotes were added at some point; these need to be replaced.
ok
> 3.   Other characters in names of non-English origin.
Why is this a problem? Isn't it only when the chars are confusing or will not compare that there is something to fix?
All modern OSes allow Unicode filenames.

Barry
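Barry's "will not compare" case can be made concrete with the standard-library unicodedata module: the same visible title can be stored in composed (NFC) or decomposed (NFD) form, and the two strings are only equal after normalization. A minimal sketch; the helper name same_title and the sample strings are illustrative, not from the thread.

```python
import unicodedata

def same_title(a: str, b: str) -> bool:
    """Compare two titles after converting both to NFC (composed) form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

composed = "Caf\u00e9"      # 'é' as a single code point
decomposed = "Cafe\u0301"   # 'e' followed by a combining acute accent

print(composed == decomposed)            # False: different code points
print(same_title(composed, decomposed))  # True after normalization
```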


> 
> If I find others, I’ll add them.
> 
> I’m using MusicBrainz to do a fuzzy match and get the correct name.
> 
> It’s not perfect, but it works for 99% of files, which is good enough for me!
> 
> Cheers
> Dave
> 
> 
>> On 8 Jun 2022, at 18:23, Avi Gross via Python-list <python-list at python.org> wrote:
>> 
>> Dave,
>> 
>> Your goal is to compare titles, and there can be endless replacements needed if you allow the text to contain anything but ASCII.
>> 
>> Have you considered stripping things out instead? I mean removing lots of stuff that is not ASCII in the first place, perhaps also removing extra punctuation like single quotes or question marks, plus redundant white space, and then comparing the skeletons of the two?
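A minimal sketch of the "skeleton" idea using only the standard library. The exact rules (keep ASCII letters, digits and single spaces, lowercase everything) are an assumption about what "skeleton" should mean; the function name skeleton is hypothetical.

```python
import re
import unicodedata

def skeleton(title: str) -> str:
    """Reduce a title to lowercase ASCII letters, digits and single spaces."""
    # NFKD splits accented letters into a base letter plus combining
    # marks; encoding to ASCII with "ignore" then drops the marks and
    # any other non-ASCII character (curly quotes, œ, ...).
    ascii_only = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Remove remaining punctuation such as straight quotes and '?'.
    no_punct = re.sub(r"[^A-Za-z0-9\s]", "", ascii_only)
    # Collapse runs of whitespace and fold case.
    return " ".join(no_punct.split()).lower()

print(skeleton("Don\u2019t  Stop Me Now?"))  # dont stop me now
print(skeleton("Caf\u00e9 del Mar"))         # cafe del mar
```

Note the cost: characters with no ASCII decomposition are dropped rather than transliterated, so "Cœur" becomes "cur".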
>> 
>> And even if that fails, could you measure how different they are and tolerate small differences, say being off by one letter? Mind you, "My desert" matching "My Dessert" might not be a valid match, one being a song about an arid environment and the other about food you don't need!
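One common "measure of how different they are" is the Levenshtein edit distance: the minimum number of single-character insertions, deletions or substitutions needed to turn one string into the other. This is a plain-Python sketch; the name edit_distance and any tolerance threshold you apply to it are assumptions (the standard library's difflib.SequenceMatcher offers a similarity ratio instead).

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    # previous[j] holds the distance between the first i-1 chars of a
    # and the first j chars of b; only two rows are kept at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(edit_distance("My desert", "My Dessert"))  # 2 (case differs too)
print(edit_distance("my desert", "my dessert"))  # 1
```

The second call shows Avi's warning in numbers: with a tolerance of one edit, the arid song and the food song would be treated as the same title.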
>> 
>> Your seemingly simple need can expand into a fairly complex project. There may be many ideas on how to deal with it, but none perfect enough to catch all cases, as even a trained human may have to make judgment calls at times and not match what other humans do. We have examples like the TV show "NUMB3RS", which used a perfectly valid digit 3 to stand for an "E", yet when I look it up it is often written as NUMBERS. You have obvious cases where song titles may contain composite symbols like "œ", which will not compare equal to a title where it is written out as "oe", so comparing is quite complex and the best you might do is heuristic.
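The "œ" versus "oe" case is one that Unicode normalization does not solve on its own: œ, æ and ß have no decomposition, so NFKD leaves them unchanged. A minimal sketch of an explicit expansion table follows; the table contents and the name expand_letters are hypothetical and would need extending by hand. Cases like NUMB3RS remain a human judgment call.

```python
import unicodedata

# Hypothetical, hand-maintained table: these letters have no Unicode
# decomposition, so NFKD leaves them alone and they must be expanded
# explicitly before comparing against an "oe"/"ae"/"ss" spelling.
EXPANSIONS = str.maketrans({
    "\u0153": "oe", "\u0152": "OE",   # œ, Œ
    "\u00e6": "ae", "\u00c6": "AE",   # æ, Æ
    "\u00df": "ss",                   # ß
})

def expand_letters(title: str) -> str:
    """Return a copy of title with the table's letters spelled out."""
    return title.translate(EXPANSIONS)

print(unicodedata.normalize("NFKD", "C\u0153ur") == "C\u0153ur")  # True
print(expand_letters("C\u0153ur"))  # Coeur
```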
>> 
>> Unicode has many symbols that are almost the same, or that even look identical, at least in some fonts. There are libraries of functions that allow some kinds of comparisons or conversions that you could look into, but the gain for you may not be worth it. Nothing stops a person from naming a song any way they want, and since I speak many languages I often see a song re-titled in the local language, using the local alphabet often mixed with another.
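Part of the "almost the same" problem is covered by the standard library: NFKC normalization maps compatibility characters (ligatures, full-width letters) to their plain forms, and str.casefold() is a more aggressive lower(). A minimal sketch; the name fold is hypothetical, and the result is meant for comparison keys only, not for renaming files.

```python
import unicodedata

def fold(title: str) -> str:
    """Fold compatibility variants and case, for comparison only."""
    # NFKC maps e.g. the 'ﬁ' ligature to 'fi' and full-width 'Ａ' to 'A';
    # casefold() then lowercases aggressively ('ß' becomes 'ss').
    return unicodedata.normalize("NFKC", title).casefold()

print(fold("\ufb01re"))                          # fire
print(fold("\uff21\uff22\uff23"))                # abc
print(fold("Stra\u00dfe") == fold("STRASSE"))    # True
```

Lookalikes across scripts, such as Cyrillic "а" and Latin "a", are not unified this way; that needs confusable data (Unicode Technical Standard #39), which the standard library does not ship.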
>> 
>> Your original question is perhaps now many questions, depending on what you choose. You started by wanting to know how to compare, and it is moving on to how to delete parts, make substitutions or use regular expressions, and it can get worse. You can, for example, take a string, identify the words within it, and create a regular expression that inserts between the words a pattern matching zero or more non-word characters such as spaces, tabs, punctuation or non-ASCII, so that song titles with the same words in the same sequence match no matter what is between them. The possibilities are endless, but consider some of the techniques used by programs that parse text and suggest alternate spellings, or even programs like Google Translate that can take a sentence and suggest you may mean a slightly altered sentence with one word changed to fit better.
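The words-with-anything-between-them regular expression can be sketched with the re module. The name title_pattern is hypothetical, and matching case-insensitively is an added assumption.

```python
import re

def title_pattern(title: str):
    """Compile a regex matching the same words in the same order, with
    any run of non-word characters (or none) between and around them."""
    words = re.findall(r"\w+", title)
    gap = r"\W*"
    body = gap.join(re.escape(word) for word in words)
    return re.compile(gap + body + gap, re.IGNORECASE)

pattern = title_pattern("Don't Stop Me Now")
print(bool(pattern.fullmatch("Don\u2019t Stop Me Now!")))   # True
print(bool(pattern.fullmatch("Don't Stop Me, Please")))    # False
```

In Python 3, \w is Unicode-aware, so accented letters count as word characters; to treat non-ASCII as part of the gap, as described above, compile with re.ASCII as well.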
>> 
>> You need to decide what you want to deal with and what will be misclassified by your program. Some of us have suggested folding the case of the words, but that means a song about a dark-skinned person in Poland called "Black Polish" would match a song about keeping your shoes dark with "black polish". So I keep repeating that it is very hard, or frankly impossible, to catch every case I can imagine and the many I can't!
>> 
>> But the emphasis here is not your overall problem. It is about whether and how the computer language called Python, and perhaps some add-on modules, can be used to solve each smaller need, such as recognizing a pattern or replacing text. It can do quite a bit, but only when the specification of the problem is exact.
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Dave <dave at looktowindward.com>
>> To: python-list at python.org
>> Sent: Wed, Jun 8, 2022 5:09 am
>> Subject: Re: How to replace characters in a string?
>> 
>> Hi,
>> 
>> Thanks for this! 
>> 
>> So, is there a copy function/method that returns a MutableString like in Objective-C? I’ve solved this problem before in a number of languages, like Objective-C and AppleScript.
>> 
>> Basically there is a set of common characters that need “normalizing” and I have a method that replaces them in a string, so:
>> 
>> myString = [myString normalizeCharacters];
>> 
>> Would return a new string with all the “common” replacements applied.
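Because Python's str is immutable, the idiomatic counterpart of the Objective-C normalizeCharacters method is a function that returns a new string. A minimal sketch using str.maketrans and str.translate; the table COMMON_REPLACEMENTS and the name normalize_characters are hypothetical, and the table only covers smart quotes here.

```python
# Hypothetical replacement table; extend it as new cases turn up.
COMMON_REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def normalize_characters(text: str) -> str:
    """Return a new string with the common replacements applied.

    translate() never modifies text in place; it builds a new string.
    """
    return text.translate(COMMON_REPLACEMENTS)

my_string = "It\u2019s a \u201cTest\u201d"
my_string = normalize_characters(my_string)  # rebind the name, as in ObjC
print(my_string)  # It's a "Test"
```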
>> 
>> Since the following gives an error:
>> 
>> myString = 'Hello'
>> myNewstring = myString.replace(myString,'e','a')
>> 
>> TypeError: 'str' object cannot be interpreted as an integer
>> 
>> I can’t see a way to do this in Python.
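The TypeError comes from the call signature: replace() is already bound to myString, so it takes (old, new[, count]). In the call above, 'Hello' becomes old, 'e' becomes new, and 'a' lands in the integer count position. A corrected sketch, keeping Dave's variable names:

```python
myString = 'Hello'
# The method is called on the string itself; pass only old and new.
myNewstring = myString.replace('e', 'a')
print(myNewstring)  # Hallo
print(myString)     # Hello (the original is unchanged)

# The optional third argument is a maximum number of replacements.
print('banana'.replace('a', 'o', 2))  # bonona
```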
>> 
>> All the Best
>> Dave
>> 
>> 
>>> On 8 Jun 2022, at 10:14, Chris Angelico <rosuav at gmail.com> wrote:
>>> 
>>> On Wed, 8 Jun 2022 at 18:12, Dave <dave at looktowindward.com> wrote:
>>> 
>>>> I tried that, but it doesn’t seem to work:
>>>> myCompareFile1 = ascii(myTitleName)
>>>> myCompareFile1.replace("\u2019", "'")
>>> 
>>> Strings in Python are immutable. When you call ascii(), you get back a
>>> new string, but it's one that has actual backslashes and such in it.
>>> (You probably don't need this step, other than for debugging; check
>>> the string by printing out the ASCII version of it, but stick to the
>>> original for actual processing.) The same is true of the replace()
>>> method; it doesn't change the string, it returns a new string.
>>> 
>>>>>> word = "spam"
>>>>>> print(word.replace("sp", "h"))
>>> ham
>>>>>> print(word)
>>> spam
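Applied to Dave's earlier snippet, Chris's two points look like this: ascii() produces escape text (the literal characters \u2019), so there is no U+2019 left for replace() to find, and replace() returns a new string that must be kept. A minimal sketch; the sample title is illustrative.

```python
myTitleName = "Don\u2019t Stop Me Now"

# ascii() returns a repr-style string with escapes and quotes included;
# useful for inspecting the title, not for processing it.
print(ascii(myTitleName))  # 'Don\u2019t Stop Me Now'

# replace() returns a new string, so keep the result.
myCompareFile1 = myTitleName.replace("\u2019", "'")
print(myCompareFile1)  # Don't Stop Me Now
```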
>>> 
>>> ChrisA
>>> -- 
>>> https://mail.python.org/mailman/listinfo/python-list
>> 
> 


