os.walk the apostrophe and unicode

John Ladasky john_ladasky at sbcglobal.net
Sat Jun 24 15:20:44 EDT 2017


On Saturday, June 24, 2017 at 12:07:05 PM UTC-7, Rod Person wrote:
> Hi,
> 
> I'm working on a program that will walk a file system and clean the id3
> tags of mp3 and flac files. Everything is working great until the
> following file is found
> 
> '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
> 
> for some reason that I can't understand os.walk() returns this file
> name as
> 
> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> 
> which then causes more hell than a little bit for me. I don't
> understand why the apostrophe (') becomes \xe2\x80\x99, or what I can do
> about it.

That's a "right single quotation mark" character (U+2019) in Unicode.  The three bytes \xe2\x80\x99 are its UTF-8 encoding.

http://unicode.scarfboy.com/?s=E28099

Something in your code is choosing to interpret the file name as an old-fashioned byte string, where every character is represented by a single byte.  That works only as long as the file name sticks to characters from the old ASCII set, and there are only 128 of those.  Anything outside that set takes more than one byte in UTF-8, so it shows up as a run of \x escapes.
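
To make that concrete, here is a small sketch you could paste into an interpreter.  I have shortened the file name for illustration, and I am assuming the bytes on disk are UTF-8 (which the \xe2\x80\x99 sequence strongly suggests).

```python
import unicodedata

# Shortened, illustrative version of the file name, as raw bytes.
# Assumption: the name is stored as UTF-8.
raw = b"06 - Todd\xe2\x80\x99s Song.flac"

# Decoding turns the three bytes \xe2\x80\x99 into one character.
text = raw.decode("utf-8")

# The byte string is two items longer than the text string.
print(len(raw), len(text))

# The character right after "06 - Todd" sits at index 9.
quote = text[9]
print(hex(ord(quote)), unicodedata.name(quote))
```

So nothing has been corrupted; the same name is just being shown as bytes instead of as text.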

> The script is Python 3, the file system it is running on is a hammer
> filesystem on DragonFlyBSD. The audio files reside on a QNAP NAS which
> runs some kind of Linux, so it is probably ext3/4. The files came from
> various systems (Mac, Windows, FreeBSD).

Since you are working in Python 3, you can call the .encode() and .decode() methods to translate between Unicode strings (str) and byte strings (bytes), which you still need on occasion.
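
A minimal sketch of that round trip follows.  The file name is again a shortened stand-in, UTF-8 is my assumption about the encoding, and the last step (swapping in a plain apostrophe) is only one possible way to handle it, not something your script necessarily needs.

```python
# Hypothetical, shortened file name; assumes the name on disk is UTF-8.
raw_name = b"06 - Todd\xe2\x80\x99s Song.flac"

# bytes -> str: the three bytes become one character, U+2019.
name = raw_name.decode("utf-8")

# str -> bytes: encoding with the same codec gives back the original bytes.
round_trip = name.encode("utf-8")

# Optional: swap the typographic quote for a plain ASCII apostrophe,
# for example before writing the title into a tag.
ascii_name = name.replace("\u2019", "'")

# ascii() keeps the output printable even on a non-UTF-8 terminal.
print(ascii(name))
print(round_trip == raw_name)
print(ascii_name)
```

Also worth checking: os.walk() returns str names when you pass it a str path and bytes names when you pass it a bytes path, so if you are seeing \x escapes, a bytes path or an explicit .encode() somewhere in the script is a likely suspect.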


> 
> -- 
> Rod
> 
> http://www.rodperson.com



