os.walk the apostrophe and unicode

Rod Person rodperson at rodperson.com
Sat Jun 24 15:37:52 EDT 2017


On Sat, 24 Jun 2017 21:28:45 +0200
Peter Otten <__peter__ at web.de> wrote:

> Rod Person wrote:
> 
> > Hi,
> > 
> > I'm working on a program that walks a file system and cleans the
> > ID3 tags of MP3 and FLAC files. Everything is working great until
> > the following file is found:
> > 
> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
> > 
> > For some reason that I can't understand, os.walk() returns this file
> > name as
> > 
> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> > 
> > which then causes me no end of trouble. I don't understand why the
> > apostrophe (') becomes \xe2\x80\x99, or what I can do about it.
> 
> >>> b"\xe2\x80\x99".decode("utf-8")  
> '’'
> >>> unicodedata.name(_)  
> 'RIGHT SINGLE QUOTATION MARK'
> 
> So it's '’' rather than "'".
> 
> > The script is Python 3, and the file system it runs on is a HAMMER
> > file system on DragonFlyBSD. The audio files reside on a QNAP NAS,
> > which runs some kind of Linux, so it's probably ext3/4. The files came
> > from various systems (Mac, Windows, FreeBSD).
> 
> There seems to be a mismatch between the assumed and the actual file
> system encoding somewhere in this mix. Is this the only glitch, or are
> there similar problems with other non-ASCII characters?
> 
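A minimal sketch of Peter's point in plain Python 3: the three bytes are the UTF-8 encoding of the single character U+2019, not the ASCII apostrophe U+0027, and decoding them with a wrong codec produces three characters instead of one. Latin-1 is used below purely as an illustrative wrong codec; the codec actually involved in this setup is not known. The last line prints `sys.getfilesystemencoding()`, the encoding Python assumes when turning raw file-name bytes into `str`, which is what should be compared against the encoding the names were really written in.

```python
import sys
import unicodedata

# The three bytes os.walk() reported in place of the "apostrophe".
raw = b"\xe2\x80\x99"

# Decoded as UTF-8 they form one code point, U+2019, not the ASCII
# apostrophe U+0027 (which encodes to the single byte 0x27).
ch = raw.decode("utf-8")
print(hex(ord(ch)), unicodedata.name(ch))  # 0x2019 RIGHT SINGLE QUOTATION MARK
print(hex(ord("'")), unicodedata.name("'"))  # 0x27 APOSTROPHE

# Decoding the same bytes with a wrong codec (Latin-1, as an illustration)
# yields three unrelated characters instead of one: the typical symptom
# of an encoding mismatch.
wrong = raw.decode("latin-1")
print(len(wrong), [hex(ord(c)) for c in wrong])  # 3 ['0xe2', '0x80', '0x99']

# The encoding Python assumes for file names on this system.
print(sys.getfilesystemencoding())
```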

This is the only glitch in file names so far.
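To check whether any other names are affected, one option is to walk the tree with a bytes path: `os.walk()` then returns each name exactly as stored on disk instead of decoding it with the assumed file system encoding. The sketch below reports every file or directory name that contains non-ASCII bytes. The helper name `non_ascii_names` is hypothetical, and decoding the raw bytes as UTF-8 is an assumption that happens to fit the U+2019 name above; bytes that are not valid UTF-8 are kept as surrogate escapes rather than raising.

```python
import os


def non_ascii_names(top):
    """Yield (raw_path, text) for every entry whose name is not pure ASCII.

    Hypothetical helper. Passing a bytes path makes os.walk() return raw
    bytes names, so nothing is decoded with the assumed file system
    encoding. The raw path is then decoded as UTF-8 (an assumption), with
    invalid bytes preserved as surrogate escapes instead of raising.
    """
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for name in dirnames + filenames:
            # Iterating over bytes yields ints; anything above 0x7F is non-ASCII.
            if any(b > 0x7F for b in name):
                raw = os.path.join(dirpath, name)
                yield raw, raw.decode("utf-8", "surrogateescape")
```

For example, `for raw, text in non_ascii_names("/path/to/music"): print(ascii(text))` lists the affected names (the path is a placeholder). Printing with `ascii()` keeps the output readable even when the terminal's encoding is itself part of the mismatch.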

-- 
Rod

http://www.rodperson.com

Who at Clitorius fountain thirst remove 
Loath Wine and, abstinent, meer Water love.

 - Ovid


