os.walk the apostrophe and unicode

Peter Otten __peter__ at web.de
Sat Jun 24 17:17:07 EDT 2017


Rod Person wrote:

> On Sat, 24 Jun 2017 21:28:45 +0200
> Peter Otten <__peter__ at web.de> wrote:
> 
>> Rod Person wrote:
>> 
>> > Hi,
>> > 
>> > I'm working on a program that will walk a file system and clean the
>> > id3 tags of mp3 and flac files. Everything is working great until
>> > the following file is found
>> > 
>> > '06 - Todd's Song (Post-Spiderland Song in Progress).flac'
>> > 
>> > for some reason that I can't understand os.walk() returns this file
>> > name as
>> > 
>> > '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in
>> > Progress).flac'
>> > 
>> > which then causes more hell than a little bit for me. I don't
>> > understand why the apostrophe (') becomes \xe2\x80\x99, or what I
>> > can do about it.
>> 
>> >>> import unicodedata
>> >>> b"\xe2\x80\x99".decode("utf-8")
>> '’'
>> >>> unicodedata.name(_)
>> 'RIGHT SINGLE QUOTATION MARK'
>> 
>> So it's '’' rather than "'".
>> 
>> > The script is Python 3, and the file system it is running on is a
>> > HAMMER filesystem on DragonFlyBSD. The audio files reside on a QNAP
>> > NAS which runs some kind of Linux, so it is probably ext3/4. The
>> > files came from various systems (Mac, Windows, FreeBSD).
>> 
>> There seems to be a mismatch between the assumed and the actual file
>> system encoding somewhere in this mix. Is this the only glitch or are
>> there similar problems with other non-ascii characters?
>> 
> 
> This is the only glitch in file names so far.
> 

Then I'd fix the name manually...
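
Should more names like this turn up later, something along these lines
might help to find them and to swap the typographic quote for a plain
apostrophe. This is only a rough sketch: the helper names
(non_ascii_names, describe, ascii_apostrophe, fix_names) are made up for
illustration, and it assumes that os.walk() hands you str names that
really contain U+2019 rather than mis-decoded bytes.

```python
import os
import unicodedata

# U+2019, the character found in the problematic file name
RIGHT_SINGLE_QUOTE = "\u2019"


def non_ascii_names(top):
    """Yield (directory, name) for file names with non-ASCII characters."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if any(ord(c) > 127 for c in name):
                yield dirpath, name


def describe(name):
    """Return the Unicode names of the non-ASCII characters in name."""
    return [unicodedata.name(c, "UNKNOWN") for c in name if ord(c) > 127]


def ascii_apostrophe(name):
    """Replace the right single quotation mark with an ASCII apostrophe."""
    return name.replace(RIGHT_SINGLE_QUOTE, "'")


def fix_names(top, dry_run=True):
    """Return (old, new) path pairs; rename only if dry_run is false."""
    renamed = []
    # Collect all candidates first so we never rename while walking.
    for dirpath, name in list(non_ascii_names(top)):
        new_name = ascii_apostrophe(name)
        if new_name == name:
            continue  # other non-ASCII characters are left alone
        old = os.path.join(dirpath, name)
        new = os.path.join(dirpath, new_name)
        if os.path.exists(new):
            continue  # do not clobber an existing file
        if not dry_run:
            os.rename(old, new)
        renamed.append((old, new))
    return renamed
```

I'd call fix_names() with dry_run=True first and look at the returned
pairs (and at describe() for anything unexpected) before letting it
rename anything for real.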



