os.walk the apostrophe and unicode

Rod Person rodperson at rodperson.com
Sun Jun 25 08:19:02 EDT 2017


Ok...so after reading all the replies in the thread, I thought it would
be easier to send a general reply and include some links to screenshots.

As Peter mentioned, the logical thing to do would be to fix the file
name to what I actually thought it was, and if this was for work that's
probably what I would have done, but since I want to understand what's
going on I decided to waste time on that.

I have to admit, I didn't think the file system was utf-8, as seeing
what looked to be an apostrophe sent me down the road of "why is this
apostrophe screwed up" instead of "ah, this must be unicode".

But doing a simple ls of that directory shows it is unicode, with a
replacement for the offending character.

http://rodperson.com/graphics/uc/ls.png
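
A minimal sketch of why the character only looks like an apostrophe,
assuming it is U+2019 (which is what the bytes \xe2\x80\x99 quoted
further down in this thread decode to as UTF-8). The variable names are
made up for illustration.

```python
import unicodedata

ascii_apostrophe = "'"        # U+0027, a single byte in UTF-8
curly_apostrophe = "\u2019"   # U+2019, assumed to be the character in the file name

# Print code point, official Unicode name and UTF-8 bytes for each character.
for ch in (ascii_apostrophe, curly_apostrophe):
    print("U+%04X" % ord(ch), unicodedata.name(ch), ch.encode("utf-8"))

# The two look alike in many fonts but are different characters.
same = ascii_apostrophe == curly_apostrophe
```

So a comparison against a name typed with a plain ' will not match,
even though both display almost identically.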

I am in fact using Python 3.5. I may be lacking in unicode skills, but I
do have sense enough to know which version of Python I am invoking.
So I included this screenshot, which shows the version of Python and the
file list returned by os.walk:

http://rodperson.com/graphics/uc/files.png
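
For reference, a small sketch of what Python 3 hands back from os.walk:
str names when the top directory is given as str, bytes names when it
is given as bytes. The helper walk_name_types and the temporary
directory are only for illustration and are not part of my script.

```python
import os
import tempfile

def walk_name_types(top):
    """Collect the Python types of the file names os.walk yields under top."""
    types = set()
    for _dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            types.add(type(name))
    return types

# Throwaway directory with one file, just to have something to walk.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "track.flac"), "w").close()
    str_types = walk_name_types(demo_dir)                 # top passed as str
    bytes_types = walk_name_types(os.fsencode(demo_dir))  # top passed as bytes

print(str_types, bytes_types)
```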

So the fact that it shows as a string and not bytes in the debugger was
throwing me for a loop. In my log section I was trying to determine
whether it was unicode and, if so, decode it; if not, do nothing. That
wasn't working.

http://rodperson.com/graphics/uc/log_section.png
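
A hedged sketch of what that check could look like in Python 3, where a
str is already decoded text and only bytes needs decoding. The names
to_text and repair_mojibake are hypothetical, and the repair step is a
heuristic that assumes UTF-8 bytes were misread as Latin-1, which is
the pattern in the string Peter pasted.

```python
import os

def to_text(name):
    """Return name as str, decoding only when it really is bytes."""
    if isinstance(name, bytes):
        return os.fsdecode(name)   # uses the file system encoding
    return name                    # already text, nothing to decode

def repair_mojibake(text):
    """Undo UTF-8 read as Latin-1; return text unchanged if that does not apply."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

broken = "06 - Todd\xe2\x80\x99s Song"   # three Latin-1 characters instead of one
fixed = repair_mojibake(broken)          # one U+2019 character
untouched = repair_mojibake("06 - Todd\u2019s Song")
```

A name that already holds U+2019 cannot be encoded as Latin-1, so it
passes through unchanged.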




On Sun, 25 Jun 2017 10:47:18 +0200
Peter Otten <__peter__ at web.de> wrote:

> Steve D'Aprano wrote:
> 
> > On Sun, 25 Jun 2017 04:57 pm, Peter Otten wrote:  
> 
> >> if everything worked correctly? Though I don't understand why the
> >> OP doesn't see
> >> 
> >> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> >> 
> >> which is the repr() that I get.  
> > 
> > That's mojibake and is always wrong :-)   
> 
> Yes, that's my very point. 
> 
> > I'm not sure how you got that.  
> 
> I took the OP's string at face value and pasted it into the
> interpreter:
> 
> # python 3.4
> >>> '06 - Todd\xe2\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> '06 - Toddâ\x80\x99s Song (Post-Spiderland Song in Progress).flac'
> 
> > Something to do with an accidental decode to Latin-1?  
> 
> If the above filename is the only one or one of a few that seem
> broken, and other non-ascii filenames look OK the OP's
> toolchain/filesystem may work correctly and the odd name might have
> been produced elsewhere, e. g. by copying an already messed-up
> freedb.org entry.
> 
> [Heureka]
> 
> However, the most likely explanation is that the filename is correct
> and that the OP is not using Python 3 as he claims but Python 2.
> 
> Yes, it took that long for me to realise ;) Python 2 is slowly
> sinking into oblivion...
> 



-- 
Rod

http://www.rodperson.com


