os.walk the apostrophe and unicode

Rod Person rodperson at rodperson.com
Sun Jun 25 11:14:53 EDT 2017


On Sun, 25 Jun 2017 08:18:45 -0600
Michael Torrie <torriem at gmail.com> wrote:

> On 06/25/2017 06:19 AM, Rod Person wrote:
> > But doing a simple ls of that directory shows it is unicode but the
> > replacement of the offending character.
> > 
> > http://rodperson.com/graphics/uc/ls.png  
> 
> Now that is really strange.  Your OS seems to not recognize that the
> filename is in UTF-8.  I suspect this has something to do with the NAS
> file sharing protocol (smb). Though I'm pretty sure that Samba can
> handle UTF-8 filenames correctly.
> 
> > I am in fact using Python 3.5. I may be lacking in unicode skills,
> > but I do have sense enough to know which version of Python I am
> > invoking. So I included this screenshot showing both the version of
> > Python and the file list returned by os.walk:
> > 
> > http://rodperson.com/graphics/uc/files.png  
> 
> If I create a file that has the U+2019 character in it on my Linux
> machine (BtrFS), and do os.walk on it, I see the character in the
> string properly.  So it looks like Python does the right thing,
> automatically decoding from UTF-8.
> 
> In your situation I think the problem is the file sharing protocol
> that your NAS is using. Somehow some information is being lost and
> your OS does not know that the filenames are in UTF-8, and just
> thinks they are bytes. And therefore Python doesn't know to decode
> the string, so you just end up with each byte being converted to a
> unicode code point and being shoved into the unicode string.
> 
> How to get around this issue I don't know.  Maybe there's a way to
> convert the unicode string to bytes using the value of each character,
> and then decode that back to unicode.
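Michael's suggestion can be sketched in Python 3 as follows. This is a minimal sketch resting on one assumption: that each character of the mangled name stands for one raw byte of the real UTF-8 name. It covers two such cases: code points 0 to 255 (the Latin-1 style mapping Michael describes), and the lone surrogates U+DC80 to U+DCFF that Python 3's `surrogateescape` error handler produces when the file system encoding cannot decode a byte. The function name `redecode_as_utf8` is hypothetical.

```python
def redecode_as_utf8(name: str) -> str:
    """Re-interpret a mis-decoded file name as UTF-8 (hypothetical helper).

    Assumption: each character in `name` holds one raw byte of the real
    UTF-8 name, either as a code point 0-255 (Latin-1 style) or as a lone
    surrogate U+DC80-U+DCFF produced by the 'surrogateescape' handler.
    """
    try:
        # Latin-1 maps code points 0-255 one-to-one onto byte values 0-255,
        # i.e. "convert to bytes using the value of each character".
        raw = name.encode("latin-1")
    except UnicodeEncodeError:
        # Undo surrogateescape: U+DC80-U+DCFF become bytes 0x80-0xFF and
        # any other characters are encoded as ordinary UTF-8.
        raw = name.encode("utf-8", "surrogateescape")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8 after all; leave the name untouched.
        return name
```

A name that is already correctly decoded, such as one containing a real U+2019, usually comes back unchanged, so the helper can be applied to each name `os.walk` returns.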

I think your theory is on the correct path. I'm actually attached to the
NAS via NFS, not Samba. From a quick look into it, the NFS server seems
to need an option set to pass unicode correctly, but my NAS software
doesn't give me access to its settings, only the ability to turn NFS on
or off.

Looks like my only option is the original one: correct the file names.
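If renaming is the way out, one hedged sketch is to walk the tree with a bytes path, which makes `os.walk` return raw, undecoded names, and replace the UTF-8 bytes of U+2019 with an ASCII apostrophe. It assumes the offending character is U+2019 (as in this thread) and a POSIX system, where bytes paths are supported; `fix_apostrophes` and its `dry_run` flag are hypothetical names. Running it with `dry_run=True` first lets you review the planned renames.

```python
import os

# U+2019 RIGHT SINGLE QUOTATION MARK encoded as UTF-8 (b"\xe2\x80\x99").
CURLY_APOSTROPHE = "\u2019".encode("utf-8")


def fix_apostrophes(root, dry_run=True):
    """Rename entries under `root` whose names contain U+2019 (hypothetical).

    Works entirely on bytes paths, so no file name is ever decoded.
    Returns the list of (old_path, new_path) pairs renamed or planned.
    """
    root = os.fsencode(root)  # bytes in, bytes out from os.walk
    renamed = []
    # topdown=False yields children before their parent directory, so
    # renaming a directory never invalidates a path still to be visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            if CURLY_APOSTROPHE not in name:
                continue
            new_name = name.replace(CURLY_APOSTROPHE, b"'")
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            if os.path.exists(new_path):
                continue  # never clobber an existing entry
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Working in bytes sidesteps the NFS decoding problem entirely, since the script never needs to know which encoding the server intended.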


-- 
Rod

http://www.rodperson.com

Who at Clitorius fountain thirst remove 
Loath Wine and, abstinent, meer Water love.

 - Ovid


