Problem writing some strings (UnicodeEncodeError)

Paulo da Silva p_s_d_a_s_i_l_v_a at netcabo.pt
Sun Jan 12 12:51:46 EST 2014


On 12-01-2014 16:23, Peter Otten wrote:
> Paulo da Silva wrote:
> 
>> I am using a python3 script to produce a bash script from lots of
>> filenames got using os.walk.
>>
>> I have a template string for each bash command in which I replace a
>> special string with the filename and then write the command to the bash
>> script file.
>>
>> Something like this:
>>
>> shf=open(bashfilename,'w')
>> filenames=getfilenames() # uses os.walk
>> for fn in filenames:
>>     ...
>>     cmd=templ.replace("<fn>",fn)
>>     shf.write(cmd)
>>
>> For certain filenames I got a UnicodeEncodeError exception at
>> shf.write(cmd)!
>> I use utf-8 and have # -*- coding: utf-8 -*- in the source .py.
>>
>> How can I fix this?
>>
>> Thanks for any help/comments.
> 
> You make it harder to debug your problem by not giving the complete 
> traceback. If the error message contains 'surrogates not allowed' like in 
> the demo below
> 
>>>> with open("tmp.txt", "w") as f:
> ...     f.write("\udcef")
> ... 
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
> UnicodeEncodeError: 'utf-8' codec can't encode character '\udcef' in position 0: surrogates not allowed

That is the situation. I have lost the traceback, and it would take a few
hours to reproduce it. Sorry.


> 
> you have filenames that are not valid UTF-8 on your hard disk. 
> 
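A minimal sketch of where a character like '\udcef' comes from, assuming a
UTF-8 filesystem encoding (typical on Linux). In that case os.walk() with a
str argument decodes names roughly like bytes.decode("utf-8",
"surrogateescape"). The filename bytes below are hypothetical:

```python
# Hypothetical filename bytes: Latin-1 "café.txt", which is not valid UTF-8.
raw = b"caf\xe9.txt"

# On a UTF-8 system, os.walk() decodes names roughly like this:
name = raw.decode("utf-8", "surrogateescape")
print(repr(name))  # 'caf\udce9.txt': the bad byte became a lone surrogate

# A file opened with "w" encodes strictly (errors="strict"), which fails:
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed
```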
> A possible fix would be to use bytes instead of str. For that you need to 
> open `bashfilename` in binary mode ("wb") and pass bytes to the os.walk() 
> call. 
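The bytes approach could look roughly like the sketch below. The function
name write_bash_script and the b"<fn>" placeholder are hypothetical. The
template must be bytes as well, since bytes.replace() does not accept str
arguments. Like the original loop, it does no shell quoting of the names.

```python
import os

def write_bash_script(top: bytes, templ: bytes, bashfilename: str) -> None:
    """Write templ once per file under top, with b"<fn>" replaced by its path.

    Hypothetical helper: because top is bytes, os.walk() yields bytes names
    exactly as stored on disk, so nothing is decoded or re-encoded.
    """
    with open(bashfilename, "wb") as shf:  # binary mode accepts only bytes
        for dirpath, _dirnames, filenames in os.walk(top):
            for fn in filenames:
                path = os.path.join(dirpath, fn)  # bytes in, bytes out
                shf.write(templ.replace(b"<fn>", path))
```

For example: write_bash_script(b"/some/dir", b"ls -l -- <fn>\n", "cmds.sh").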
This is my first time with Python 3, so I am confused!

As far as I can understand, os.walk seems to return the filenames
exactly as they are on disk: just bytes, like in C.

My template is a string. What is the result of the replace call? Does
it change the filename returned by os.walk in any way?
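A small check of what replace() does, assuming a UTF-8 filesystem encoding
and a hypothetical template and filename. It shows that str.replace() copies
the surrogate through unchanged, and the original bytes can be recovered
afterwards with the same error handler:

```python
raw = b"caf\xe9.txt"  # hypothetical on-disk name, not valid UTF-8
fn = raw.decode("utf-8", "surrogateescape")  # as os.walk() would return it

templ = "ls -l -- <fn>\n"  # hypothetical template
cmd = templ.replace("<fn>", fn)

# replace() copies the surrogate character through unchanged ...
assert "\udce9" in cmd
# ... so encoding with the same handler gives back the exact on-disk bytes.
assert cmd.encode("utf-8", "surrogateescape") == b"ls -l -- caf\xe9.txt\n"
```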

Now, if the result of the replace leaves the filename unchanged, how do
I "convert" it to the bytes type, without changing its contents, so
that I can write it to the bash file opened with "wb"?
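One way to get bytes from the finished command is os.fsencode(). On POSIX it
encodes with the filesystem encoding and the "surrogateescape" handler, which
is the inverse of how os.walk() decoded the names. A sketch, where
write_commands is a hypothetical helper and shf is the bash file opened
with "wb":

```python
import os

def write_commands(shf, templ: str, filenames) -> None:
    """Hypothetical helper: shf is a file opened in binary mode ("wb"),
    filenames are str names as returned by os.walk()."""
    for fn in filenames:
        cmd = templ.replace("<fn>", fn)
        # os.fsencode() reverses os.fsdecode(): with the filesystem encoding
        # and (on POSIX) "surrogateescape", each lone surrogate becomes the
        # original byte again.
        shf.write(os.fsencode(cmd))
```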


> 
> Or you just go and fix the offending names.
This is impossible in my case.
I need a bash script with the names as they are on disk.
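Alternatively, the script file can stay in text mode if it is opened with
errors="surrogateescape", so each lone surrogate is written back as the
original byte and the names end up in the script as they are on disk. A
minimal sketch, assuming the filesystem encoding is UTF-8; open_script is a
hypothetical name, and newline="\n" keeps LF line endings for bash:

```python
def open_script(path: str):
    # Hypothetical helper. Text mode, but lone surrogates (undecodable bytes
    # from os.walk() on a UTF-8 system) are encoded back to the original
    # bytes instead of raising UnicodeEncodeError.
    return open(path, "w", encoding="utf-8", errors="surrogateescape",
                newline="\n")
```

The original loop then works unchanged:
with open_script(bashfilename) as shf: shf.write(templ.replace("<fn>", fn)).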




More information about the Python-list mailing list