UTF problem?

Sun May 25 14:33:43 EDT 2008

Vesa-Matti Sarenius wrote:
> I am trying to set up Linux printing via Windows XP and using this HOWTO:
> 
> http://justin.yackoski.name/winp/
> 
> I have one problem. When I send a file via CUPS to the windows spool
> directory dirwatch.py (http://justin.yackoski.name/winp/dirwatch.txt) dies
> and gives:
> 
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0'
> in position 77: ordinal not in range (128)
> 
> I think that this has something to do with UTF-8 what I do not know is how
> to get over it? 
> 
> So could someone please help me to get dirwatch.py script to understand my
> filenames (which, btw, do not have any non-ascii symbols, and will work
> if I manually copy-rename them).

Well, you don't say which line it's dying on, but I would
guess it's the line which says "print full_filename"
(although it could be where it prints the output from
gsprint.exe). In any case, \xa0 is a non-breaking space:

<code>
import unicodedata

# Look up the official Unicode name of the offending character.
print(unicodedata.name(u"\xa0"))  # NO-BREAK SPACE
</code>

So it's just possible that one of your filenames has
a non-ASCII character in it without your noticing.
(Although I'd find that an odd character to include in
a filename.)
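
If you want to check a suspect filename for characters like
this, something along these lines might help. It's only a
sketch: find_non_ascii and the sample filename are made up
for illustration and are not part of dirwatch.py.

```python
import unicodedata


def find_non_ascii(name):
    """Return (position, character, Unicode name) for each non-ASCII character."""
    hits = []
    for position, char in enumerate(name):
        if ord(char) > 127:
            # Fall back to a placeholder for characters with no Unicode name.
            hits.append((position, char, unicodedata.name(char, "UNKNOWN")))
    return hits


# Hypothetical filename with a non-breaking space hiding after "report".
for hit in find_non_ascii(u"report\xa0final.ps"):
    print(hit)
```

Run against the real filenames, an empty result would point
the finger somewhere other than the filename itself.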

ReadDirectoryChangesW is the Unicode version of the
ReadDirectoryChanges API, so it always returns Unicode
objects, even if there are no non-ASCII characters in
the filenames.
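
The failure is easy to reproduce without the directory
watcher: encoding such a Unicode string to ascii, which is
presumably what print ends up doing here, raises the same
error. A small sketch; first_unencodable and the filename
are hypothetical names used only for this example.

```python
def first_unencodable(name, encoding="ascii"):
    """Return the index of the first character that won't encode, or None."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError as exc:
        # exc.start is the position reported in the error message.
        return exc.start
    return None


print(first_unencodable(u"job\xa0one.ps"))  # position of the non-breaking space
print(first_unencodable(u"plain.ps"))       # nothing to complain about
```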

To get round the problem in the immediate term, change
both print lines to something like:

print(repr(full_filename))

This will show exactly what's in the filename, with any
non-ASCII characters escaped. To do the thing properly,
you'll want to encode the filename explicitly before
printing it, e.g. to sys.stdout.encoding or just "utf8":

print(full_filename.encode("utf8"))

but since these are just info outputs I imagine you're
not too bothered how they look.
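
If you'd rather have one helper that never raises, whatever
the console encoding turns out to be, a sketch along these
lines might do. to_printable is a made-up name, and falling
back to ascii when sys.stdout.encoding is unset is my
assumption, not something dirwatch.py does.

```python
import sys


def to_printable(text, encoding=None):
    """Return text with anything the target encoding can't represent escaped."""
    # sys.stdout.encoding can be None (e.g. when output is piped);
    # assume plain ascii in that case.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # backslashreplace turns unencodable characters into escapes such as
    # \xa0 instead of raising UnicodeEncodeError.
    return text.encode(encoding, "backslashreplace").decode(encoding)


# Hypothetical filename containing a non-breaking space.
print(to_printable(u"job\xa0one.ps", "ascii"))  # job\xa0one.ps
```

The escaped form isn't pretty, but it does make a stray
character like this one visible in the output.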

TJG