kdialog and unicode

Wed Apr 27 16:58:14 EDT 2005

John Machin wrote:
> On 26 Apr 2005 19:16:25 -0700, dmbkiwi at gmail.com wrote:
>
> >
> >John Machin wrote:
> >> On 26 Apr 2005 13:39:26 -0700, dmbkiwi at gmail.com (dumbkiwi) wrote:
> >>
> >> >Peter Otten <__peter__ at web.de> wrote in message
> >> >news:<d4l92e$9di$05$1 at news.t-online.com>...
> >> >> Dumbkiwi wrote:
> >> >>
> >> >> >> Just encode the data in the target encoding before passing it to
> >> >> >> os.popen():
> >>
> >> >
> >> >Anyway, from your post, I've done some more digging, and found the
> >> >command:
> >> >
> >> >sys.setappdefaultencoding()
> >> >
> >> >which I've used, and it's fixed the problem (I think).
> >> >
> >>
> >> Dumb Kiwi, eh? Maybe not so dumb -- where'd you find
> >> sys.setappdefaultencoding()? I'm just a dumb Aussie [1]; I looked in
> >> the 2.4.1 docs and also did import sys; dir(sys) and I can't spot it.
> >
> >Hmmm. See post above, seems to be something generated by eric3.  So
> >this may not be the fix I'm looking for.
> >
> >>
> >> In any case, how could the magical sys.setappdefaultencoding() fix
> >> your problem? From your description, your problem appeared to be that
> >> you didn't know what encoding to use.
> >
> >I knew what encoding to use,
>
> Would you mind telling us (a) what that encoding is (b) how you came
> to that knowledge (c) why you just didn't do

(a)  utf-8
(b)  I asked the author of the text, and it displays properly in other
parts of the script when not using kdialog.  Is there a way to test it
otherwise - I presume that there is.
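
One way to test it, as a minimal sketch: try to decode the raw bytes as
utf-8 and see whether that fails.  This is written for a current Python 3
rather than the 2.4 discussed in this thread, and the function name and
the sample sentence are made up for illustration.  A clean decode does
not prove the data is utf-8, but text containing Polish letters in a
legacy charset such as iso-8859-2 will normally fail the check.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample text, not taken from the real translation file.
sample = "Zażółć gęślą jaźń"

print(is_valid_utf8(sample.encode("utf-8")))       # True
print(is_valid_utf8(sample.encode("iso-8859-2")))  # False
```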

>
> test = os.popen('kdialog --inputbox %s'
> %(data.encode('that_encoding')))
>
> instead of
>
> test = os.popen('kdialog --inputbox %s' %(data.encode('utf-8')))

Because, "that_encoding" == "utf-8" (as far as I was aware).

>
> > the problem was that the text was being
> >passed to kdialog as ascii.
>
> It wasn't being passed to kdialog; there was an attempt which failed.

Quite right.

>
> >  The .encode('utf-8') at least allows
> >kdialog to run, but the text still looks like crap.  Using
> >sys.setappdefaultencoding() seemed to help.  The text looked a bit
> >better - although not entirely perfect - but I think that's because the
> >font I was using didn't have the correct characters (they came up as
> >square boxes).
>
> And the font you *were* using is what? And the font you are now using
> is what? What facilities do you have to use different fonts?

The font I was using was bitstream vera sans.  The font I'm now using
is verdana.

>
> >>
> >> What is the essential difference between
> >>
> >>    send(u_data.encode('polish'))
> >>
> >> and
> >>
> >>    sys.setappdefaultencoding('polish')
> >>    ...
> >>    send(u_data)
> >
> >Not sure - I'm new to character encoding, and most of this seems like
> >black magic to me.
>
> The essential difference is that setting a default encoding is a daft
> idea.
>
Because it achieves nothing more than what I can do with
.encode('that_encoding')?
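
To make the explicit route less like black magic, here is a small sketch
(Python 3, with a made-up sample word) of what .encode() does: the same
text becomes different bytes depending on the encoding named, and a
receiver that assumes the wrong one shows garbage.  With an explicit
call, the encoding in use is at least visible at the point where the data
leaves the script.

```python
# Hypothetical sample word, not taken from the real data.
text = "łódź"

utf8_bytes = text.encode("utf-8")         # two bytes per accented letter here
latin2_bytes = text.encode("iso-8859-2")  # one byte per letter

print(repr(utf8_bytes))
print(repr(latin2_bytes))

# Decoding with the encoding that was used round-trips cleanly.
assert utf8_bytes.decode("utf-8") == text

# Decoding the utf-8 bytes with the wrong charset does not raise an
# error; it silently yields different (garbled) characters.
garbled = utf8_bytes.decode("iso-8859-2")
print(garbled != text)
```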

>
> >
> >>
> >> [1]: Now that's *TWO* contenders for TautologyOTW :-)
> >>
>
> Before I retract that back to one contender, I'll give it one more
> shot:
>
Aaah, there's nothing better than a bit of cheerful snarkiness on a
newsgroup.

> 1. Your data: you say it is Polish text, and is utf-8. This implies
> that it is in Unicode, encoded as utf-8. What evidence do you have?

See above.

> Have you been able to display it anywhere so that it "looks good"?

Yes.  What I am doing here is a theme for a superkaramba widget (see
http://netdragon.sourceforge.net).  It displays fine everywhere else on
the widget, it's just in the kdialog boxes that it doesn't display
correctly.

> If it's not confidential, can you show us a dump of the first say 100
> bytes of text, in an unambiguous form, like this:

Can't do it now, because I'm at work.  I can do it when I get home
tonight.

>
> print repr(open('polish.text', 'rb').read(100))
>
> 2. Your script: You say "I then manipulate the data to break it down
> into text snippets" - uh-huh ... *what* manipulations? Care to tell
> us? Care to show us the code?

Manipulation is simply breaking the text down into dictionary pairs.
It is basically a translation file for my widget, with English text,
and a corresponding Polish text.  I use the re module to parse the
file, and create dictionary pairs between the English text, and the
corresponding Polish text.
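
As a rough sketch of that kind of parsing (Python 3): the real file
format is not shown in this thread, so the "English = Polish" line layout,
the pattern and the function name below are all assumptions made for
illustration.  The point is only that the bytes are decoded once, up
front, and everything after that works on Unicode text.

```python
import re

# Assumed layout: one "English = Polish" pair per line (hypothetical).
PAIR_RE = re.compile(r"^\s*(.+?)\s*=\s*(.+?)\s*$")


def parse_translations(raw: bytes, encoding: str = "utf-8") -> dict:
    """Decode the file contents, then build an English -> Polish dict."""
    text = raw.decode(encoding)
    pairs = {}
    for line in text.splitlines():
        match = PAIR_RE.match(line)
        if match:  # lines that do not fit the pattern are skipped
            pairs[match.group(1)] = match.group(2)
    return pairs


# Made-up sample content standing in for the translation file.
sample_file = "Today = Dzisiaj\n\nSnow = Śnieg\n".encode("utf-8")
translations = parse_translations(sample_file)
print(translations)
```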

>
> 3. kdialog: I know nothing of KDE and its toolkit. I would expect
> either (a) it should take utf-8 and be able to display *any* of the
> first 64K (nominal) Unicode characters, given a Unicode font or (b)
> you can encode your data in a legacy charset, *AND* tell it what that
> charset is, and have a corresponding font or (c) you have both
> options. Which is correct, and what are the details of how you can
> tell kdialog what to do -- configuration? command-line arguments?

That's what I was hoping someone here might be able to tell me.  Having
searched online, I cannot find any information about kdialog and
encoding.  I have left a message on the relevant kde mailing list, but
have had no response.  The command line options are found with kdialog
--help, but as you don't have kde, it will be difficult for you to look
at those.  Having examined them at length, there is no option for
encoding.
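
Separately from the encoding question, building the command with
'kdialog --inputbox %s' sends the text through the shell, so spaces and
quotes in the text get split or interpreted there.  A sketch of passing
the text as a single argument instead (Python 3; the helper names are
made up, and only the --inputbox option mentioned in this thread is
used).  Whether kdialog interprets the argument bytes according to the
current locale is an assumption, not something confirmed here.

```python
import shutil
import subprocess


def kdialog_argv(prompt: str) -> list:
    """Build the argument list; the prompt stays one argument, no shell."""
    return ["kdialog", "--inputbox", prompt]


def ask(prompt: str):
    """Show the input box and return what was typed, or None if unavailable."""
    if shutil.which("kdialog") is None:
        return None  # kdialog is not installed on this machine
    result = subprocess.run(kdialog_argv(prompt), capture_output=True, text=True)
    return result.stdout.strip()


# Hypothetical prompt text; ask() is deliberately not called here.
argv = kdialog_argv("Podaj nazwę miasta")
print(argv)
```

Python 3 encodes the str arguments using the locale's encoding when it
starts the process, so a utf-8 locale would hand kdialog utf-8 bytes.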
> 
> HTHYTHYS,
> 
> John

Thanks for your help and interest.

Matt