Characters aren't displayed correctly

John Machin sjmachin at lexicon.net
Mon Mar 2 10:40:54 EST 2009


On Mar 3, 1:50 am, Hussein B <hubaghd... at gmail.com> wrote:
> On Mar 2, 4:31 pm, John Machin <sjmac... at lexicon.net> wrote:
> > On Mar 2, 7:30 pm, Hussein B <hubaghd... at gmail.com> wrote:
>
> > > On Mar 1, 4:51 pm, Philip Semanchuk <phi... at semanchuk.com> wrote:
>
> > > > On Mar 1, 2009, at 8:31 AM, Hussein B wrote:
>
> > > > > Hey,
> > > > > I'm retrieving records from a MySQL database that contains non-English
> > > > > characters.
>
> > Can you reveal which language???
>
> Arabic
>
>
>
> > > > > Then I create a String that contains HTML markup and column values
> > > > > from the previous result set.
> > > > > +++++
> > > > > markup = u'''<table>.....'''
> > > > > for row in rows:
> > > > >     markup = markup + '<tr><td>' + row['id']
> > > > > markup = markup + '</table>'
> > > > > +++++
> > > > > Then I'm sending the email according to this tip:
> > > > > http://code.activestate.com/recipes/473810/
> > > > > Well, the email contains ? characters in place of the non-English ones.
> > > > > Any ideas?
>
> > > > There are so many places where this could go wrong, and you haven't
> > > > narrowed down the problem.
>
> > > > Are the characters stored in the database correctly?
>
> > > Yes they are.
>
> > How do you KNOW that they are stored correctly? What makes you so
> > sure?
>
> Because MySQL Query Browser displays them correctly. In addition, I use
> BIRT as the reporting system and it shows them correctly.
>
>
>
> > > > Are they stored consistently (i.e. all using the same encoding, not  
> > > > some using utf-8 and others using iso-8859-1)?
>
> > > Yes.
>
> > So what is the encoding used to store them?
>
> Tables are created with the UTF-8 encoding option.
>
> > > > What are you getting out of the database? Is it being converted to  
> > > > Unicode correctly, or at all?
>
> > > I don't know. How can I make sure of this?
>
> > You could show us some of the output from the database query. As well as
> >    print the_output
> > you should
> >    print repr(the_output)
> > and show us both, and also tell us what you *expect* to see.
>
> The result of print repr(row['name']) is '??? ??????'
> The '?' characters are supposed to be Arabic characters.

Are you expecting 3 Arabic characters, a space, and then 6 Arabic
characters?

We now have some interesting evidence: row['name'] is NOT a unicode
object -- otherwise the print would show u'??? ??????'; it's a str
object.
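The counting question above can be made concrete. In Python 3 terms (where Python 2's str/unicode pair roughly became bytes/str), this sketch shows how repr() reveals the type that a plain print hides, and how the number of "?" tells you whether whole characters or raw UTF-8 bytes were replaced. The four-letter Arabic word is hypothetical sample data, not the poster's.

```python
# Python 3 sketch: Python 2's str/unicode split maps roughly onto
# Python 3's bytes/str. The sample word is hypothetical example data.
sample = "سلام"                      # 4 Arabic letters
as_utf8 = sample.encode("utf-8")     # the same text as UTF-8 bytes

# repr() exposes what print hides: bytes carry a b'' prefix and escapes.
print(repr(sample))
print(repr(as_utf8))

# Count the '?' each kind of corruption would produce.
# One '?' per character: text encoded to ASCII with errors="replace".
per_char = sample.encode("ascii", errors="replace")
# One '?' per byte: UTF-8 bytes misread as Latin-1, then encoded to ASCII.
per_byte = as_utf8.decode("latin-1").encode("ascii", errors="replace")

print(per_char)   # b'????'
print(per_byte)   # b'????????'
```

If the name really is 3 + 6 letters, '??? ??????' matches the one-per-character pattern, which is what points to a decode followed by a lossy re-encode rather than damage to the raw bytes.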

So: A utf8-encoded string is being decoded to unicode, and then re-
encoded to some other encoding, using the "replace" (with "?") error-
handling method. That shouldn't be hard to spot! It's about time you
showed us the code you are using to extract the data from the
database, including the print statements you have put in.
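The chain described in the previous paragraph can be reproduced in a few lines. This is a minimal Python 3 sketch, assuming the lossy step is an encode to Latin-1 (any charset without Arabic letters behaves the same way) and using a hypothetical sample value; the actual encoding in the poster's code path is still unknown.

```python
# Python 3 sketch of the decode-then-lossy-re-encode chain.
# Assumptions: the lossy target is Latin-1 and the sample value is
# hypothetical; the real encoding in the poster's code path is unknown.
utf8_from_db = "سلام عليكم".encode("utf-8")  # stand-in for a UTF-8 column value

# Step 1: decoding UTF-8 bytes to text loses nothing.
text = utf8_from_db.decode("utf-8")

# Step 2: encoding to a charset with no Arabic letters using
# errors="replace" silently turns each such letter into one '?'.
lossy = text.encode("latin-1", errors="replace")
print(repr(lossy))  # b'???? ?????'

# The default errors="strict" raises instead, which exposes the bad step.
try:
    text.encode("latin-1")
except UnicodeEncodeError as exc:
    print("strict encode failed at position", exc.start)

# Keeping the value as text and encoding to UTF-8 only at the end round-trips.
round_trip = text.encode("utf-8").decode("utf-8")
print(round_trip == text)  # True
```

While debugging, switching the error handler from "replace" to "strict" turns the silent "?" into a traceback that points at the offending line.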




More information about the Python-list mailing list