Unicode chr(150) en dash

marexposed at googlemail.com
Thu Apr 17 11:10:35 EDT 2008


Thank you, Martin and John, for your excellent explanations.

I think I understand the basic principles of Unicode; what confuses me is how different applications use it.

For example, I got that EN DASH from a web page that declares <?xml version="1.0" encoding="ISO-8859-1"?> at the beginning. That's why I chose that encoding. But if the browser can properly decode that character using that encoding, how come other applications can't?

I might need to use Python's htmllib to avoid this; I'm not sure. But if I don't, and I just want to copy and paste the text of some web pages into a Tkinter Text widget, what should I do so that every single character makes it all the way from the widget, out of Tkinter, and into a Python string variable? And how did my browser know it should render an EN DASH instead of a lowercase u with a circumflex?
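Here is a minimal Python 3 sketch of what is probably happening. It assumes, as the subject line suggests, that the page contains the single byte 150 (0x96). Strict ISO-8859-1 maps that byte to an invisible C1 control character. Browsers, however, generally treat a page labelled ISO-8859-1 as Windows-1252, where 0x96 is EN DASH. A Windows console using DOS code page 437 (or 850) shows the same byte as 'û', which would explain the circumflexed u. The codec names are Python's own.

```python
# The single byte 0x96 (decimal 150), assumed to be what the page actually sends.
raw = bytes([150])

# Decode the same byte under three different codecs.
as_latin1 = raw.decode("latin-1")  # strict ISO-8859-1: C1 control character U+0096
as_cp1252 = raw.decode("cp1252")   # Windows-1252, what browsers use for "ISO-8859-1"
as_cp437 = raw.decode("cp437")     # old DOS / Windows console code page

for name, ch in [("latin-1", as_latin1), ("cp1252", as_cp1252), ("cp437", as_cp437)]:
    # Print only the code point, so the output is plain ASCII on any console.
    print(f"{name:8} U+{ord(ch):04X}")
```

Run as-is, it should print U+0096, U+2013 and U+00FB, in that order. If this guess is right, decoding the page with "cp1252" rather than "latin-1" would give you the EN DASH.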

This is the web page, in case you are interested; the EN DASH is on the 4th line of the first paragraph: http://www.pagina12.com.ar/diario/elmundo/subnotas/102453-32303-2008-04-15.html

Thanks a lot.


On Wed, 16 Apr 2008 10:27:26 -0700
John Nagle <nagle at animats.com> wrote:

> marexposed at googlemail.com wrote:
> > Hello guys & girls
> > 
> > I'm pasting an "en dash"
> > (http://www.fileformat.info/info/unicode/char/2013/index.htm) character into
> > a tkinter widget, expecting it to be properly stored into a MySQL database.
> > 
> > I'm getting this error: 
> > *****************************************************************************
> > Exception in Tkinter callback
> > Traceback (most recent call last):
> >   File "C:\Python24\lib\lib-tk\Tkinter.py", line 1345, in __call__
> >     return self.func(*args)
> >   File "chupadato.py", line 25, in guardar
> >     cursor.execute(a)
> >   File "C:\Python24\Lib\site-packages\MySQLdb\cursors.py", line 149, in execute
> >     query = query.encode(charset)
> > UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2013' in position 52: ordinal not in range(256)
> > *****************************************************************************
> 
>      Python and MySQL will do end to end Unicode quite well.  But that's
> not what you're doing.  How did "latin-1" get involved?
> 
>      If you want to use MySQL in Unicode, there are several things to be done.
> First, the connection has to be opened in Unicode:
> 
> 	db = MySQLdb.connect(host="localhost",
> 		use_unicode = True, charset = "utf8",
> 		user=username, passwd=password, db=database)
> 
> Yes, you have to specify both "use_unicode=True", which tells the client
> to talk Unicode, and set "charset" to "utf8", which tells the server
> to talk Unicode encoded as UTF-8.
> 
> Then the tables need to be in Unicode.  In SQL,
> 
>      ALTER DATABASE dbname DEFAULT CHARACTER SET utf8;
> 
> before creating the tables.  You can also change the types of
> existing tables and even individual fields to utf8, if necessary.
> (This takes time for big tables; the table is copied.  But it works.)
> 
>      It's possible to get MySQL to store character sets other than
> ASCII or Unicode; you can store data in "latin1" if you want. This
> might make sense if, for example, all your data is in French or German,
> which maps well to "latin1".  Unless that's your situation, go with
> either all-ASCII or all-Unicode.  It's less confusing.
> 
> 					John Nagle
> -- 
> http://mail.python.org/mailman/listinfo/python-list
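
The traceback quoted above can be reproduced without Tkinter or MySQL. Its last frame runs `query = query.encode(charset)` with a latin-1 charset. U+2013 has no latin-1 byte, but UTF-8 (or cp1252) can encode it. The u'\u2013' in the error also shows that the character reached Python intact as Unicode; it is the encode step that fails. Below is a minimal Python 3 sketch. `encode_query` and the query text (including the `notas` table) are hypothetical stand-ins.

```python
def encode_query(query, charset):
    """Hypothetical stand-in for the `query.encode(charset)` line in the traceback."""
    return query.encode(charset)

# Hypothetical query text containing an EN DASH (U+2013).
query = "INSERT INTO notas (texto) VALUES ('1990 \u2013 2008')"

failure = None
try:
    encode_query(query, "latin-1")
except UnicodeEncodeError as exc:
    failure = exc  # keep a reference; Python 3 unbinds `exc` after the block
    print("latin-1 failed:", exc.reason, "at position", exc.start)

# The same character in two encodings that do have bytes for it.
print(encode_query("\u2013", "utf-8"))   # b'\xe2\x80\x93'
print(encode_query("\u2013", "cp1252"))  # b'\x96'
```

The failure message matches the one in the traceback ("ordinal not in range(256)"). Only the position differs, because the query text here is made up.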

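A rough model can illustrate John's point that client and server both have to be told about the encoding. It is an assumption about the general idea, not MySQLdb's actual code. Text is encoded to bytes on the way to the server and decoded on the way back; when the two codecs disagree, the EN DASH comes back garbled. `round_trip` is a hypothetical helper (Python 3):

```python
def round_trip(text, send_codec, receive_codec):
    """Hypothetical model: encode when sending, decode when reading back."""
    stored_bytes = text.encode(send_codec)
    return stored_bytes.decode(receive_codec)

dash = "\u2013"
ok = round_trip(dash, "utf-8", "utf-8")        # same codec both ways
garbled = round_trip(dash, "utf-8", "cp1252")  # UTF-8 bytes read back as Windows-1252

print(ok == dash)                              # True
print([f"U+{ord(c):04X}" for c in garbled])    # ['U+00E2', 'U+20AC', 'U+201C']
```

The garbled result starts with 'â€'. That is the familiar pattern you get when UTF-8 bytes are read as Windows-1252.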

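John's French or German example raises a practical question: could a given body of text live in a latin1 column? One way to find out is simply to try encoding it. Below is a minimal Python 3 sketch. `fits_latin1` and the sample strings are illustrative, and the samples use \u escapes so the file stays ASCII.

```python
def fits_latin1(text):
    """Return True if every character in `text` has a latin-1 (ISO-8859-1) byte."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

samples = [
    "caf\u00e9 cr\u00e8me",         # French accents: in latin-1
    "Stra\u00dfe Gr\u00f6\u00dfe",  # German sharp s and umlaut: in latin-1
    "c\u0153ur",                    # French oe ligature: NOT in latin-1
    "1990 \u2013 2008",             # EN DASH: NOT in latin-1
    "\u0141\u00f3d\u017a",          # Polish L-stroke and z-acute: NOT in latin-1
]

for s in samples:
    # !a escapes non-ASCII characters, so this prints safely on any console encoding.
    print(f"{s!a}: {fits_latin1(s)}")
```

The last three samples print False. Once text goes beyond plain Western European letters (and even some French), going all-Unicode is the simpler choice.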
