urlencode with high characters

Jim jhefferon at smcvt.edu
Wed Nov 2 15:31:01 EST 2005


Hello,

I'm trying to do urllib.urlencode() with unicode correctly, and I
wonder if some kind person could set me straight?

My understanding is that I am supposed to be able to urlencode anything
up to and including the top half of latin-1 (decimal 128-255).

I can't just send urlencode a unicode character:

Python 2.3.5 (#2, May  4 2005, 08:51:39)
[GCC 3.3.5 (Debian 1:3.3.5-12)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> s=u'abc'+unichr(246)+u'def'
>>> dct={'x':s}
>>> urllib.urlencode(dct)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.3/urllib.py", line 1206, in urlencode
    v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in
position 3: ordinal not in range(128)

Is it instead Right that I should send a unicode string to urlencode by
first encoding it to 'latin-1' ?

>>> import urllib
>>> s=u'abc'+unichr(246)+u'def'
>>> dct={'x':s.encode('latin-1')}
>>> urllib.urlencode(dct)
'x=abc%F6def'
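
A minimal sketch of that workaround, generalized so the charset is explicit. The helper name urlencode_with_charset and its charset parameter are hypothetical, not part of urllib, and the try/except import is only there so the snippet runs on both Python 2 and Python 3. The idea is simply to turn every unicode value into a byte string before urlencode ever sees it, so the ASCII default codec is never consulted:

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def urlencode_with_charset(params, charset='latin-1'):
    """Hypothetical helper: encode unicode values to bytes, then urlencode."""
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    encoded = {}
    for key, value in params.items():
        if isinstance(value, text_type):
            # Make the charset choice explicit instead of relying on ASCII.
            value = value.encode(charset)
        encoded[key] = value
    return urlencode(encoded)


s = u'abc\xf6def'
latin1_query = urlencode_with_charset({'x': s})
utf8_query = urlencode_with_charset({'x': s}, 'utf-8')
print(latin1_query)  # x=abc%F6def
print(utf8_query)    # x=abc%C3%B6def
```

The two results differ, which suggests why a library might hesitate to pick a charset on the caller's behalf: the receiving server has to decode with the same charset that was used here.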

If it is Right, I'm puzzled as to why urlencode doesn't do it.  Or am I
missing something?  urllib.urlencode() contains the lines:

            elif _is_unicode(v):
                # is there a reasonable way to convert to ASCII?
                # encode generates a string, but "replace" or "ignore"
                # lose information and "strict" can raise UnicodeError
                v = quote_plus(v.encode("ASCII","replace"))
                l.append(k + '=' + v)

so I think that it is *not* liking latin-1.
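
To see concretely what the quoted encode("ASCII","replace") call does to a latin-1 character, here is a small sketch (the variable names are illustrative only, and the import fallback is just so it runs on Python 2 or Python 3). It applies the same encode-then-quote step to the string from the session above and also shows the "strict" failure that the source comment mentions:

```python
try:
    from urllib import quote_plus        # Python 2
except ImportError:
    from urllib.parse import quote_plus  # Python 3

s = u'abc\xf6def'

# "replace" swaps the non-ASCII character for '?', so the original
# character cannot be recovered from the resulting query string.
lossy = quote_plus(s.encode('ascii', 'replace'))

# "strict" (the default error mode) refuses to encode the character at all.
try:
    s.encode('ascii')
    strict_failed = False
except UnicodeError:
    strict_failed = True

print(lossy)          # abc%3Fdef
print(strict_failed)  # True
```

So under "replace" the character arrives as an encoded question mark rather than as %F6, which is consistent with the reading that this code path does not pass latin-1 through.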

Thank you,
Jim



