[Tutor] questions on encoding

Albert-Jan Roskam fomcl at yahoo.com
Wed Jul 20 20:49:19 CEST 2011


Hi,

I am looking for test data with accented and multibyte characters. I have found a good resource that I could use to cobble something together (http://www.inter-locale.com/whitepaper/learn/learn-to-test.html) but I was hoping somebody knows some ready resource.

I also have some questions about encoding. In the code below, is there a difference between unicode() and .decode?
s = "§ÇǼÍÍ"
x = unicode(s, "utf-8")
y = s.decode("utf-8")
x == y # returns True

Also, is it, at least theoretically, possible to mix different encodings in byte strings? I'd say no, unless there are multiple BOMs or so. Not that I'd like to try this, but it'd improve my understanding of this sort of obscure topic.


Cheers!!

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20110720/00d9b207/attachment-0001.html>


More information about the Tutor mailing list