ASCII and Unicode

rusi rustompmody at gmail.com
Sun Dec 8 12:39:47 EST 2013


On Sunday, December 8, 2013 10:52:34 PM UTC+5:30, Steven D'Aprano wrote:
> On Sat, 07 Dec 2013 17:05:34 +0100, giacomo boffi wrote:

> > Steven D'Aprano  writes:
> >> Ironically, your post was not Unicode.  [...] Your post was sent using
> >> a legacy encoding, Windows-1252, also known as CP-1252
> > i access rusi's post using a NNTP server, and in his post i see
> > Content-Type: text/plain; charset=UTF-8

> But *which post* are you looking at?

> I have just looked at three posts from him:

> Rusi's original post, where he used the ellipsis characters:

>   Subject: Re: Managing Google Groups headaches
>   Date: Thu, 5 Dec 2013 23:13:54 -0800 (PST)
>   Content-Type: text/plain; charset=windows-1252

> Then his reply to me:

>   Subject: Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
>   Date: Fri, 6 Dec 2013 18:33:39 -0800 (PST)
>   Content-Type: text/plain; charset=UTF-8

> And finally, his reply to you:

>   Subject: Re: ASCII and Unicode
>   Date: Sun, 8 Dec 2013 08:41:10 -0800 (PST)
>   Content-Type: text/plain; charset=ISO-8859-1

> It seems to me that whatever client he is using to post (I believe it is 
> the Google Groups web interface?) varies the encoding depending on what 
> characters are included in his post.
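
For what it's worth, the behaviour described above (the charset following the characters in the post) can be sketched in a few lines of Python. This is only a guess at the logic, not Google Groups' actual code: the name pick_charset, the candidate list and its order are all my assumptions.

```python
# Hypothetical sketch: choose the first charset, narrowest first, that can
# represent every character in the message body. The candidate list and
# its order are assumptions, not anything Google Groups documents.
CANDIDATE_CHARSETS = ("us-ascii", "iso-8859-1", "windows-1252", "utf-8")

def pick_charset(body):
    for charset in CANDIDATE_CHARSETS:
        try:
            body.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset cannot represent the body, try the next
        return charset
    raise ValueError("no candidate charset can encode the body")

print(pick_charset("plain old text"))
print(pick_charset("caf\u00e9"))        # e with acute accent
print(pick_charset("and so on\u2026"))  # horizontal ellipsis, U+2026
print(pick_charset("\U0001F4A9"))       # pile of poo, U+1F4A9
```

Under these assumptions an ellipsis lands the post in windows-1252, because U+2026 exists there but not in ISO-8859-1, while anything outside those legacy code pages forces UTF-8.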

> > is it possible that what you see is an artifact of the gateway?

> I doubt it. Unfortunately the email mailing list archive doesn't display 
> all the email headers, but for the record here is his original post as 
> seen by the email mailing list:

> https://mail.python.org/pipermail/python-list/2013-December/661782.html

> If you view source, you'll see that Mailman (the mailing list software) 
> sets the webpage encoding to US-ASCII and encodes the ellipses to &#8230;, 
> which is a perfectly reasonable thing for a web page to do. So we can be 
> confident that when Mailman saw Rusi's post, it was able to correctly 
> decode the message and see ellipses.
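
The same effect is easy to reproduce in Python. I am not claiming this is how Mailman does it internally; it is just the standard "xmlcharrefreplace" error handler producing the same kind of output for an ASCII-only page. The sample string is made up.

```python
import html

text = "wait for it\u2026"  # ends with U+2026 HORIZONTAL ELLIPSIS

# Encode to ASCII, replacing anything unencodable with a numeric
# character reference instead of raising UnicodeEncodeError.
page_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(page_bytes)  # b'wait for it&#8230;'

# A browser undoes this when rendering; html.unescape does the same here.
round_tripped = html.unescape(page_bytes.decode("ascii"))
print(round_tripped == text)  # True
```

So an ASCII page can still carry the ellipsis without loss, which is why the archive copy is decent evidence that the original message was decoded correctly.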

> Although I think that (probably) Google Groups is being stupid by varying 
> the charset (why not just use UTF-8 always?), at least it is setting the 
> charset correctly. 

I think GG is being sweet and affectionate and delectable enough that a
💩 in the footer will keep it stuck at UTF-8, don't you think? :-)