[Tutor] name shortening in a csv module output

Alan Gauld alan.gauld at btinternet.com
Fri Apr 24 01:33:57 CEST 2015


On 24/04/15 00:15, Jim Mooney wrote:
> Pretty much guesswork.
> Alan Gauld
> -- 
> This all sounds suspiciously like the old browser wars

It's more about history. Early text encodings all worked in a single
byte, which is limited to 256 patterns. That's simply not enough to
cover all the alphabets around. So people developed their own encodings
for every computer platform, printer, country and combinations thereof.
Unicode is an attempt to get away from that, but the historical junk is
still out there. And Unicode is not yet the de facto standard.

Now the simple thing to do would be just to have one enormous character
set that covers everything. That's Unicode in its 32-bit encoding,
UTF-32. The snag is that it takes 4 bytes for every character, which is
a lot of space/bandwidth. So more efficient encodings were introduced,
such as the "16 bit" UTF-16 and the "8 bit" UTF-8.

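To put rough numbers on that trade-off, here is a small sketch, assuming
Python 3. The helper name encoded_sizes and the sample characters are
just made up for illustration; the little-endian codec variants are used
only so that no byte order mark inflates the counts.

```python
# Compare how many bytes each Unicode encoding needs for one character.
# Sample characters (my own choice): A, e-acute, euro sign, an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_sizes(ch):
    """Return the encoded length in bytes of ch for three encodings."""
    # The "-le" variants skip the byte order mark the plain codecs prepend.
    return {enc: len(ch.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for ch in samples:
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(ch), encoded_sizes(ch))
```

UTF-32 always comes out at 4 bytes, while UTF-8 needs 1 byte for "A", 2
for the accented letter, 3 for the euro sign and 4 for the emoji.
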
UTF-8 is becoming a standard but it has the complication that it's a
variable width encoding, where a character can be anything from a single
byte up to 4 bytes long. The "rarer" the character, the longer its
encoding. And unfortunately the nature of the coding is such that it
looks a lot like other historical encodings, especially some of the
Latin ones: the plain ASCII characters are encoded identically in all of
them. So you can get something almost sane out of a text by decoding it
with the wrong one of these, but not quite totally sane. And it may be
some time into your usage before you realise you are actually using the
wrong codec!
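
As a rough illustration of that "almost sane" effect, assuming Python 3
(the sample text and the helper decodes_as_utf8 are invented for the
example), here is what happens when the codecs get mixed up:

```python
# UTF-8 bytes decoded as Latin-1: no error, but non-ASCII turns to junk.
text = "caf\u00e9 costs \u20ac5"          # "cafe" with e-acute, euro sign
utf8_bytes = text.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")   # every byte maps to some character
print(ascii(mojibake))                    # the ASCII letters survive intact

def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# The other direction: Latin-1 bytes are usually not valid UTF-8.
latin1_bytes = "caf\u00e9".encode("latin-1")
print(decodes_as_utf8(latin1_bytes))      # False
print(decodes_as_utf8(b"plain ASCII"))    # True
```

Text that is pure ASCII decodes the same either way, which is why the
mistake can go unnoticed until the first accented character turns up.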

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


