hex dump w/ or w/out utf-8 chars

Sun Jul 14 09:44:44 EDT 2013

On Sunday, July 14, 2013 12:44:12 UTC+2, Steven D'Aprano wrote:
> On Sun, 14 Jul 2013 01:20:33 -0700, wxjmfauth wrote:
>
> > For a very simple reason, the latin-1 block: considered and accepted
> > today as being a Unicode design mistake.
>
> Latin-1 (also known as ISO-8859-1) was based on DEC's "Multinational
> Character Set", which goes back to 1983. ISO-8859-1 was first published
> in 1985, and was in use on Commodore computers the same year.
>
> The concept of Unicode wasn't even started until 1987, and the first
> draft wasn't published until the end of 1990. Unicode wasn't considered
> ready for production use until 1991, six years after Latin-1 was already
> in use in people's computers.
>
> --
> Steven
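For context on the "latin-1 block" being argued about: Unicode's first 256 code points (U+0000 to U+00FF, the Basic Latin and Latin-1 Supplement blocks) line up with ISO-8859-1, so Python's latin-1 codec turns each byte value into the code point with the same number. The short check below uses only the standard codec; it also shows that 'œ' (U+0153), the character used in the timings further down, falls outside that range.

```python
# Python's latin-1 codec maps byte values 0x00-0xFF onto exactly the
# characters Unicode places at code points U+0000-U+00FF.
latin1_matches_unicode = all(
    bytes([b]).decode("latin-1") == chr(b) for b in range(256)
)
print(latin1_matches_unicode)

# 'œ' is U+0153, beyond U+00FF, so Latin-1 cannot encode it.
try:
    "œ".encode("latin-1")
except UnicodeEncodeError as exc:
    print("not in Latin-1:", exc.reason)
```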

------

"Unicode" (in fact iso-14xxx) was not created overnight
(deus ex machina).

What counts today is this:

>>> import sys, timeit
>>> timeit.repeat("a = 'hundred'; 'x' in a")
[0.11785943134991479, 0.09850454944486256, 0.09761604599423179]
>>> timeit.repeat("a = 'hundreœ'; 'x' in a")
[0.23955250303158593, 0.2195812612416752, 0.22133896997401692]
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('œ')
40
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'
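These sizes come from CPython's flexible string representation (PEP 393, introduced in 3.3): each str is stored with 1, 2, or 4 bytes per character, chosen by its largest code point. Every character of 'hundred' fits in one byte, while 'œ' (U+0153) pushes 'hundreœ' to two bytes per character. The sketch below mirrors that rule with a hypothetical helper `storage_width` and measures the per-character cost with another hypothetical helper, `per_char_cost`, built on sys.getsizeof. Absolute sizes vary by CPython version and platform (26 and 40 above are from 32-bit 3.3.2), so it compares the growth between two lengths instead of raw sizes. It assumes CPython; other implementations store strings differently.

```python
import sys


def storage_width(s: str) -> int:
    """Hypothetical helper: bytes per character that CPython (PEP 393)
    would pick for s, based on its largest code point."""
    top = max(map(ord, s), default=0)
    if top < 0x100:      # ASCII and Latin-1 range: 1 byte per char
        return 1
    if top < 0x10000:    # rest of the Basic Multilingual Plane: 2 bytes
        return 2
    return 4             # supplementary planes: 4 bytes


def per_char_cost(ch: str) -> int:
    """Hypothetical helper: growth in sys.getsizeof when one more copy
    of ch is appended, which isolates the per-character storage."""
    return sys.getsizeof(ch * 11) - sys.getsizeof(ch * 10)


for ch in ("d", "é", "œ", "\U0001F600"):
    print(f"U+{ord(ch):04X}: width={storage_width(ch)} "
          f"per-char cost={per_char_cost(ch)}")
```

On CPython 3.3 or later the two numbers should agree for each character, with 'é' still at one byte because it lies inside Latin-1.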

jmf