Extended ASCII and code pages [was Re: for / while else doesn't make sense]

Steven D'Aprano steve+comp.lang.python at pearwood.info
Wed May 25 06:19:57 EDT 2016


On Wednesday 25 May 2016 19:10, Christopher Reimer wrote:

> Back in the early 1980's, I grew up on 8-bit processors and latin-1 was all
> we had for ASCII.

It really, truly wasn't. But you can be forgiven for not knowing that, since 
until the rise of the public Internet most people weren't exposed to more than 
one code page or encoding, and it was incredibly common for people to call 
*any* encoding "ASCII". (That's like calling any computer "an IBM", or any
soft drink "Coke".)

But being an old Mac user from the 1980s, I'm very aware that DOS and Mac used 
different character sets, although even I wasn't aware at the time that the DOS 
character sets were internationalised with different versions of "extended 
ASCII". 

(That's how Anglo-centric I was in the 1980s: I honestly never gave a moment's 
thought to the fact that, say, Greek computer users would like to be able to 
type in Greek. I thought that while DOS users and Mac users had different 
character sets, all DOS users had the same character set, and likewise for Mac 
users.)

The first code pages were from IBM in the 1970s. Different countries had their 
own national standards for "extended ASCII", as did different computer 
manufacturers. Apple, Apricot, Atari, Commodore and other hardware 
manufacturers used their own proprietary extensions. Due to the close 
partnership between IBM and Microsoft, they kept their register of code pages 
in sync until they fell out over OS/2 and NT. Since the 1990s, not so much.

The Wikipedia articles on "Code page", "Extended ASCII" etc. are good for giving
a broad overview, but they lack a lot of the finer detail, such as the years the
different standards were formally created and when they were first made
available as code pages on PCs. If you care about that sort of minutiae, you
will have to go digging. But very broadly speaking, even in the 1980s there was
no shortage of extensions to ASCII. While the code page system was necessary at
the time, its legacy continues to plague computer users today, causing
mojibake and errors on file systems[1], and holding back the adoption of Unicode.
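
To make the problem concrete, here is a minimal Python 3 sketch. The byte value,
the four codecs and the sample word are arbitrary choices for illustration, not
anything taken from a particular system. It shows one byte above 0x7F meaning
something different under each "extended ASCII", and how mojibake appears when
UTF-8 bytes are read as Latin-1.

```python
# One byte above 0x7F, decoded with several legacy "extended ASCII" encodings.
raw = bytes([0x8E])
codecs = ("cp437", "cp737", "mac_roman", "latin-1")  # arbitrary sample
decoded = {name: raw.decode(name) for name in codecs}
for name, char in decoded.items():
    # Print the code point rather than the character, so the output is
    # readable whatever encoding the terminal happens to use.
    print(f"{name:10} -> U+{ord(char):04X}")

# Mojibake: text encoded as UTF-8 but decoded as Latin-1.
original = "café"
mojibake = original.encode("utf-8").decode("latin-1")  # "cafÃ©"

# The damage can be undone only if you know which wrong decoding was applied.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

Each codec maps the same byte to a different code point, which is why text
that looks fine on one machine can turn into nonsense on another.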





[1] I'm speaking from experience there. Take files created on a Windows machine 
using some legacy code page, and try to copy them to another server using 
Unicode, and depending on the intelligence of the server, you may not be able 
to copy them. On the flip side, there are many file names I can easily create 
on Linux but cannot copy to a FAT file system.
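
Both directions of that problem can be sketched in Python 3. This is a rough
illustration under simplifying assumptions: the helper name `fat_problems` is
made up for this example, and it only checks the reserved punctuation and
control characters, ignoring other FAT restrictions such as name length or
trailing dots and spaces.

```python
# Characters that FAT long file names reject, in addition to control characters.
FAT_RESERVED = set('\\/:*?"<>|')

def fat_problems(name):
    """Return the characters in `name` that a FAT file system would refuse.

    Hypothetical helper for illustration; it is not a complete validity check.
    """
    return sorted({ch for ch in name if ch in FAT_RESERVED or ord(ch) < 32})

# A file name written with a legacy code page is often not valid UTF-8 ...
legacy_name = "café.txt".encode("latin-1")  # b'caf\xe9.txt'
try:
    legacy_name.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ... although the raw bytes can be carried through losslessly with the
# surrogateescape error handler.
smuggled = legacy_name.decode("utf-8", errors="surrogateescape")
round_trip = smuggled.encode("utf-8", errors="surrogateescape")
```

For instance, `fat_problems("notes: draft?.txt")` returns `[':', '?']`, both of
which are perfectly legal in a Linux file name.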


-- 
Steve



