Extended ASCII and code pages [was Re: for / while else doesn't make sense]
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Wed May 25 06:19:57 EDT 2016
On Wednesday 25 May 2016 19:10, Christopher Reimer wrote:
> Back in the early 1980's, I grew up on 8-bit processors and latin-1 was all
> we had for ASCII.
It really, truly wasn't. But you can be forgiven for not knowing that, since
until the rise of the public Internet most people weren't exposed to more than
one code page or encoding, and it was incredibly common for people to call
*any* encoding "ASCII". (That's like calling any computer "an IBM", or any
soft drink "Coke".)
But being an old Mac user from the 1980s, I'm very aware that DOS and Mac used
different character sets, although even I wasn't aware at the time that the DOS
character sets were internationalised with different versions of "extended
ASCII".
(That's how Anglo-centric I was in the 1980s: I honestly never gave a moment's
thought to the fact that, say, Greek computer users would like to be able to
type in Greek. I thought that while DOS users and Mac users had different
character sets, all DOS users had the same character set, and likewise for Mac
users.)
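To make that concrete, here is a rough sketch of what "different character
sets" means in practice: the same byte value decoded under several 8-bit
encodings. The byte 0xE9 and the particular codecs are my own arbitrary picks
for illustration; the codec names are the ones Python's standard library uses.

```python
import unicodedata

# One byte value, as it might sit in a file written on some 1980s machine.
raw = bytes([0xE9])

# A few of the 8-bit encodings that were all loosely called "ASCII".
# These are Python codec names; the selection is arbitrary.
codecs_to_try = ["latin-1", "cp437", "cp737", "cp1252", "mac_roman"]

# Decode the same byte under each encoding.
decodings = {name: raw.decode(name) for name in codecs_to_try}

for name, char in decodings.items():
    print(f"{name:10} U+{ord(char):04X} {unicodedata.name(char, '?')}")

# Real ASCII is 7-bit and has nothing to say about this byte at all.
try:
    raw.decode("ascii")
    is_ascii = True
except UnicodeDecodeError:
    is_ascii = False
print("valid ASCII?", is_ascii)
```

Under Latin-1 that byte is a lower-case e with acute accent; under the original
IBM PC code page 437 it is a Greek capital theta. Neither is ASCII.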
The first code pages were from IBM in the 1970s. Different countries had their
own national standards for "extended ASCII", as did different computer
manufacturers. Apple, Apricot, Atari, Commodore and other hardware
manufacturers used their own proprietary extensions. Due to the close
partnership between IBM and Microsoft, they kept their register of code pages
in sync until they fell out over OS/2 and NT. Since the 1990s, not so much.
The Wikipedia articles on "Code page", "Extended ASCII" etc. are good for giving
a broad overview, but they lack a lot of the finer detail such as the years the
different standards were formally created and when they were first made
available as code pages on PCs. If you care about that sort of minutiae, you
will have to go digging. But very broadly speaking, even in the 1980s there was
no shortage of extensions to ASCII. While the code page system was necessary at
the time, its legacy continues to plague computer users today, causing mojibake
and errors on file systems[1], and holding back the adoption of Unicode.
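A minimal sketch of how mojibake arises, assuming text that was written as
UTF-8 gets read back by software that assumes Windows code page 1252. The
sample string is just an illustration, not taken from any real file.

```python
original = "café"

# The writer stores the text as UTF-8: the "é" becomes two bytes, C3 A9.
utf8_bytes = original.encode("utf-8")

# The reader wrongly assumes cp1252, so each of those bytes becomes
# its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ©

# If nothing was lost along the way, undoing the wrong step recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The repair only works when you can guess which wrong encoding was applied and
every byte survived the round trip, which in practice is far from guaranteed.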
[1] I'm speaking from experience there. Take files created on a Windows machine
using some legacy code page, and try to copy them to another server using
Unicode, and depending on the intelligence of the server, you may not be able
to copy them. On the flip side, there are many file names I can easily create
on Linux but cannot copy to a FAT file system.
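Both halves of that footnote can be sketched in a few lines. This is a
simplified illustration, not a complete validator: the helper names are my own
invention, and the real FAT and Windows rules have more cases (reserved names
such as CON, trailing dots and spaces, length limits) that I ignore here.

```python
# Characters that Windows and FAT long file names reject. Linux itself
# only forbids "/" and the NUL byte in a file name.
FAT_FORBIDDEN = set('"*/:<>?\\|')


def is_valid_utf8(raw_name):
    """True if the raw file name bytes decode cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def fat_unfriendly_chars(name):
    """Return, sorted, the characters in name that FAT would reject."""
    return sorted(c for c in set(name) if c in FAT_FORBIDDEN or ord(c) < 32)


# A name written by a machine using a Latin-1 style code page: the lone
# byte E9 is not valid UTF-8, so a strict Unicode server may refuse it.
legacy_name = "café.txt".encode("latin-1")
print(is_valid_utf8(legacy_name))                   # False
print(is_valid_utf8("café.txt".encode("utf-8")))    # True

# A perfectly legal Linux file name that FAT will not accept.
print(fat_unfriendly_chars("what? when: now.txt"))  # [':', '?']
```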
--
Steve