Python usage numbers

Sat Feb 11 23:38:37 EST 2012

On Sun, Feb 12, 2012 at 1:36 PM, Rick Johnson
<rantingrickjohnson at gmail.com> wrote:
> On Feb 11, 8:23 pm, Steven D'Aprano <steve
> +comp.lang.pyt... at pearwood.info> wrote:
>> "I have a file containing text. I can open it in an editor and see it's
>> nearly all ASCII text, except for a few weird and bizarre characters like
>> £ © ± or ö. In Python 2, I can read that file fine. In Python 3 I get an
>> error. What should I do that requires no thought?"
>>
>> Obvious answers:
>
> the most obvious answer would be to read the file WITHOUT worrying
> about asinine encoding.

What this statement misunderstands, though, is that ASCII is itself an
encoding. Files contain bytes, and it's only what's external to those
bytes that gives them meaning. The famous "Bush hid the facts" trick
with Windows Notepad shows the folly of trying to use internal
evidence to identify meaning from bytes.
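
To make that concrete, here is a rough sketch in Python 3 (the variable
names are mine, purely for illustration). It takes the pound sign from
Steven's example, stores it as UTF-8, and then reads the same bytes back
under three different assumptions. Nothing in the bytes themselves says
which reading is right.

```python
# One pound sign (U+00A3), stored as UTF-8: two bytes on disk.
raw = "\u00a3".encode("utf-8")

as_utf8 = raw.decode("utf-8")      # the reading the writer intended
as_latin1 = raw.decode("latin-1")  # same bytes, different assumption

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False               # bytes >= 0x80 are not ASCII at all

# ascii() keeps the output printable on any terminal.
print(raw, ascii(as_utf8), ascii(as_latin1), ascii_ok)
```

Latin-1 happily decodes the bytes into two characters instead of one,
which is the usual source of "weird and bizarre characters" on screen.
Only the strict ASCII decode refuses to guess.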

Everything that displays text to a human needs to translate bytes into
glyphs, and the usual way to do this conceptually is to go via
characters. Pretending that it's all the same thing really means
pretending that one byte represents one character and that each
character is depicted by one glyph. And that's doomed to failure,
unless everyone speaks English with no foreign symbols - so, no
mathematical notations.
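
A small sketch of the byte/character/glyph mismatch, again in Python 3
and again with names I made up for the example. It assumes the ö from
Steven's list is written the decomposed way, as a plain "o" followed by
a combining diaeresis, which most renderers draw as a single glyph.

```python
import unicodedata

# 'o' + COMBINING DIAERESIS: two code points, normally drawn as one glyph.
decomposed = "o\u0308"
# NFC normalisation folds the pair into the single code point U+00F6.
composed = unicodedata.normalize("NFC", decomposed)

counts = {
    "decomposed chars": len(decomposed),
    "decomposed utf-8 bytes": len(decomposed.encode("utf-8")),
    "composed chars": len(composed),
    "composed utf-8 bytes": len(composed.encode("utf-8")),
    "composed latin-1 bytes": len(composed.encode("latin-1")),
}

for label, n in counts.items():
    print(label, n)
```

One glyph on screen, but one or two characters and one, two or three
bytes depending on how it was written and encoded. The
one-byte-per-character picture only holds in the Latin-1 row.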

ChrisA