How to waste computer memory?

Chris Angelico rosuav at gmail.com
Fri Mar 18 17:08:15 EDT 2016


On Sat, Mar 19, 2016 at 8:02 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>> On Sat, Mar 19, 2016 at 2:26 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
>>> It may be that Python's Unicode abstraction is an untenable illusion
>>> because the underlying reality is 8-bit and there's no way to hide it
>>> completely.
>>
>> The underlying reality is 1-bit. Or maybe the underlying reality is
>> actually electrical signals that don't even have a clear definition of
>> "bits" and bounce between two states for a few fractions of a second
>> before settling. And maybe someone's implementing Python on the George
>> Banks Kite CPU, which consists of two cents' worth of paper and
>> string, on which text is actually represented by glyph. They're all
>> equally valid notions of "underlying reality".
>>
>> Text is an abstract concept, just as numbers are.
>
> The question is how tenable the illusion is. If the OS gave the
> appropriate guarantees (say, all pathnames are encoded Unicode strings),
> the abstraction could be maintained. Unfortunately, the legacy shines
> through making you wonder if Python has overreached prematurely with its
> Unicode HAL.

The problem is not Python's Unicode strings, then. The problem is the
notion that path names are text. If they're text, they should be
exclusively text (although, for low-level efficiency, they're more
likely to be defined as "valid UTF-8 sequences" rather than "sequences
of Unicode codepoints"); since they're not, they are fundamentally
bytes. But that's not a problem with Python - it's a problem with the
file system.
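One way to see the "fundamentally bytes" point from inside Python: the sketch below uses a hypothetical file name, "café.txt" encoded as Latin-1, which is not valid UTF-8. It assumes Python 3 and relies only on the built-in `surrogateescape` error handler from PEP 383, which CPython uses for undecodable file names on POSIX systems. No real file is created or read.

```python
# Hypothetical file name: Latin-1 bytes for "café.txt", which are NOT valid UTF-8.
raw_name = b"caf\xe9.txt"

# Strict UTF-8 decoding fails, because these bytes are not UTF-8 text.
try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# surrogateescape maps each undecodable byte to a lone surrogate (U+DC80..U+DCFF),
# so a str can carry the original bytes without losing anything.
as_text = raw_name.decode("utf-8", "surrogateescape")
print(repr(as_text))  # 'caf\udce9.txt'

# Encoding with the same handler restores the exact original bytes.
round_trip = as_text.encode("utf-8", "surrogateescape")
print(round_trip == raw_name)  # True

# The escaped str is still not real text: strict UTF-8 encoding rejects it.
try:
    as_text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("cannot encode as UTF-8:", exc.reason)
```

The escaped string round-trips losslessly, but only as a smuggled byte sequence, not as text you can safely print or pass to something that expects UTF-8. That is the file-system problem, not a Unicode-string problem.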

ChrisA



More information about the Python-list mailing list