Working with the set of real numbers

Chris Angelico rosuav at gmail.com
Wed Feb 12 08:20:32 EST 2014


On Wed, Feb 12, 2014 at 11:48 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> Hmm, I'm not sure that my statement is false. If a computer can work
>> with "real numbers", then I would expect it to be able to work with
>> any real number. In C, I can declare an 'int' variable, which can hold
>> the real number 4 - does that mean that that variable stores real
>> numbers? No, and it's not useful to say that it does. It doesn't store
>> rationals either, even though 4 is a rational. The fact that computers
>> can work with some subset of real numbers does not disprove my
>> statement that computers don't work with "real numbers" as a class.
>> Program X works with text files, but it fails if the file contains
>> U+003C; can I feed it this thing, which is a text file? No, I can't,
>> because it works only with a subset of text files.
>
> According to your definition, there's no computer in the world that can
> work with integers or text files.

Integers as far as RAM will allow, usually (the same caveat that
applies when describing a programming language as "Turing complete"
- strictly, that term is valid only if it has infinite memory
available), but yes, technically that's a subset of integers. However,
that subset is bounded by something other than the code, algorithms,
or even hardware - it's theoretically possible to add two numbers
larger than will fit in memory, by reading them in (even over the
network), adding segments, and writing them out again.
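A minimal Python sketch of that segment-by-segment addition follows. It assumes each number arrives as a stream of base-10**9 chunks, least significant chunk first. A file storing digits most significant first would have to be read backwards or reversed in pieces. The names `add_streams`, `to_chunks` and `from_chunks` are hypothetical, and the two helpers exist only to build in-memory test input. Python's own `int` is already arbitrary precision; the point here is that only one pair of chunks plus a carry is held at any moment.

```python
from itertools import zip_longest
from typing import Iterable, Iterator

BASE = 10**9  # hypothetical chunk size: each chunk holds nine decimal digits


def add_streams(a: Iterable[int], b: Iterable[int], base: int = BASE) -> Iterator[int]:
    """Add two non-negative numbers supplied as chunk streams.

    Chunks arrive least significant first. Only the current pair of chunks
    and a carry are held at once, so neither operand needs to fit in memory.
    """
    carry = 0
    for x, y in zip_longest(a, b, fillvalue=0):  # shorter stream is padded with 0
        carry, chunk = divmod(x + y + carry, base)  # carry is always 0 or 1
        yield chunk
    if carry:
        yield carry


def to_chunks(n: int, base: int = BASE) -> Iterator[int]:
    """Split a non-negative int into chunks, least significant first (demo helper)."""
    while True:
        n, r = divmod(n, base)
        yield r
        if n == 0:
            return


def from_chunks(chunks: Iterable[int], base: int = BASE) -> int:
    """Reassemble little-endian chunks into an int (demo helper)."""
    total, scale = 0, 1
    for c in chunks:
        total += c * scale
        scale *= base
    return total


a = 10**40 + 123456789
b = 987654321 * 10**27 + 999999999
print(from_chunks(add_streams(to_chunks(a), to_chunks(b))) == a + b)  # True
```

In a real setting `a` and `b` would be generators reading from files or sockets, and the output chunks would be written out as they are produced rather than reassembled.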

Text files. Since there's already no such thing as a "text file"
unless you know what its encoding is, I don't see a problem with this.
There's no such thing as an integer in memory, either, unless you know
how it's encoded (those same bits could be a floating point number, or
a pointer, or anything). If you know that the bytes in the file are,
say, a UTF-8 stream, then the file is a text file, just as it could be
a bash script, or an MS-DOS .COM file, if you've been told to decode
it in that way. Once your encoding is declared (out of band), the file
consists of a series of ASCII characters, or Unicode codepoints, or
whatever else it is. A fully functional program should be able to
process that file regardless of what sequence of codepoints it
carries. Say you want to search a file for a particular string, for
instance. You want to know whether or not "foobar" occurs in a file.
(I'll leave aside the question of word boundaries and say you're
looking for that string of six characters.) The program should be able
to determine the presence or absence of "foobar" regardless of what
other characters (or codepoints) are around it. Having U+001A
shouldn't stop the search there; nor should U+0000 cause problems, nor
U+003C, nor any other value. Doing otherwise would be a restriction:
this program supports only a subset of text files (those not
containing these "problem characters"). It might not be a bug, per se
(maybe text inside <angle_brackets> is considered to be an XML tag and
is deemed to be not what you're looking for), but it's still a
restriction. Likewise, being unable to represent the integer
9007199254740993 while still being able to represent ...992 and ...994
is a restriction.
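The 9007199254740993 example is 2**53 + 1, the first integer an IEEE 754 double (53-bit significand) cannot hold. A quick Python check follows. It assumes `float` is a binary64 double, as it is for CPython on mainstream platforms, and the first line asserts that assumption.

```python
import sys

# Assumes float is an IEEE 754 binary64 double, which has a 53-bit significand.
assert sys.float_info.mant_dig == 53

n = 2**53 + 1                  # 9007199254740993: an exact Python int
print(n)                       # 9007199254740993
print(float(n) == n)           # False: rounds to the nearest double, 2**53
print(int(float(n)))           # 9007199254740992
print(float(n - 1) == n - 1)   # True: ...992 is representable
print(float(n + 1) == n + 1)   # True: ...994 is representable
```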
Restrictions aren't necessarily bad, but they need to be acknowledged.
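The "foobar" search can be sketched in Python, assuming the whole file fits in memory and its encoding is known in advance. The function name `contains` is hypothetical. Once the bytes are decoded, substring search compares code points, so U+0000, U+001A and U+003C are ordinary characters rather than terminators. The second pair of calls shows the same six characters failing to match when the bytes are decoded with the wrong encoding.

```python
def contains(data: bytes, needle: str, encoding: str = "utf-8") -> bool:
    """Return True if `needle` occurs in `data` decoded with `encoding`.

    The encoding is supplied by the caller (declared out of band): the raw
    bytes alone don't say which text they represent.
    """
    text = data.decode(encoding)
    # Substring search on str compares code points one by one; U+0000,
    # U+001A and U+003C ('<') get no special treatment.
    return needle in text


sample = "junk\x00more\x1aeven <more> foobar here".encode("utf-8")
print(contains(sample, "foobar"))              # True
print(contains(sample, "foobaz"))              # False

# Same six characters, different bytes: the declared encoding decides the text.
raw = "foobar".encode("utf-16-le")
print(contains(raw, "foobar", "utf-16-le"))    # True
print(contains(raw, "foobar", "latin-1"))      # False
```

For files too large to decode in one go, reading in chunks works as well. However, each chunk must then keep the last len(needle) - 1 characters of the previous one, so that a match straddling a chunk boundary isn't missed.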

ChrisA
