f python?

Kaz Kylheku kaz at kylheku.com
Sun Apr 8 15:14:45 EDT 2012


On 2012-04-08, Peter J. Holzer <hjp-usenet2 at hjp.at> wrote:
> On 2012-04-08 17:03, David Canzi <dmcanzi at uwaterloo.ca> wrote:
>> If you added up the cost of all the extra work that people have
>> done as a result of Microsoft's decision to use '\' as the file
>> name separator, it would probably be enough money to launch the
>> Burj Khalifa into geosynchronous orbit.
>
> So we have another contender for the Most Expensive One-byte Mistake?

The one-byte mistake in DOS and Windows is recognizing two characters as path
separators.  All code that correctly handles paths is complicated by having to
look for a set of characters instead of just scanning for a single byte.
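
To illustrate the difference, here is a minimal sketch of finding the last
path component under each convention. The names basename_unix and
basename_dos are made up for this example, and the DOS version deliberately
ignores drive letters such as "C:foo", which real code would also have to
handle.

```c
#include <stddef.h>
#include <string.h>

/* One separator: a single strrchr call finds the last '/'. */
const char *basename_unix(const char *path)
{
    const char *sep = strrchr(path, '/');
    return sep ? sep + 1 : path;
}

/* Two separators: every byte has to be tested against the set
   { '/', '\\' }, so a plain single-byte scan no longer works. */
const char *basename_dos(const char *path)
{
    const char *base = path;
    for (const char *p = path; *p != '\0'; p++) {
        if (*p == '/' || *p == '\\')
            base = p + 1;   /* remember the position after the latest separator */
    }
    return base;
}
```

Both functions return a pointer into the original path rather than a copy.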

> http://queue.acm.org/detail.cfm?id=2010365

DOS backslashes are already mentioned on that page, but alas it perpetuates the
clueless myth that DOS and Windows do not recognize any other path separator.

Worse, the one-byte Unix mistake being covered is, disappointingly, just a
clueless rant against null-terminated strings.

Null-terminated strings are infinitely better than the ridiculous encapsulation of length + data.

For one thing, if s is a non-empty null-terminated string, then cdr(s) is also
a string representing the rest of that string without the first character,
where cdr(s) is conveniently defined as s + 1.

Not only can compilers compress storage by recognizing that some string
literals are suffixes of other string literals, but a lot of string
manipulation code is simplified, because you can treat a pointer to the
interior of any string as a string.
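
A small sketch of what that buys you. The helper names str_cdr and
skip_prefix are invented for this example. Neither one copies or allocates
anything: the result is just a pointer into the caller's string, and it can
be passed to any function that expects a string.

```c
#include <string.h>

/* cdr(s) = s + 1; only valid when s is non-empty. */
const char *str_cdr(const char *s)
{
    return s + 1;
}

/* If s starts with prefix, return the part of s after it;
   otherwise return s unchanged. */
const char *skip_prefix(const char *s, const char *prefix)
{
    size_t n = strlen(prefix);
    return strncmp(s, prefix, n) == 0 ? s + n : s;
}
```

For example, skip_prefix("http://example", "http://") points at "example"
inside the same array, and strlen or strcmp work on it directly.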

Because they are recursively defined, you can do elegant tail recursion on
null-terminated strings:

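  /* Return a pointer to the first occurrence of ch in the string, or a
     null pointer if there is none.  Unlike the standard strchr, this
     never matches the terminating null itself. */
  const char *rec_strchr(const char *in, int ch)
  {
    if (*in == 0)
      return 0;
    else if (*in == ch)
      return in;
    else
      return rec_strchr(in + 1, ch);
  }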
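
Since the recursive call is in tail position, it can be turned mechanically
into a loop. C compilers are not required to eliminate tail calls, so here is
a sketch of the equivalent iterative form, under the made-up name
iter_strchr:

```c
#include <stddef.h>

/* Same behavior as the recursive version: stop at the terminating null,
   otherwise compare the current character and advance to the suffix. */
const char *iter_strchr(const char *in, int ch)
{
    for (; *in != '\0'; in++) {
        if (*in == ch)
            return in;
    }
    return NULL;
}
```

The loop variable plays the role of the recursive argument: each iteration
continues with in + 1, the rest of the string.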

Length + data also raises the question: what type is the length field? One
byte? Two bytes? Four? And then you have issues of byte order.  Null-terminated
C strings can be written straight to a binary file or network socket and be
instantly understood on the other end.
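
To make the contrast concrete, here is a sketch of serializing the same
string both ways. The wire format in put_counted (a two-byte, big-endian
length followed by the data) is purely an assumption chosen for this example;
picking the width and the byte order is exactly the decision that a counted
format forces on you. Both function names are invented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counted format (assumed: 2-byte big-endian length, then the bytes).
   Returns the number of bytes written, or 0 if the string is too long
   for the length field or does not fit in the buffer. */
size_t put_counted(unsigned char *out, size_t cap, const char *s)
{
    size_t len = strlen(s);
    if (len > UINT16_MAX || cap < len + 2)
        return 0;
    out[0] = (unsigned char)(len >> 8);     /* high byte first */
    out[1] = (unsigned char)(len & 0xff);   /* then low byte */
    memcpy(out + 2, s, len);
    return len + 2;
}

/* Null-terminated format: the bytes followed by the terminator.
   Returns the number of bytes written, or 0 if it does not fit. */
size_t put_terminated(unsigned char *out, size_t cap, const char *s)
{
    size_t n = strlen(s) + 1;   /* include the terminating null */
    if (cap < n)
        return 0;
    memcpy(out, s, n);
    return n;
}
```

The terminated version has no width limit and no byte order to agree on; the
reader on the other end just scans for the zero byte.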

Null-terminated strings have simplified all kinds of text manipulation, lexical
scanning, and data storage/communication code, resulting in immeasurable
savings over the years.


