f python?

Kaz Kylheku kaz at kylheku.com
Mon Apr 9 14:55:28 EDT 2012


On 2012-04-09, Shmuel Metz <spamtrap at library.lspace.org.invalid> wrote:
> In <20120408114313.85 at kylheku.com>, on 04/08/2012
>    at 07:14 PM, Kaz Kylheku <kaz at kylheku.com> said:
>
>>Null-terminated strings are infinitely better than the ridiculous
>>encapsulation of length + data.
>
> ROTF,LMAO!
>
>>For one thing, if s is a non-empty null terminated string then,
>>cdr(s) is also a string representing the rest of that string 
>>without the first character,
>
> Are you really too clueless to differentiate between C and LISP?

In Lisp we can burn a list literal like '(a b c) into ROM, and compute (b c)
without allocating any memory.

Null-terminated C strings do the same thing.

In some Lisp systems, in fact, "CDR coding" was used to save space when
allocating a list all at once. This created something very similar to
a C string: a vector-like object of all the CARs, with a terminating
convention marking the end.

It's logically very similar.

I need not repeat the elegant recursion example for walking a C string.

That example is not possible with the length + data representation.
(Not without breaking the encapsulation and passing the length as a separate
recursion parameter to a recursive routine that works with the raw data part of
the string.)
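
For anyone who missed it, here is a minimal sketch of the kind of recursion meant. It is not the code from the earlier post: count_char, struct lstring and lstring_count_char are hypothetical names made up for illustration, and the length + data layout is only an assumption about how such a string might be declared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count occurrences of ch in a null-terminated string.
   The suffix s + 1 is itself a string, so it is passed straight back in. */
size_t count_char(const char *s, char ch)
{
  assert(s != NULL);
  if (*s == 0)
    return 0;
  return (*s == ch) + count_char(s + 1, ch);
}

/* Hypothetical length + data representation, for contrast. */
struct lstring {
  size_t len;
  const char *data;
};

/* The recursion has to run on the raw data, with the remaining length
   carried along as a separate parameter. */
static size_t count_char_raw(const char *data, size_t len, char ch)
{
  if (len == 0)
    return 0;
  return (*data == ch) + count_char_raw(data + 1, len - 1, ch);
}

/* Public entry point: unwraps the object and hands off to the raw routine. */
size_t lstring_count_char(const struct lstring *ls, char ch)
{
  assert(ls != NULL);
  return count_char_raw(ls->data, ls->len, ch);
}
```

The first function recurses on the string itself; the second needs a helper that sees inside the object.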

>>Null terminated strings have simplified all kinds of text
>>manipulation, lexical scanning, and data storage/communication 
>>code resulting in immeasurable savings over the years.
>
> Yeah, especially code that needs to deal with lengths and nulls.

To get the length of a string, you call a function, in either representation,
so it is not any more complicated from a coding point of view. The function is,
of course, more expensive if the string is null terminated, but you can code
with awareness of this and not call length wastefully.
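
As a rough sketch of what "calling length wastefully" can look like, compare the two loops below. The function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Wasteful: strlen() can rescan the whole string on every iteration,
   which makes the loop quadratic in the length of the string. */
void upcase_wasteful(char *s)
{
  assert(s != NULL);
  for (size_t i = 0; i < strlen(s); i++)
    s[i] = (char) toupper((unsigned char) s[i]);
}

/* Aware: the terminator ends the loop, so the length is never asked for. */
void upcase(char *s)
{
  assert(s != NULL);
  for (; *s != 0; s++)
    *s = (char) toupper((unsigned char) *s);
}
```

Both produce the same result; only the second does it in a single pass.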

If all else were equal (so that the expense of the length operation were
the /only/ issue) then of course length + data would be better.

However, all else is not equal.

One thing that is darn useful, for instance, is that
p + strlen(p) still points to a string, one of length zero, and this
sort of thing is widely exploited in text processing code, e.g.:

   size_t digit_prefix_len = strspn(input_string, "0123456789");
   const char *after_digits = input_string + digit_prefix_len;

   if (*after_digits == 0) {
     /* string consists only of digits: nothing after digits */
   } else {
     /* process part after digits */
   }

It's nice that after_digits is a bona-fide string just like input_string,
without any memory allocation being required.

We can lexically analyze a string without ever asking it what its length is,
and as we march down the string, the remaining suffix of that string is always
a string so we can treat it as one, recurse on it, whatever.
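
A minimal sketch of that marching, assuming a hypothetical count_words helper and a separator set of space, tab and newline picked just for the example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count whitespace-separated words in a string
   without ever computing the string's length. */
size_t count_words(const char *s)
{
  static const char space[] = " \t\n";
  size_t count = 0;

  assert(s != NULL);

  for (;;) {
    s += strspn(s, space);   /* skip separators; s is still a string */
    if (*s == 0)
      break;                 /* empty suffix: nothing left to scan */
    count++;
    s += strcspn(s, space);  /* skip the word; s is still a string */
  }
  return count;
}
```

At every step s is a valid string in its own right, so it can be handed to strspn, strcspn or any other string function as is.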

Code that needs to deal with null "characters" is manipulating binary data, not
text, and should use a suitable data structure for that.

> It's great for buffer overruns too.

If we scan for a null terminator which is not there, we have a buffer overrun.

If a length field in front of string data is incorrect, we also have a buffer
overrun.

A pattern quickly emerges here: invalid, corrupt data produced by buggy code
leads to incorrect results, and behavior that is not well-defined!


