f python?

BartC bc at freeuk.com
Sun Apr 8 18:46:02 EDT 2012


"Kaz Kylheku" <kaz at kylheku.com> wrote in message
news:20120408114313.85 at kylheku.com...

> Worse, the one byte Unix mistake being covered is, disappointingly, just a
> clueless rant against null-terminated strings.
>
> Null-terminated strings are infinitely better than the ridiculous
> encapsulation of length + data.
>
> For one thing, if s is a non-empty null terminated string then, cdr(s) is
> also a string representing the rest of that string without the first
> character, where cdr(s) is conveniently defined as s + 1.

If strings are represented as (ptr,length), then a cdr(s) would have to
return (ptr+1,length-1), or (nil,0) if s was one character. No big deal.

(Note I saw your post in comp.lang.python; I don't know about any
implications of that for Lisp.)

And if, instead, you want to represent all but the last character of the
string, then it's just (ptr,length-1). (Some checking is needed around empty
strings, but similar checks are needed around s+1.)

In addition, if you want to represent the middle of a string, then it's also
very easy: (ptr+a,b).
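
As a rough sketch of the three operations above in C: the type and function
names (cstr, cstr_rest, cstr_drop_last, cstr_mid) are made up for
illustration, and the (ptr,length) pair is assumed to be a non-owning view
into memory that somebody else keeps alive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A counted string as a (ptr, length) pair; it does not own its data.
   All names here are invented for illustration. */
typedef struct {
    const char *ptr;
    size_t      len;
} cstr;

/* Wrap an existing C string without copying it. */
static cstr cstr_from(const char *s) {
    cstr r;
    assert(s != NULL);
    r.ptr = s;
    r.len = strlen(s);
    return r;
}

/* cdr: everything but the first character; (NULL, 0) when nothing is left. */
static cstr cstr_rest(cstr s) {
    cstr r = { NULL, 0 };
    if (s.len > 1) { r.ptr = s.ptr + 1; r.len = s.len - 1; }
    return r;
}

/* Everything but the last character; (NULL, 0) when nothing is left. */
static cstr cstr_drop_last(cstr s) {
    cstr r = { NULL, 0 };
    if (s.len > 1) { r.ptr = s.ptr; r.len = s.len - 1; }
    return r;
}

/* Middle: b characters starting at offset a; (NULL, 0) if out of range. */
static cstr cstr_mid(cstr s, size_t a, size_t b) {
    cstr r = { NULL, 0 };
    if (b > 0 && a <= s.len && b <= s.len - a) { r.ptr = s.ptr + a; r.len = b; }
    return r;
}

/* Compare a counted string with a C literal, for checking results. */
static int cstr_eq(cstr s, const char *lit) {
    size_t n = strlen(lit);
    return s.len == n && (n == 0 || memcmp(s.ptr, lit, n) == 0);
}
```

None of these copy or modify the underlying bytes, so cstr_drop_last() on
"bart" gives a view of "bar" that a zero-terminated string could only get by
copying or by overwriting the 't'.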

> Not only can compilers compress storage by recognizing that string
> literals are the suffixes of other string literals, but a lot of string
> manipulation code is simplified, because you can treat a pointer to
> interior of any string as a string.

Yes, the string "bart" also contains "art", "rt" and "t". But with counted
strings, it can also contain "bar", "ba", "b", etc.

There are a few advantages to counted strings too...

> length + data also raises the question: what type is the length field? One
> byte? Two bytes? Four?

Depends on the architecture. But 4+4 bytes (pointer + length) on 32-bit
machines, and 8+8 bytes on 64-bit, I would guess, for general flex strings of
any length.

There are other ways of encoding a length.

(For example I use one short string type of maximum M characters, but the
current length N is encoded into the string, without needing any extra count
byte (by fiddling about with the last couple of bytes). If you're trying to
store a short string in an 8-byte field in a struct, then this will let you
use all 8 bytes; a zero-terminated one, only 7.)
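
The exact byte-fiddling isn't spelled out above, so the following is not that
scheme. It is a simpler, different way to get the same "all 8 bytes usable"
property: a zero-padded fixed field whose length is found by scanning rather
than stored. FIELD_MAX, field_set and field_len are made-up names, and the
text is assumed to contain no zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_MAX 8   /* M: size of the fixed field, assumed for this example */

/* Store up to FIELD_MAX characters; unused tail bytes are set to zero.
   Returns 0 on success, -1 if the text is too long (field left unchanged). */
static int field_set(char field[FIELD_MAX], const char *text) {
    size_t n;
    assert(field != NULL && text != NULL);
    n = strlen(text);
    if (n > FIELD_MAX) return -1;
    memset(field, 0, FIELD_MAX);
    memcpy(field, text, n);      /* no terminator is added when n == FIELD_MAX */
    return 0;
}

/* Current length N: position of the first zero byte, or FIELD_MAX if none. */
static size_t field_len(const char field[FIELD_MAX]) {
    const char *z = memchr(field, 0, FIELD_MAX);
    return z ? (size_t)(z - field) : FIELD_MAX;
}
```

A full 8-character name fits with no terminator at all, where a
zero-terminated field of the same size would stop at 7.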

> And then you have issues of byte order.

Which also affects every single value of more than one byte.

> Null terminated C strings can be written straight to a binary file or
> network socket and be instantly understood on the other end.

But they can't contain nulls!
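
To make both points concrete, here is a small sketch in C of writing a counted
string with its length in a fixed byte order, so the reader doesn't care about
the writer's endianness and embedded nulls survive. The 4-byte big-endian
prefix and the names put_counted and get_counted_len are assumptions for this
example, not any particular wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a 4-byte big-endian length followed by the raw bytes.
   Returns the number of bytes written, or 0 if out is too small. */
static size_t put_counted(unsigned char *out, size_t cap,
                          const void *data, uint32_t len) {
    assert(out != NULL && (data != NULL || len == 0));
    if (cap < 4 || cap - 4 < len) return 0;
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    if (len) memcpy(out + 4, data, len);
    return 4 + (size_t)len;
}

/* Read the length back, independent of the host's byte order. */
static uint32_t get_counted_len(const unsigned char *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}
```

Given the five bytes 'a', 'b', 0, 'c', 'd', put_counted() sends all five,
while strlen() on the same buffer stops at 2.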

> Null terminated strings have simplified all kinds of text manipulation,
> lexical scanning, and data storage/communication code resulting in
> immeasurable savings over the years.

They both have their uses.

-- 
Bartc
 



