inclusive-lower-bound, exclusive-upper-bound (was Re: Range Operation pre-PEP)

Greg Jorgensen greg at pdxperts.com
Sun May 20 22:19:01 EDT 2001


I'll continue the off-topic C pedantry because Python programmers who want
to extend Python in C need to know this stuff.

On 19 May 2001, Marcin 'Qrczak' Kowalczyk wrote:

> > char s[] = "literal string" is preferred unless s really needs to
> > change to point at something else at run time.
>
> I disagree: if it's a local variable, the whole string must be
> copied into it, because the compiler can't assume that we won't
> modify individual characters. If the string is immutable, it's better
> to define
>     char *s = "literal string";
> so it doesn't need to be copied.

Actually, if the string is immutable (and you're using an ANSI C or C++
compiler), you should define the string as:

    const char *s = "literal string";

I wrote that char s[] = initializer is preferred because the array
definition actually allocates memory, but the pointer initialization may
just set the pointer to the address of a shared literal. Although C allows
it, the result of changing the characters pointed to by an initialized
pointer is undefined.
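
To make the distinction concrete, here is a minimal sketch. The function
name arrays_are_independent is made up for illustration; the point is only
that each array gets its own storage, while the const pointer makes the
compiler reject writes through it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if changing one array leaves a second
   array, initialized from the same literal, untouched. */
int arrays_are_independent(void)
{
    char a[] = "literal string";      /* each array gets its own storage */
    char b[] = "literal string";
    const char *p = "literal string"; /* may point at a shared literal */

    a[0] = 'L';                       /* well defined: only a changes */
    /* p[0] = 'L';  would not compile, because p points to const char */

    assert(a[0] == 'L');
    return strcmp(b, "literal string") == 0 && strcmp(p, b) == 0;
}
```

Calling arrays_are_independent() should return 1 on any conforming
compiler, whether or not it pools literals.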

Too many C programmers don't understand the concept of shared literals; I
can't say how many times I've seen code that modified a string literal
through a pointer. When the compiler pools literals you get subtle and
strange bugs. A particularly insidious problem occurs when the standard
library function strtok() is applied to a pointer to a literal, because
strtok modifies the string it is parsing. Because array definitions
actually allocate memory for the array contents, modifying the contents of
an array can't affect another copy of the string literal used to
initialize another array. Consistent use of const will prevent almost all
of these bugs, and in fact the standard library functions are declared
with const arguments to indicate what they do and encourage programmers to
use const themselves. (None of this is specific to character string
literals; C requires that all initializers be constant expressions, which
are evaluated at compile time, so the compiler may pool/share integer and
floating-point literals as well).
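
As a sketch of the safe way to use strtok(), assuming a made-up helper
named count_words and an arbitrary 64-byte buffer: copy the text into a
writable array first, so strtok writes its '\0' terminators into the copy
and never into a literal.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the space-separated words in text
   without modifying text itself. */
int count_words(const char *text)
{
    char buf[64];                  /* private, writable copy */
    char *tok;
    int n = 0;

    assert(strlen(text) < sizeof buf);
    strcpy(buf, text);             /* strtok will modify buf, not text */

    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        n++;
    return n;
}
```

Because the parameter is declared const char *, it is safe to pass a
literal, as in count_words("now is the time").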

The compiler can't assume individual characters won't be modified
regardless of whether an array or a pointer is defined with an
initializer. In C, nothing prevents:

    char *s = "literal string";
    ...
    *s = 'L';

This is a good example of where arrays and pointers SEEM equivalent but
really aren't. In fact K&R mentions the distinction on page 104:

"There is an important difference between these definitions:

    char amessage[] = "now is the time";    /* an array */
    char *pmessage = "now is the time";     /* a pointer */

amessage is an array, just big enough to hold the sequence of characters
and '\0' that initializes it. Individual characters within the array may be
changed but amessage will always refer to the same storage. On the other
hand, pmessage is a pointer, initialized to point to a string constant;
the pointer may subsequently be modified to point elsewhere, but the
result is undefined if you try to modify the string contents."
-- The C Programming Language, Second Edition, by Kernighan and Ritchie
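
A small sketch of the same distinction, using K&R's two definitions at file
scope. The helper function names are invented for illustration, and const
has been added to pmessage, which K&R's version does not have.

```c
#include <assert.h>
#include <stddef.h>

char amessage[] = "now is the time";        /* an array */
const char *pmessage = "now is the time";   /* a pointer (const added) */

/* The array's size counts 15 characters plus the terminating '\0'. */
size_t array_size(void)
{
    assert(sizeof amessage == 16);
    return sizeof amessage;
}

/* The pointer's size is that of a pointer, whatever the string length. */
size_t pointer_size(void)
{
    return sizeof pmessage;
}

/* Individual array elements may be changed; the storage stays put. */
char change_first(void)
{
    amessage[0] = 'N';
    return amessage[0];
}

/* The pointer may be made to point elsewhere, here at the array. */
const char *repoint(void)
{
    pmessage = amessage;
    return pmessage;
}
```

Note that sizeof reports the full array for amessage but only the size of
a pointer for pmessage, which is one quick way to see that the two
definitions are not equivalent.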

As for whether literal strings are copied to an array or not, or how
automatic (local) arrays or pointers are initialized, that's an
implementation detail of the compiler; the C language standard doesn't
define how those features are implemented.

Greg Jorgensen
PDXperts LLC
Portland, Oregon, USA
gregj at pobox.com




