Iteration of strings

Bjarke Dahl Ebert bebert at tiscali.dk
Fri Nov 22 17:58:40 EST 2002


When writing small scripts in Python, I often make the same error:

Some function expects a list of strings, and does something like
    for elem in thestringlist: ...

Often, this function is somehow called with a string, instead of a list of
strings.
This is unfortunate, because in a sense, a string is also a list of strings!
I.e.,
    for x in strlist: print repr(x)

will print the eight strings:
    'A'
    ' '
    's'
    't'
    'r'
    'i'
    'n'
    'g'

if I happen to pass "A string" in as parameter 'strlist'.
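
To make the mistake concrete, here is a small sketch. The function name
collect_reprs is made up for illustration; it stands in for any function
that loops over its argument expecting a list of strings.

```python
def collect_reprs(strlist):
    # Expects a list of strings and returns repr() of each element.
    return [repr(x) for x in strlist]


# Called as intended: the list has one element.
as_intended = collect_reprs(["A string"])

# Called by mistake with a bare string: no error is raised, and the
# loop silently visits each character instead.
by_mistake = collect_reprs("A string")

print(len(as_intended))
print(len(by_mistake))
```

The second call does not fail; it just returns eight one-character results
where one was expected.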


I started wondering whether there is anything to be done about it. So I made a
small analysis of the ways in which a string looks like a container of strings.

 - len(X) will return the number of elements (which are also strings)
 - X[i] will return the i'th element - a string.
 - iter(X) will iterate over characters (i.e., strings).
 - and therefore, the 'for' construct will accept a string to iterate over.
 - In a sense, X[i:j] will even look like the list [X[i], ..., X[j-1]] of
strings.
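
The points above can be checked directly. This is only a sketch; the
variable names (as_list, same_len and so on) are chosen for illustration.

```python
X = "ABC"
as_list = ["A", "B", "C"]

# len(X) counts elements, just like len() of the list.
same_len = len(X) == len(as_list)

# X[i] gives the i'th element, and that element is itself a string.
same_items = [X[i] for i in range(len(X))] == as_list
elem_is_str = isinstance(X[0], str)

# iter(X), and therefore 'for', walks over the same elements.
same_iter = list(iter(X)) == as_list

# A slice of X holds the same elements as the matching slice of the list.
same_slice = list(X[0:2]) == as_list[0:2]

print(same_len, same_items, elem_is_str, same_iter, same_slice)
```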

Of these, I consider "iter(mystring)" to be the biggest problem in practice.
In many ways, 'ABC' works just like ['A', 'B', 'C'], when I would rather
think of 'ABC' as an atom.

Possible "fixes":
 - Make iter("abc") a TypeError. Since iter(X) accepts anything that
implements the X[i] protocol (__getitem__), 'str' would probably have to
implement the iterator protocol explicitly, in a way that raises TypeError.
The iter(str) feature could be replaced by a more explicit str.iterchars()
for those who really want to iterate over the characters.
 - Or at least make iter(str) issue a warning (optionally)
 - (more radical:) Make a distinct char type with the following properties:
    - doesn't work in arithmetic expressions (because characters are not
numbers)
    - doesn't work in string concatenation, without explicit str(mychar)
   A string should then of course be a char container, instead of
effectively a container of strings.
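
The first fix can be roughly imitated with a subclass, without changing
'str' itself. This is only a sketch of the idea: AtomicStr is a
hypothetical name, and iterchars() is the method suggested above, not an
existing str method.

```python
class AtomicStr(str):
    """Hypothetical str subclass that refuses implicit iteration."""

    def __iter__(self):
        # An explicit __iter__ takes precedence over the X[i] fallback,
        # so both iter() and 'for' end up here.
        raise TypeError("AtomicStr is not iterable; use iterchars()")

    def iterchars(self):
        # Explicit replacement: yields plain one-character strings.
        for i in range(len(self)):
            yield self[i]


s = AtomicStr("abc")

try:
    for x in s:
        pass
    loop_rejected = False
except TypeError:
    loop_rejected = True

chars = list(s.iterchars())

print(loop_rejected)
print(chars)
```

With this, the accidental 'for' loop fails loudly, while
list(s.iterchars()) still gives the characters when they are really
wanted. Everything else (len, indexing, comparison) behaves as for an
ordinary string.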

I like the first best. The last has the drawback of still allowing
    for s in strlist: ... "%s" % s
when strlist is a string.

I think the current situation makes the described script bug ("for x in
strlist:" when strlist is a string) pass too silently. Often it results in
code that "works", because a string is effectively also a list of strings.

What do you think?


Kind regards,
Bjarke







