[Python-ideas] discontinue iterable strings

Chris Angelico rosuav at gmail.com
Sat Aug 20 09:53:51 EDT 2016


On Sat, Aug 20, 2016 at 10:31 PM, Michael Selik <michael.selik at gmail.com> wrote:
> On Sat, Aug 20, 2016 at 3:48 AM Chris Angelico <rosuav at gmail.com> wrote:
>>
>> On Sat, Aug 20, 2016 at 4:28 PM, Alexander Heger <python at 2sn.net> wrote:
>> > Yes, I am aware it will cause a lot of backward incompatibilities...
>>
>> Tell me, would you retain the ability to subscript a string to get its
>> characters?
>>
>> >>> "asdf"[0]
>> 'a'
>
>
> A separate character type would solve that issue. While Alexander Heger was
> advocating for a "monolithic object," and may in fact not want subscripting,
> I think he's more frustrated by the fact that iterating over a string gives
> other strings. If instead a 1-length string were a different, non-iterable
> type, that might avoid some problems.
>
> However, special-casing a character as a different type would bring its own
> problems. Note the annoyance of iterating over bytes and getting integers.
>
> In case it's not clear, I should add that I disagree with this proposal and
> do not want any change to strings.
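Here is a small Python 3 sketch of the two complaints. It is not from the thread. Indexing a str yields another str of length 1, so the nesting never bottoms out. Indexing bytes yields ints instead. The `flatten` helper is hypothetical and written only for illustration. It shows the usual workaround, which is to treat strings as atoms rather than as iterables.

```python
from collections.abc import Iterable

s = "asdf"
# Indexing (or iterating) a str yields str objects of length 1...
first = s[0]
assert first == "a" and type(first) is str
# ...and a 1-length str is itself indexable, so the nesting never ends.
assert first[0][0][0] == "a"

# bytes behave differently: indexing and iteration yield ints.
b = b"asdf"
assert b[0] == 97
assert list(b) == [97, 115, 100, 102]
assert b[:1] == b"a"  # only slicing gives bytes back


def flatten(obj):
    """Hypothetical helper: recursively yield leaves of nested iterables.

    str must be treated as an atom: iterating "a" yields "a" again, so
    without the check flatten("a") would recurse until RecursionError.
    bytes are treated as atoms too, otherwise they would split into ints.
    """
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        yield obj
        return
    for item in obj:
        yield from flatten(item)


print(list(flatten([1, ["ab", [b"cd", 2]], (3,)])))  # [1, 'ab', b'cd', 2, 3]
```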

Agreed. One of the handy traits of cross-language code is that MANY
languages let you subscript a double-quoted string to get a
single-quoted character. Compare these blocks of code:

if ("asdf"[0] == 'a')
    write("The first letter of asdf is a.\n");

if ("asdf"[0] == 'a'):
    print("The first letter of asdf is a.")

if ("asdf"[0] == 'a')
    console.log("The first letter of asdf is a.")

if ("asdf"[0] == 'a')
    printf("The first letter of asdf is a.\n");

if ("asdf"[0] == 'a')
    echo("The first letter of asdf is a.\n");

Those are Pike, Python, JavaScript/ECMAScript, C/C++, and PHP,
respectively. Two of them treat single-quoted and double-quoted
strings identically (Python and JS). Two use double quotes for strings
and single quotes for character (aka integer) constants (Pike and C).
One has double quotes for interpolated and single quotes for
non-interpolated strings (PHP). And just to mess you up completely,
two (or three) of these define strings to be sequences of bytes (C/C++
and PHP, plus Python 2), two as sequences of Unicode codepoints
(Python 3 and Pike), and one as sequences of UTF-16 code units (JS). But
in all five, subscripting a double-quoted string yields a
single-quoted character.
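You can compare the three string models above from inside Python 3 by encoding the same text in different ways. This is a sketch, and the test text is an assumption: "a" followed by an emoji outside the Basic Multilingual Plane. The code counts three things: code points (as Python 3 and Pike see the string), UTF-16 code units (what a JavaScript string holds), and UTF-8 bytes. UTF-8 is only one possible byte-oriented view; C and PHP strings do not mandate any particular encoding.

```python
# Assumed test text: 'a' followed by U+1F600, which lies outside the BMP.
text = "a\U0001F600"

# Python 3 str: a sequence of Unicode code points.
assert len(text) == 2
assert text[1] == "\U0001F600"

# UTF-16 code units (the JavaScript model): the emoji needs a surrogate
# pair, so 'a' + 2 units = 3 units ("utf-16-le" emits no BOM).
utf16 = text.encode("utf-16-le")
assert len(utf16) // 2 == 3

# A byte-oriented view, here UTF-8: 1 byte for 'a' + 4 for the emoji.
utf8 = text.encode("utf-8")
assert len(utf8) == 5
assert utf8[0] == ord("a")  # indexing bytes gives an int, like a C char constant

# Whatever the model, the first element still compares equal to 'a',
# and in Python both quote styles spell the same str.
assert text[0] == 'a' == "a"
```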

I'm firmly of the opinion that this should not change. Creating a
brand-new "character" type without a corresponding literal for it
would not help code clarity. Given the amount of prior art, the one
obvious literal would be some form of quote character, probably the
apostrophe. Python already uses the apostrophe for ordinary string
literals, so that isn't available, and I think a character type would
be a major hurdle to get over.
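The following sketch shows what losing the literal would cost. It defines a hypothetical `Char` class in Python 3. Python has no such type, and the class name and its behaviour are assumptions made only for illustration. Both quote styles already build `str`, so every character constant would have to be written as a constructor call. Mixing `Char` with `str` would also compare unequal without any error, much as `b"asdf"[0] == b"a"` is False today.

```python
# Both quote styles already produce str, so the apostrophe is taken.
assert type('a') is str and type("a") is str
assert 'asdf' == "asdf"


class Char:
    """Hypothetical single-character type (Python has no such type).

    With no literal syntax available, every character constant has to
    be spelled as a constructor call such as Char("a").
    """

    __slots__ = ("_cp",)

    def __init__(self, s: str) -> None:
        if len(s) != 1:
            raise ValueError("Char needs exactly one code point")
        self._cp = ord(s)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Char):
            return NotImplemented  # so Char("a") == "a" falls back to False
        return self._cp == other._cp

    def __hash__(self) -> int:
        return hash(self._cp)

    def __repr__(self) -> str:
        return f"Char({chr(self._cp)!r})"


# Today's spelling:
if "asdf"[0] == 'a':
    print("The first letter of asdf is a.")

# Simulated spelling if indexing returned Char (real str indexing returns str):
first = Char("asdf"[0])
if first == Char("a"):
    print("The first letter of asdf is a.")

# The pitfall: comparing against a str literal is silently False.
assert (first == "a") is False
```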

ChrisA


More information about the Python-ideas mailing list