python 2.7 and unicode (one more time)

Steven D'Aprano steve+comp.lang.python at pearwood.info
Mon Nov 24 17:56:00 EST 2014


Marko Rauhamaa wrote:

>> Py3's byte strings are still strings, though.
> 
> Hm. I don't think so. In a plain English sense, maybe, but that kind of
> usage can lead to confusion.

Only if you are determined to confuse yourself.

People are quite capable of interpreting correctly sentences like:

"My friend Susan and I were talking about Jenny, and she said that she had
had a horrible fight with her boyfriend and was breaking up with him."

and despite the ambiguity correctly interpret who "she" and "her" refer to
each time. Compared to that, correctly understanding the mild complexity
of "string" is trivial.

In Python usage, "string" always refers to the `str` type, unless prefixed
with "byte", in which case it refers to the immutable byte-string type
(`str` in Python 2, `bytes` in Python 3).

"Unicode string" always refers to the immutable Unicode string type
(`unicode` in Python 2, `str` in Python 3).
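
As a rough illustration of the naming above, here is a minimal sketch that
assumes it is run under Python 3; the variable names are made up for the
example. Under Python 2 the same plain-quoted literal would be a byte string
(`str`) instead.

```python
# Assumes Python 3: plain quotes give the native (Unicode) string type.
text = "caf\u00e9"            # "string" / "Unicode string": str
data = text.encode("utf-8")   # "byte string": bytes

text_type = type(text).__name__
data_type = type(data).__name__

# The lengths differ: one counts code points, the other counts bytes.
text_length = len(text)
data_length = len(data)

# Decoding with the same encoding recovers the original string.
round_trip = data.decode("utf-8")
```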

"Text string" is more ambiguous. Some people consider the prefix to be
redundant, i.e. "text string" always refers to `str`, while others consider
it to be in opposition to "byte string", i.e. to be a synonym for "Unicode
string".

In all cases apart from an explicit "byte string", the word "string" is
always used for the native array-of-characters type delimited by plain
quotation marks, as used for error messages, user prompts, etc., regardless
of whether the implementation is an array of 8-bit bytes (as used by Python
2), or an array of Unicode code points (as used by Python 3). So in
practice, provided you know which version of Python is being discussed,
there is never any genuine ambiguity when using the word "string" and no
excuse for confusion.
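
To make the version dependence concrete, here is a small sketch that should
behave sensibly under either Python 2.6+ or Python 3; the names are
illustrative only. It checks whether a plain-quoted literal and an explicit
`b"..."` literal share a type, which is the one thing that changes between
the two.

```python
import sys

plain = "abc"       # the native string type, whichever version is running
explicit = b"abc"   # an explicit byte string (the b prefix is accepted from 2.6 on)

running_python_2 = sys.version_info[0] == 2

# True on Python 2 (both are str), False on Python 3 (str versus bytes).
same_type = type(plain) is type(explicit)

# Error messages come back as the native string type in both versions.
message = str(ValueError("bad input"))
message_is_native = type(message) is type(plain)
```

Once you know which interpreter is in play, `same_type` answers the only
question that could otherwise be in doubt.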


-- 
Steven



