surprise - byte in set

Gary Herron gherron at digipen.edu
Sat Jan 3 15:05:49 EST 2015


On 01/03/2015 10:50 AM, patrick vrijlandt wrote:
> Hello list,
>
> Let me first wish you all the best in 2015!
>
> Today I was trying to test for occurrence of a byte in a set ...
>
> >>> sys.version
> '3.4.2 (v3.4.2:ab2c023a9432, Oct  6 2014, 22:15:05) [MSC v.1600 32 bit 
> (Intel)]'
> >>> 'b' in 'abc'
> True
> >>> b'b' in b'abc'
> True
> >>> 'b' in set('abc')
> True
> >>> b'b' in set(b'abc')
> False
>
> I was surprised by the last result. What happened?
> (Examples simplified; I was planning to manipulate the set)

The real surprise is that the third test is True, not that the fourth 
is False.

First, as should be expected, a byte string is a sequence of (small) 
ints.  So b'b' is a (short) byte string and the set set(b'abc') is 
composed of three ints.  You should not expect your inclusion test to 
return True when testing for a bytes-type object in a set of int-type 
objects.  And that explains your False result in the 4th test.

 >>> type(b'abc')
<class 'bytes'>
 >>> type(b'abc'[0])
<class 'int'>
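
If the goal is to test for a single byte, one way out is to compare an 
int against the ints, or to build the set from length-1 bytes objects 
instead.  A minimal sketch (the variable names are only illustrative, 
not taken from your code):

```python
data = b'abc'

# Iterating over bytes yields ints, so this is a set of three ints.
byte_values = set(data)

found_as_bytes = b'b' in byte_values      # bytes object vs. ints: no match
found_as_int = b'b'[0] in byte_values     # indexing gives the int 98
found_via_ord = ord(b'b') in byte_values  # ord() also accepts length-1 bytes

# Alternative: slicing (unlike indexing) keeps the bytes type, so this
# builds a set of length-1 bytes objects.
byte_slices = {data[i:i + 1] for i in range(len(data))}
found_in_slices = b'b' in byte_slices

print(found_as_bytes, found_as_int, found_via_ord, found_in_slices)
```

Which form is better probably depends on what you plan to do with the 
set afterwards.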


But things are different for strings.  You might think a string is a 
sequence of characters, but Python does not have a character type.  In 
fact, the elements of a string are themselves strings of length 1:

 >>> type('abc')
<class 'str'>
 >>> type('abc'[0])
<class 'str'>

You would not logically expect to find a string 'b' in a set of 
characters in, say, C++, where the two types are different.  But that's 
not the Python way.  In Python a set of characters set('abc') is really 
a set of (short) strings, and the character 'b' is really a (short) 
string, so the inclusion test works.

Python's way of returning a 1-character string when indexing a string 
(instead of returning an element of a character type) allows this 
surprising result.

 >>> 'abc'[0]
'a'
 >>> 'abc'[0][0]
'a'
 >>> 'abc'[0][0][0]
'a'
 >>> 'abc'[0][0][0][0]
'a'
...


I've never considered this a problem, but an infinitely indexable object 
*is* a bit of an oddity.
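
For contrast, the same chain stops after one step with bytes, because 
the first index already yields an int.  A small sketch (the names and 
the loop count of 1000 are arbitrary choices for illustration):

```python
s = 'abc'

# Indexing a length-1 str gives back an equal length-1 str, so the
# chain can go on for as long as you like.
c = s[0]
for _ in range(1000):
    c = c[0]
still_a = (c == 'a')

# With bytes, the first index gives an int, and ints cannot be indexed.
try:
    b'abc'[0][0]
    bytes_error = None
except TypeError as exc:
    bytes_error = type(exc).__name__

print(still_a, bytes_error)
```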




>
> Patrick
>


-- 
Dr. Gary Herron
Department of Computer Science
DigiPen Institute of Technology
(425) 895-4418



