Flexible string representation, unicode, typography, ...

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Aug 31 10:54:42 EDT 2012


On Fri, 31 Aug 2012 08:43:55 -0400, Roy Smith wrote:

> In article <503f8e33$0$30001$c3e8da3$5496439d at news.astraweb.com>,
>  Steven D'Aprano <steve+comp.lang.python at pearwood.info> wrote:
> 
>> On Thu, 30 Aug 2012 07:02:24 -0400, Roy Smith wrote:
>> > Is the implementation smart enough to know that x == y is always
>> > False if x and y are using different internal representations?
>> 
>> [...] There may be circumstances where two strings have different
>> internal representations even though their content is the same
> 
> If there is a deterministic algorithm which maps string content to
> representation type, then I don't see how it's possible for two strings
> with different representation types to have the same content.  Could you
> give me an example of when this might happen?

A deterministic algorithm can still produce the same result in two
different internal formats. Here's an example from Python 2 (on a build
where sys.maxint is 2**31 - 1):

py> sum([1, 2**30, -2**30, 2**30, -2**30])
1
py> sum([1, 2**30, 2**30, -2**30, -2**30])
1L

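The internal representation (int versus long) differs even though the sum
is the same. In the second case the running total overflows a plain int
partway through, gets promoted to long, and stays a long even after it
drops back to 1.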

A second example: the order of keys in a dict is deterministic but
unpredictable, as it depends on the history of insertions into and
deletions from the dict. So two dicts could be equal and yet have
radically different internal layouts.
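
For what it's worth, here is a minimal sketch of the dict case. It assumes
CPython 3.7 or later (not the Python 2 used elsewhere in this post), where
iteration order follows insertion order and a dict does not shrink when
keys are deleted. Both are details of that implementation and version, and
the names d1, d2, small and big are just for illustration.

```python
import sys

# Same contents, inserted in a different order.
d1 = {"a": 1, "b": 2}
d2 = {"b": 2, "a": 1}
print(d1 == d2)              # equal as mappings
print(list(d1), list(d2))    # iteration order reflects insertion history

# Same contents, but one dict used to hold many more keys.
small = {0: 0}
big = {i: i for i in range(1000)}
for key in range(1, 1000):
    del big[key]
print(small == big)
print(sys.getsizeof(small), sys.getsizeof(big))  # sizes are build-dependent
```

The two comparisons only look at content, so they say nothing about how
either dict is laid out in memory.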

One final example: list resizing. Here are two lists which are equal but
take up different amounts of memory:

py> import sys
py> a = [0]
py> b = range(10000)
py> del b[1:]
py> a == b
True
py> sys.getsizeof(a)
36
py> sys.getsizeof(b)
48
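
On Python 3 range() no longer returns a list, so the session above needs a
list() call. A rough equivalent, assuming CPython, where the number of
spare slots kept after a deletion is an implementation detail and the byte
counts vary by version and platform:

```python
import sys

a = [0]
b = list(range(10000))   # Python 3: range() is lazy, so build a real list
del b[1:]                # shrink back down to a single element

print(a == b)
print(len(a), len(b))
print(sys.getsizeof(a), sys.getsizeof(b))   # exact numbers depend on the build
```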


Is PEP 393 another example of this? I have no idea. Somebody who is more 
familiar with the details of the implementation would be able to answer 
whether or not that is the case. I'm just suggesting that it is possible.
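
One way to poke at this, for anyone with a Python 3.3 or later build to
hand: create equal strings by different routes and compare what
sys.getsizeof reports. This is only a probe, under the assumption that
getsizeof reflects the internal representation; it does not inspect the
representation directly, and the helper name sizes_of_equal_strings is
made up for this sketch.

```python
import sys

def sizes_of_equal_strings():
    """Return getsizeof for two equal strings built by different routes."""
    direct = "abc"
    # A string containing a character outside Latin-1 (the euro sign).
    wide = "abc\u20ac"
    # Slice the ASCII prefix back out of it.
    sliced = wide[:3]
    assert direct == sliced
    return sys.getsizeof(direct), sys.getsizeof(sliced), sys.getsizeof(wide)

print(sizes_of_equal_strings())
```

If the first two numbers ever differ, that would be an instance of equal
content with different storage; if they match, it only shows that this
particular route does not produce one.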


-- 
Steven


