[pypy-issue] Issue #2997: ''.join(somestring) is buggy in presence of non-ascii characters (pypy/pypy)
Antonio Cuni
issues-reply at bitbucket.org
Fri Apr 12 13:00:15 EDT 2019
New issue 2997: ''.join(somestring) is buggy in presence of non-ascii characters
https://bitbucket.org/pypy/pypy/issues/2997/join-somestring-is-buggy-in-presence-of
Antonio Cuni:
The following snippet prints a weird result on the latest pypy3 (nightly):
```
#-*- encoding: utf-8 -*-
def dump(s):
    print(" len():", len(s))
    print(" repr():", repr(s))
    print(" chars:", [ord(ch) for ch in s])

x = "a = 'à'"
y = ''.join(x)
print("x == y: ", x == y)
print("x:")
dump(x)
print()
print("y: ")
dump(y)
```
```
$ ./pypy3 foo.py
x == y: True
x:
len(): 7
repr(): "a = 'à'"
chars: [97, 32, 61, 32, 39, 224, 39]
y:
len(): 8
repr(): "a = 'à'"
chars: [97, 32, 61, 32, 39, 224, 39, 208]
```
Note that `x == y` is True even though the two strings differ in length, and note that y has an extra char (208) which is not printed by repr(). The value 208 seems to be non-deterministic, so I suppose it is caused by an off-by-one error which causes someone to read past the end of the string.
This is the ultimate cause of the `\u0000` reported by issue #2983.
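The symptoms above could be turned into a small regression check along these lines. This is only a sketch: the helper name `check_join_roundtrip` and the list of sample strings are my own assumptions for illustration, not part of any existing pypy test. It checks the invariants the snippet shows being violated: `''.join(s)` should have the same length and the same code points as `s`, and should compare equal to it.

```python
#-*- encoding: utf-8 -*-
# Hypothetical helper (not from the pypy test suite): returns a list of
# invariant violations for ''.join(s); an empty list means everything is fine.
def check_join_roundtrip(s):
    joined = ''.join(s)
    problems = []
    # The joined string must not gain or lose characters.
    if len(joined) != len(s):
        problems.append("length %d != %d" % (len(joined), len(s)))
    # Compare code points explicitly, because == alone did not catch the bug.
    if [ord(ch) for ch in joined] != [ord(ch) for ch in s]:
        problems.append("code points differ")
    if joined != s:
        problems.append("strings compare unequal")
    return problems

# Sample inputs are assumptions; the first one is the string from the report.
samples = ["a = 'à'", "à", "abc", "", "x\u0000y"]
for sample in samples:
    print(repr(sample), check_join_roundtrip(sample) or "ok")
```

On an interpreter without the bug every sample should report "ok". The length and code point checks are kept separate from the equality check on purpose, since in the output above the strings compare equal while their lengths differ.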