[Python-bugs-list] [ python-Bugs-405227 ] sizeof(Py_UNICODE)==2 ????
noreply@sourceforge.net
Sun, 17 Jun 2001 12:57:23 -0700
Bugs item #405227, was updated on 2001-03-01 11:21
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=405227&group_id=5470
Category: Unicode
Group: Platform-specific
Status: Open
Resolution: Postponed
Priority: 5
Submitted By: Jon Saenz (jsaenz)
Assigned to: M.-A. Lemburg (lemburg)
Summary: sizeof(Py_UNICODE)==2 ????
Initial Comment:
We are trying to install Python 2.0 on a Cray T3E.
After a painful process of removing several modules
that produce errors (mmap, sha, md5), we get core
dumps when we run python because this platform
has no 16-bit numeric type. Unsigned
short is 4 bytes long.
We have finally defined Unicode characters as unsigned
short, even though that type is 4 bytes long, and we have
changed a statement in
Objects/unicodeobject.c
to:
if (sizeof(Py_UNICODE) != sizeof(unsigned short)) {
It compiles and runs now, but the test on regular
expressions crashes and the whole regression test does,
too.
Support of Unicode for this platform is not correct in
version 2.0 of Python.
----------------------------------------------------------------------
>Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-17 12:57
Message:
Logged In: YES
user_id=38388
The codecs are full of things like:
    ch = ((s[0] & 0x0f) << 12) + ((s[1] & 0x3f) << 6) + (s[2] & 0x3f);
    if (ch < 0x800 || (ch >= 0xd800 && ch < 0xe000)) {
        errmsg = "illegal encoding";
        goto utf8Error;
    }
where ch is a Py_UNICODE character.
The other "problem" is that pointer dereferencing is used a
lot in the code (using arrays of Py_UNICODE chars). We could
probably shift the calculations to Py_UCS4 integers and then
only do the data buffer access with Py_UNICODE which would
then be mapped to a 2-char-array to get the data buffer
layout right.
Still, I think this is low priority. Patches are welcome of
course :-)
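
A minimal sketch of the "do the calculation in a wider integer" idea,
assuming a hypothetical ucs4_t typedef standing in for Py_UCS4 and a
hypothetical helper name decode_utf8_3byte (neither is taken from the
Python sources). It only covers the three-byte case quoted above and
does not validate the lead or continuation bytes:

```c
/* Hypothetical stand-in for Py_UCS4: unsigned long is at least
   32 bits wide on every conforming C implementation. */
typedef unsigned long ucs4_t;

/* Decode one three-byte UTF-8 sequence starting at s.
   Returns 0 and stores the code point in *out on success, or -1
   for the "illegal encoding" case (overlong form or surrogate). */
static int decode_utf8_3byte(const unsigned char *s, ucs4_t *out)
{
    /* All arithmetic happens in ucs4_t, so the result does not
       depend on how wide the storage type for characters is. */
    ucs4_t ch = ((ucs4_t)(s[0] & 0x0f) << 12)
              + ((ucs4_t)(s[1] & 0x3f) << 6)
              + (ucs4_t)(s[2] & 0x3f);

    if (ch < 0x800 || (ch >= 0xd800 && ch < 0xe000))
        return -1;

    *out = ch;
    return 0;
}
```

Only the final store of the decoded value would then need to know
about the layout of the character buffer.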
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-06-17 12:44
Message:
Logged In: YES
user_id=31435
Point me to one of the calculations that's thought to be a
problem, and happy to suggest something (I didn't find one
on my own, but I'm not familiar with the details here).
BTW, I reopened this because we got another report of T3E
woes on c.l.py that day.
You certainly need at least 16 bits, but it's hard to see
how having more than that could be a genuine problem -- at
worst "this kind of thing" usually requires no more than
masking with 0xffff at the end. That can be hidden in a
macro that's a nop on platforms that don't need it, if
micro-efficiency is a concern.
Often even that isn't needed. For example, binascii_crc32
absolutely must compute a 32-bit checksum, but works fine
on platforms with 8-byte longs. The only "trick" needed to
make that work was to compute the complement via
crc ^ 0xFFFFFFFFUL
instead of via
~crc
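
The two tricks above can be sketched roughly as follows. The names
unit_t, WRAP16, add16 and complement32 are made up for illustration
and do not come from the Python sources; the sketch assumes the
16-bit values are stored in an unsigned short that may be wider than
16 bits:

```c
#include <limits.h>

/* Hypothetical storage type for one 16-bit unit. */
typedef unsigned short unit_t;

/* Mask down to 16 bits only where unsigned short is wider than that;
   on other platforms the conversion to unit_t already truncates,
   so the macro is a no-op. */
#if USHRT_MAX == 0xFFFF
#define WRAP16(x) (x)
#else
#define WRAP16(x) ((x) & 0xFFFFu)
#endif

/* 16-bit wraparound addition, independent of sizeof(unit_t). */
static unit_t add16(unit_t a, unit_t b)
{
    return (unit_t)WRAP16((unsigned int)a + b);
}

/* 32-bit complement that stays correct when long is 8 bytes:
   ~crc would also set the upper 32 bits, the XOR does not. */
static unsigned long complement32(unsigned long crc)
{
    return crc ^ 0xFFFFFFFFUL;
}
```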
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-06-17 11:47
Message:
Logged In: YES
user_id=38388
It may be a design error, but getting this right for all
platforms is hard and by choosing the 16-bit type we managed
to handle 95% of all platforms in a fast and reliable way.
Any idea how we could "emulate" a 16-bit integer type? We
need the integer type because we do calculations on the
values.
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-06-13 22:28
Message:
Logged In: YES
user_id=31435
I opened this again. It's simply unacceptable to require
that the platform have a 2-byte integer type. That doesn't
mean it's easy to fix, but it's a design error all the same.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg)
Date: 2001-03-16 11:27
Message:
Logged In: YES
user_id=38388
The current Unicode implementation needs Py_UNICODE to
be a 16-bit entity and so does SRE.
To get this to work on the Cray, you could try to use a
2-char struct which is then cast to a short in all those
places which assume a 16-bit number representation.
Simply using a 4-byte entity as basis will not work, since
the fact that Py_UNICODE fits into 2 bytes is hard-coded
into the implementation in a number of places.
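
A rough sketch of what such a 2-char struct could look like. Note
that it uses explicit load/store helpers rather than the cast to
short mentioned above, since a cast would not yield a 16-bit value
where short is wider. The names unit16_t, unit16_load and
unit16_store are hypothetical, and the sketch assumes 8-bit chars
and a struct without padding:

```c
/* Hypothetical two-byte storage cell for one 16-bit character. */
typedef struct {
    unsigned char hi;   /* upper 8 bits */
    unsigned char lo;   /* lower 8 bits */
} unit16_t;

/* Read the cell into an ordinary integer for calculations. */
static unsigned int unit16_load(unit16_t u)
{
    return ((unsigned int)u.hi << 8) | (unsigned int)u.lo;
}

/* Write the low 16 bits of v back into a cell; higher bits
   are discarded. */
static unit16_t unit16_store(unsigned int v)
{
    unit16_t u;
    u.hi = (unsigned char)((v >> 8) & 0xFF);
    u.lo = (unsigned char)(v & 0xFF);
    return u;
}
```

Every place that currently dereferences a Py_UNICODE pointer would
have to go through helpers like these, which is the cost of this
approach.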
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2001-03-01 15:29
Message:
Logged In: YES
user_id=31435
Notes:
+ Python was ported to T3E last year, IIRC by Marc Poinot.
May want to track him down.
+ Python's Unicode support doesn't rely on any platform
Unicode support. Whether it's "useless" depends on the
user, not the platform.
+ Face it <wink>: Crays are the only platforms that don't
have a native 16-bit integer type.
+ Even so, I believe at least SRE is happy to work with
32-bit Unicode (glibc's wchar_t is 4 bytes, IIRC), so that
much was likely a shallow problem.
----------------------------------------------------------------------
Comment By: Jon Saenz (jsaenz)
Date: 2001-03-01 15:09
Message:
Logged In: YES
user_id=12122
We have finally given up on installing Python on the Cray T3E
due to its lack of support for shared objects. This causes
difficulties in loading different external libraries
(Numeric, Lapack, and so on) because of the static linking.
In any case, we still think that this "bug" should be
repaired. There may be other platforms which:
1) Do not support Unicode, so that the Unicode feature of
Python is useless in these cases.
2) Have users interested in running Python on them (for
Numeric applications, for instance).
3) May not have a 16-bit native numerical type.
Under these circumstances, the current version of Python
cannot be used.
----------------------------------------------------------------------
Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2001-03-01 14:05
Message:
Logged In: YES
user_id=3066
Marc-Andre, can you deal with the general Unicode issues here and then pass this along to Fredrik for SRE updates? (Or better yet, work in parallel?)
Thanks!
----------------------------------------------------------------------