[Patches] [ python-Patches-520694 ] arraymodule.c improvements

Wed, 20 Feb 2002 17:15:10 -0800

Patches item #520694, was opened at 2002-02-20 14:38
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=520694&group_id=5470

Category: None
Group: None
Status: Open
Resolution: None
Priority: 3
Submitted By: Jason Orendorff (jorend)
Assigned to: Nobody/Anonymous (nobody)
Summary: arraymodule.c improvements

Initial Comment:
This patch brings the array module a little
more up to date.

There are two changes:

1. Modernize the array type, memory management,
   and so forth.  As a result, the array()
   builtin is no longer a function but a type.
   array.array is array.ArrayType.
   Also, it can now be subclassed in Python.

2. Add a new typecode 'u', for Unicode
   characters.
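
Change 1 can be checked from Python. The following is a minimal
sketch, assuming a current Python 3 interpreter rather than the 2.2-era
build this patch targets; the Samples subclass and its mean() method
are hypothetical names used only for illustration.

```python
import array as array_mod
from array import array

# array.array is a type rather than a factory function.
assert isinstance(array, type)

# ArrayType is looked up defensively in case an interpreter drops the alias.
alias = getattr(array_mod, "ArrayType", array)
assert alias is array


class Samples(array):
    """Hypothetical subclass, shown only to illustrate subclassing."""

    def mean(self):
        return sum(self) / len(self)


s = Samples('d', [1.0, 2.0, 3.0])
print(type(s).__name__, s.typecode, s.mean())
```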

The patch includes changes to test/test_array.py
to test the new features.

I would like to make a further change: add an
arrayobject.h include file, and provide some
array operations there, giving them names like
PyArray_Check(), PyArray_GetItem(), and
PyArray_GET_DATA().  Is such a change likely
to find favor?

----------------------------------------------------------------------

>Comment By: Jason Orendorff (jorend)
Date: 2002-02-20 17:15

Message:

> I don't like the Unicode part of it at all.

Well, I'm not attached to it.  It's very easy
to subtract it from the patch.

> What can you do with this feature?

The same sort of thing you might do with an array
of type 'c'.  For example, change individual
characters of a (Unicode) string and then run a
(Unicode) re.match on it.
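
A rough sketch of that use case, assuming a Python 3 interpreter where
the 'u' typecode still exists (it is deprecated there, so the warning
is silenced). The patch's own behavior may differ; here the array is
converted back with tounicode() before matching.

```python
import re
import warnings
from array import array

# 'u' is deprecated in recent Python 3 releases; silence the warning.
warnings.simplefilter("ignore", DeprecationWarning)

text = array('u', 'hello world')
text[0] = 'j'   # change individual characters in place
text[6] = 'W'

# Run a regular expression over the edited characters.
m = re.match(r'(\w+) (\w+)', text.tounicode())
print(m.groups())
```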

> It seems to unfairly prefer a specific Unicode encoding,
> without explaining what that encoding is, and without a
> clear use case why this encoding is desirable.

Well, why should array('h', '\x00\xff\xaa\xbb')
be allowed?  Why is that encoding preferable to any
other particular encoding of short ints?  Easy:
it's the encoding of the C compiler where Python was
built.  For 'u' arrays, the encoding used is just the
encoding that Python uses internally.
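
To illustrate the point about 'h': the sketch below, assuming Python 3
(where the raw data is a bytes object rather than a str) and a platform
with a 2-byte short, compares the array contents with the struct
module's native, little-endian and big-endian readings of the same
four bytes.

```python
import struct
import sys
from array import array

data = b'\x00\xff\xaa\xbb'

# The array interprets the bytes using the machine's own short layout.
a = array('h', data)

native = list(struct.unpack('@2h', data))
little = list(struct.unpack('<2h', data))
big = list(struct.unpack('>2h', data))

print(sys.byteorder, a.tolist())
print('little-endian reading:', little)
print('big-endian reading:   ', big)
```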

However, it's not intended to be used in any situation
where encode()/decode() would be appropriate.  I never
even thought about that possibility when I wrote it.

The behavior of a 'u' array is intended to be more
like this:  Suppose A = array('u', ustr).  Then:
    len(A) == len(ustr)
    A[0] == ustr[0]
    A[1] == ustr[1]
    ...

That is, a 'u' array is an array of Unicode characters.
Encoding is not an issue, any more than with the
built-in unicode type.
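
The intended behavior can be written as a small check. This is a
sketch assuming Python 3 with the (deprecated) 'u' typecode available;
the sample string stays inside the Basic Multilingual Plane so that it
holds regardless of the platform's wchar_t width.

```python
import warnings
from array import array

# 'u' is deprecated in recent Python 3 releases; silence the warning.
warnings.simplefilter("ignore", DeprecationWarning)

ustr = 'na\u00efve caf\u00e9'
A = array('u', ustr)

# The array mirrors the string character by character.
same_length = len(A) == len(ustr)
same_items = all(A[i] == ustr[i] for i in range(len(ustr)))
print(same_length, same_items)
```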

(If ustr is a non-Unicode string, then the behavior
is different -- more in line with what 'b', 'h', 'i',
and the others do.)

If your concern is that Python currently "hides" its
internal encoding, and the 'u' array exposes this
unnecessarily, then consider these two examples that
don't involve arrays:

>>> x = u'\U00012345'  # One Unicode codepoint...
>>> len(x)
2             # hmm.
>>> x[0]
u'\ud808'     # aha.  UTF-16.
>>> x[1]
u'\udf45'

>>> str(buffer(u'abc'))   # Example two.
'a\x00b\x00c\x00'
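
On a current Python 3, len() of that string is 1 and the buffer()
builtin is gone, so the sessions above no longer reproduce as written.
The same values can still be derived by hand; the following sketch
(assuming Python 3) computes the UTF-16 surrogate pair for U+12345 and
checks it against an explicit little-endian UTF-16 encoding.

```python
import struct

cp = 0x12345

# Standard UTF-16 surrogate-pair arithmetic for code points above U+FFFF.
offset = cp - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
print(hex(high), hex(low))

# The explicit encoding yields the same two 16-bit units.
units = chr(cp).encode('utf-16-le')
pair_matches = units == struct.pack('<2H', high, low)

# Example two: 16-bit little-endian code units for 'abc'.
abc_units = 'abc'.encode('utf-16-le')
print(pair_matches, abc_units)
```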

> It also seems to overlap with the Unicode object's
> .encode method, which is much more general.

Wow.  Well, that wasn't my intent.

It is intended, rather, to offer parity with 'c'.
Java has byte[], short[], int[], long[], float[],
double[], and char[]... Python doesn't currently have
char[].  Shouldn't it?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-02-20 15:02

Message:

What is the rationale for expanding PyObject_VAR_HEAD? It
doesn't seem to achieve anything.

I don't like the Unicode part of it at all. What can you do
with this feature? It seems to unfairly prefer a specific
Unicode encoding, without explaining what that encoding is,
and without a clear use case why this encoding is desirable.
It also seems to overlap with the Unicode object's .encode
method, which is much more general.

----------------------------------------------------------------------
