[Python-Dev] PEP 393 Summer of Code Project

Victor Stinner victor.stinner at haypocalc.com
Wed Aug 24 10:27:21 CEST 2011


On 24/08/2011 04:56, Torsten Becker wrote:
> On Tue, Aug 23, 2011 at 18:56, Victor Stinner
> <victor.stinner at haypocalc.com>  wrote:
>>> kind=0 is used and public, it's PyUnicode_WCHAR_KIND. Is it still
>>> necessary? It seems to be used only in PyUnicode_DecodeUnicodeEscape().
>>
>> If it can be removed, it would be nice to have kind in [0; 2] instead of kind
>> in [1; 3], so that a list of 3 items can map each kind to a callback or label.
>
> It is also used in PyUnicode_DecodeUTF8Stateful(), and there might be
> some checks for 0 that I missed converting when I introduced
> the macro.  The question was more whether this should be written as 0 or
> as a named constant.  I preferred the named constant for readability.
>
> An alternative would be to have kind values be the same as the number
> of bytes for the string representation so it would be 0 (wstr), 1
> (1-byte), 2 (2-byte), or 4 (4-byte).

Please don't do that: it's more common to need contiguous arrays (for a 
jump table/callback list) than to need the character size. You 
can use an array giving the character size: CHARACTER_SIZE[kind], which 
is the array {0, 1, 2, 4} (or maybe sizeof(wchar_t) instead of 0?).
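
A minimal sketch of the lookup tables described above, assuming the 
branch's kind values 0 (wstr), 1, 2 and 3 (1-, 2- and 4-byte). All names 
here (KIND_WCHAR, CHARACTER_SIZE, KIND_NAMES, READ_CHAR, read_char) are 
hypothetical illustrations, not identifiers from the branch. Because the 
kinds are contiguous from 0, the size, the label and the callback are 
each a plain array index:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical kind values, contiguous from 0: 0 = wstr, then 1-, 2-, 4-byte. */
enum { KIND_WCHAR = 0, KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 3, KIND_COUNT = 4 };

/* Size of one character per kind; sizeof(wchar_t) stands in for 0 on the wstr kind. */
static const size_t CHARACTER_SIZE[KIND_COUNT] = {
    sizeof(wchar_t), 1, 2, 4
};

/* A label list indexed directly by kind. */
static const char *const KIND_NAMES[KIND_COUNT] = {
    "wstr", "1-byte", "2-byte", "4-byte"
};

/* A callback list indexed directly by kind: read character i of a buffer. */
typedef uint32_t (*read_char_fn)(const void *data, size_t i);

static uint32_t read_wchar(const void *data, size_t i) { return (uint32_t)((const wchar_t *)data)[i]; }
static uint32_t read_1byte(const void *data, size_t i) { return ((const uint8_t *)data)[i]; }
static uint32_t read_2byte(const void *data, size_t i) { return ((const uint16_t *)data)[i]; }
static uint32_t read_4byte(const void *data, size_t i) { return ((const uint32_t *)data)[i]; }

static const read_char_fn READ_CHAR[KIND_COUNT] = {
    read_wchar, read_1byte, read_2byte, read_4byte
};

static size_t character_size(int kind)
{
    assert(kind >= 0 && kind < KIND_COUNT);
    return CHARACTER_SIZE[kind];
}

static uint32_t read_char(int kind, const void *data, size_t i)
{
    assert(kind >= 0 && kind < KIND_COUNT);
    return READ_CHAR[kind](data, i);  /* no switch needed: the kind is the index */
}
```

With kind values {0, 1, 2, 4} instead, each of these tables would need an 
unused slot at index 3 (or a switch), which is the point of keeping the 
kinds contiguous.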

> I think the value for wstr/uninitialized/reserved should not be
> removed.  The wstr representation is still used in the error case in
> the UTF-8 decoder because these strings can be resized.

In Python, you can resize an object if it has only one reference. Why is 
it not possible in your branch?
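
A toy model of that rule, under stated assumptions: toy_str and 
toy_resize are hypothetical and are not CPython's object layout or API. 
Resizing in place is only safe when the caller holds the sole 
reference, because no other holder can observe the buffer changing:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object: a reference count plus a growable buffer. */
typedef struct {
    long refcnt;
    size_t length;
    char *data;
} toy_str;

/* Resize in place only if the caller holds the only reference.
   Returns 0 on success, -1 if the object is shared or realloc fails. */
static int toy_resize(toy_str *s, size_t new_length)
{
    if (s->refcnt != 1)
        return -1;                      /* shared: others would see the change */
    char *p = realloc(s->data, new_length + 1);
    if (p == NULL)
        return -1;                      /* old buffer is still valid */
    if (new_length > s->length)         /* zero the newly added characters */
        memset(p + s->length, 0, new_length - s->length);
    p[new_length] = '\0';
    s->data = p;
    s->length = new_length;
    return 0;
}
```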

Oh, I missed the UTF-8 decoder because you wrote "kind = 0": please use 
PyUnicode_WCHAR_KIND instead!

I don't like a "reserved" value, especially if it is 0, the first 
value. Look at Microsoft file formats: they waste a lot of space because 
most fields are reserved, and 10 years later, these fields are still 
unused. Can't we add the value 4 when we need a new kind?

> Also having one
> designated value for "uninitialized" limits comparisons in the
> affected functions to the kind value, otherwise they would need to
> check the str field for NULL to determine in which buffer to write a
> character.

I have to read the code more carefully; I don't know this 
"uninitialized" state.

For kind=0: does "wstr" mean that str is NULL but wstr is set? I didn't 
understand that str can be NULL for an initialized string. I should read 
the PEP again :-)
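
A minimal sketch of the dispatch Torsten describes, assuming the reading 
above (kind 0 means str is NULL and only wstr is valid). The struct 
toy_unicode and the function toy_write_char are hypothetical 
simplifications, not the branch's PyUnicodeObject layout. The kind alone 
selects the target buffer, so no separate "str == NULL" test is needed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical kind values, contiguous from 0. */
enum { KIND_WCHAR = 0, KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 3 };

/* Hypothetical simplified string with two possible buffers. */
typedef struct {
    int kind;
    void *str;      /* 1/2/4-byte buffer; NULL while kind == KIND_WCHAR */
    wchar_t *wstr;  /* wchar_t buffer, used while the string may still be resized */
} toy_unicode;

/* Write one code point at index; the kind decides which buffer is written. */
static void toy_write_char(toy_unicode *u, size_t index, uint32_t ch)
{
    switch (u->kind) {
    case KIND_WCHAR:
        u->wstr[index] = (wchar_t)ch;
        break;
    case KIND_1BYTE:
        ((uint8_t *)u->str)[index] = (uint8_t)ch;
        break;
    case KIND_2BYTE:
        ((uint16_t *)u->str)[index] = (uint16_t)ch;
        break;
    case KIND_4BYTE:
        ((uint32_t *)u->str)[index] = ch;
        break;
    default:
        assert(!"invalid kind");
    }
}
```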

>> I suppose that compilers prefer a switch with all cases defined, 0 as the first
>> item, and contiguous values. We may need an enum.
>
> During the Summer of Code, Martin and I did an experiment with GCC and
> it did not seem to produce a jump table as an optimization for three
> cases but generated comparison instructions anyway.

You mean with a switch with a case for each possible value? I don't 
think that GCC knows that all cases are defined if you don't use an enum.
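
A sketch of the enum suggestion, with hypothetical names (toy_kind, 
toy_kind_size). With an enum type and no default label, GCC's -Wswitch 
(enabled by -Wall) warns if a case for one of the enumerators is missing, 
so the "all cases defined" intent is checked at compile time. This does 
not by itself force a jump table; for so few cases the optimizer may 
still emit comparisons, as observed above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum for the four kinds, contiguous from 0. */
enum toy_kind {
    TOY_WCHAR_KIND = 0,
    TOY_1BYTE_KIND = 1,
    TOY_2BYTE_KIND = 2,
    TOY_4BYTE_KIND = 3
};

/* No default label: -Wswitch reports any enumerator left out of this switch. */
static size_t toy_kind_size(enum toy_kind kind)
{
    switch (kind) {
    case TOY_WCHAR_KIND: return sizeof(wchar_t);
    case TOY_1BYTE_KIND: return 1;
    case TOY_2BYTE_KIND: return 2;
    case TOY_4BYTE_KIND: return 4;
    }
    assert(!"invalid kind");  /* an enum variable can still hold other values */
    return 0;
}
```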

> I am not sure how much we should tune the code for potential compiler
> optimizations here.

Oh, it was just a suggestion. Sure, it's not the best moment to care about 
micro-optimizations.

Victor

