Undeterministic strxfrm?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Tue Sep 4 23:45:09 EDT 2007


On Tue, 04 Sep 2007 22:18:05 -0300, Tuomas <tuomas.vesterinen at pp.inet.fi>
wrote:

> Peter Otten wrote:
>> Python seems to be the culprit as there is a relatively recent
>> strxfrm-related bugfix, see
>
> Thanks Peter. Can't find it, do you have the issue number?

I think it's not in the issue tracker - see  
http://xforce.iss.net/xforce/xfdb/34060
The fix is already in 2.5.1  
http://www.python.org/download/releases/2.5.1/NEWS.txt

> Reading rev 54669, it seems to me that the bug is not fixed. The man
> page says:
>
> STRXFRM(3): ... size_t strxfrm(char *dest, const char *src, size_t n);
> ... The first n characters of the transformed string
> are placed in dest. The transformation is based on the program’s
> current locale for category LC_COLLATE.
> ... The strxfrm() function returns the number of bytes required to
> store the transformed string in dest excluding the terminating ‘\0’
> character. If the value returned is n or more, the contents of dest are
> *indeterminate*.
>
> According to the man pages, Python should know the size of the result it
> expects and should not trust the size strxfrm returns. I don't completely
> understand the collation algorithm, but it should offer different levels
> of collation. So Python, too, should offer those levels as a second
> parameter. However, strxfrm doesn't offer more parameters either, although
> there is another function, strcasecmp. So Python should be able to
> calculate the expected size before calling strxfrm or strcasecmp. I
> don't know how that is possible. Maybe strcoll knows better and I should
> drop strxfrm and use strcoll instead. That costs converting the search key
> at every step of the search.

No. That's why strxfrm is called twice: the first call returns the required
buffer size, the buffer is resized, and strxfrm is called again. That's a
rather common sequence when buffer sizes are not known in advance.
[Note that it is `dest` that is indeterminate, NOT the function's return
value, which is always the required buffer size]
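
For illustration only, here is a rough sketch of that two-call sequence,
driven from Python through ctypes. This is not the code in the interpreter;
the helper name two_call_strxfrm and the first_size parameter are made up
for the example, and it assumes a POSIX system where ctypes.CDLL(None)
exposes the C library's strxfrm.

```python
import ctypes

# Assumption: a POSIX system, where CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)
libc.strxfrm.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
libc.strxfrm.restype = ctypes.c_size_t


def two_call_strxfrm(src, first_size=16):
    """Transform src (bytes, no embedded NUL), growing the buffer if needed.

    Hypothetical helper for illustration; first_size is the initial guess.
    """
    buf = ctypes.create_string_buffer(first_size)
    # First call: the return value is the length needed, excluding the NUL.
    needed = libc.strxfrm(buf, src, first_size)
    if needed >= first_size:
        # buf is indeterminate here, so discard it and retry with enough room.
        buf = ctypes.create_string_buffer(needed + 1)
        libc.strxfrm(buf, src, needed + 1)
    # .value reads the buffer up to the first NUL byte.
    return buf.value
```

If LC_COLLATE is left at the default "C" locale, glibc just copies the
input, so two_call_strxfrm(b"hello world", 4) should go through the second
call and return the input unchanged. Under other locales the transformed
bytes are opaque and only meaningful when compared with each other.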

-- 
Gabriel Genellina



