[Patches] [Patch #101544] Fix for 110609: Allow large long integers in formatting

noreply@sourceforge.net noreply@sourceforge.net
Wed, 20 Sep 2000 22:24:26 -0700


Patch #101544 has been updated. 

Project: 
Category: core (C code)
Status: Rejected
Summary: Fix for 110609: Allow large long integers in formatting

Follow-Ups:

Date: 2000-Sep-18 21:03
By: gvanrossum

Comment:
For Tim, because he likes numerical tweakage.
(Tim, you may also give this to Marc-Andre, since it affects Unicode.)
-------------------------------------------------------

Date: 2000-Sep-19 21:06
By: tim_one

Comment:
I'm afraid there are a lot of low-level problems with this patch.  Have spent about 90 minutes fixing problems so far, and think it will take another 90 before it's done.  Since I'm in the middle of it now, I'd rather finish it than start over from scratch with another stab (and I *do* want to see this get in -- it's a frequent complaint on c.l.py, and is sure *perceived* as a bug today).

Examples include:

+ There was no prototype in scope for stringobject.c's new Py_formatlong when it was referenced in unicodeobject.c.  This caused the compiler to assume it returned int instead of PyObject*, and that caused bad code generation.

+ "Py_formatlong" isn't a proper name for an internal function; if it was meant to be part of the public API, it needed docs.

+ The new formatlong in unicodeobject.c should have been declared static.

+ Ubiquitous use of 0 instead of NULL for null pointers; but this is C, not C++.

+ Non-Guido-like code formatting in small ways.

+ The logic for "skip" and "sign" is hosed in some cases.  For example,

>>> "%20x" % 2L**31
'            80000000'
>>> "%.20x" % 2L**31
'8000000000000000000'
>>>

This because the first "prec > len" loop copies skip+sign characters from the start of buf regardless of whether F_ALT is set (in the 2nd case above, F_ALT is not set, so it was incorrect to copy 2 characters before zero-filling).

+ The changed test_format failed for me; haven't tracked down why yet.

+ Not your doing:  stringobject.c's existing formatint uses a too-small buffer, under the bad assumption that a C long is at most 2**31.  (Speaking of which, why do people constantly try to get a temp buffer down to as few bytes as possible?!  The difference between 20 and (say) 256 in a leaf routine is meaningless.)

That last one was just to reassure you that I didn't spend the *whole* 90 minutes so far just picking on this patch <wink>.
-------------------------------------------------------

Date: 2000-Sep-20 00:42
By: loewis

Comment:
It appears that I forgot the snippet

diff -u -r2.23 stringobject.h
--- stringobject.h      2000/09/19 20:58:38     2.23
+++ stringobject.h      2000/09/20 07:32:28
@@ -68,6 +68,10 @@
 #define PyString_InternFromString(cp) PyString_FromString(cp)
 #endif
 
+/* For use in unicodeobject only. */
+PyObject* Py_formatlong(PyObject *val, int flags, int prec, int type, 
+                       char**pbuf, int *plen);
+
 /* Macro, trading safety for speed */
 #define PyString_AS_STRING(op) (((PyStringObject *)(op))->ob_sval)
 #define PyString_GET_SIZE(op)  (((PyStringObject *)(op))->ob_size)

IOW, Py_formatlong is not intended as an external API. It must have a
prefixed name, though, as it is intended to be used across translation units.

As for 0 vs NULL, ISO C99 says "An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant." It says that NULL is an implementation-defined null pointer constant, it does not say that either is better than the other. So I feel usage of 0 is perfectly fine in C, with the same rationale that encourages it in C++ over usage of NULL.

I'm sorry for the other problems that patch caused.
-------------------------------------------------------

Date: 2000-Sep-20 00:49
By: loewis

Comment:
It also appears that the last chunk of test_format is incorrect, 10 octal is 12, and 10 decimal is 10...
-------------------------------------------------------

Date: 2000-Sep-20 17:49
By: tim_one

Comment:
Martin, these aren't deep issues,  This is simply a matter of playing along with the code all around you.  The convention for private interface names is to begin them with an underscore, as is plain just from looking at stringobject.c alone.  So I renamed the function to _PyString_FormatLong, so it looks like everything else.  Same bit with NULL vs 0:  when the code you're changing uses NULL *everywhere*, using 0 instead for your own little pieces is a bad idea, and for exactly the same reasons a mixture of indentation or brace styles is a bad idea.  There's nothing that can be argued about here.

As for the rest, I think you underestimated the intricacy of C's format codes.  As other examples here, the blank, plus and zero flags are getting ignored now when formatting a long.  This isn't trivial to fix, as the code you built on believed to the depths of its soul that x, X and o conversions can't yield a string with a sign character.

So I'm fixing that stuff too.  It's all doable, just tedious (I admire your courage in daring to touch any of this code at all <0.9 wink>).

I agree with your analysis of the test_format failures, and have fixed them.  No offense intended, but it's more evidence that you were in an un-Martin-like careless mood when you created this patch (test_format could not have worked for you either).

-------------------------------------------------------

Date: 2000-Sep-20 22:24
By: tim_one

Comment:
Rejected this specific patch for the reasons given, but am checking in a derivative that I hope gets all the endcases right; also added dozens of non-trivial new test cases to test_format.py.
-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=101544&group_id=5470