python coding contest

Claudio Grondi claudio.grondi at freenet.de
Sun Jan 1 14:39:22 EST 2006


Claudio Grondi wrote:
> Steven D'Aprano wrote:
> 
>> On Sun, 01 Jan 2006 15:49:58 +0100, Claudio Grondi wrote:
>>
>>
>>
>>> What I have thought about as a simpler/better solution is a method 
>>> allowing to avoid processing the content of the string or long 
>>> integer object by looping over its content. 
>>
>>
>>
>> How can you avoid looping over its content? Whether you do it yourself
>> using "for byte in array" or similar, or Python does it for you
>> (using array.tostring perhaps), *something* has to walk through the 
>> bytes.
>>
>> If you don't like walking the string, write a function to do it once, and
>> then use the function.
>>
>>
>>> I suppose, that knowing enough about Python internals it must be 
>>> possible to change only the object type not beeing forced to process 
>>> the content i.e. the value itself, what in case of big size of data 
>>> to convert with methods like this above wastes CPU time.
>>
>>
>>
>> I'm reminded of a time I was going for a drive in the country when I 
>> drove
>> past an apple orchid. Standing in the orchid was a farmer with a pig. He
>> lifted the pig into the air, and the pig then bit an apple and slowly
>> chewed it. The farmer then carried him over to another branch, and the 
>> pig
>> ate another apple.
>>
>> I was so surprised I stopped my car and wandered over to ask the farmer
>> what he was doing.
>>
>> "I'm feeding apples to my pig," he replied.
>>
>> "Wouldn't it save time to just pick some apples and feed them to the 
>> pig?"
>>
>> The farmer looked at me like I was an idiot. "What's time to a pig?"
>>
>>
>> The moral of the story is, before spending time working on some scheme to
>> save CPU time, you better be absolutely sure that firstly, you are going
>> to save CPU time, secondly, that it is enough CPU time to be worth 
>> saving,
>> and thirdly, that you aren't wasting more of your own time to do it.
>>
>>
>>
> It's a funny story :-))
> 
> , but in my eyes not appropriate in given context, because my prior goal 
> is to understand some more about Python internals, i.e. what is and if 
> it is at all a differerence between the internal representation of a 
> string and a long integer.
> 
> I know, I should probably look into the C source of Python, but I 
> suppose it could be too hard for me to find the appropriate piece of 
> code, so I welcome any hints.
> 
> If I knew the internal representation of string and long integer objects 
> and were able to read/write to memory and point an identifier at a given 
> memory address, a conversion between long integer and string types were 
> probably nothing else as changing some bytes and repointing an 
> identifier (assuming that string and long integer values are in memory 
> the same data if they represent the same value). That it would save CPU 
> time is secondary here, but with increasing costs of energy making the 
> number on my electrical power bill higher each year due to higher power 
> consumption with increasing number of programs I run (it makes a 
> difference of 50 Watt between an algorithm keeping the CPU 100% busy and 
> an algorithm using only 1% of it) it is not necessarily paranoia driving 
> one to consider potential savings of CPU time.
> In this context the example of the bigdec class comes to my mind, where 
> usage of another algorithm made it possible to cut down power 
> consumption and time of printing a decimal form of the largest known 
> prime number from 7 hours of a 100% busy CPU down to 7 seconds!
> 
> Claudio

 From stringobject.h :
typedef struct {
     PyObject_VAR_HEAD
     long ob_shash;
     int ob_sstate;
     char ob_sval[1];
     /* Invariants:
      *     ob_sval contains space for 'ob_size+1' elements.
      *     ob_sval[ob_size] == 0.
      *     ob_shash is the hash of the string or -1 if not computed yet.
      *     ob_sstate != 0 iff the string object is in stringobject.c's
      *       'interned' dictionary; in this case the two references
      *       from 'interned' to this object are *not counted* in ob_refcnt.
      */
} PyStringObject;

 From longobject.h :
typedef struct _longobject PyLongObject; /* Revealed in longintrepr.h */
 From longintrepr.h :
typedef unsigned short digit;
struct _longobject {
  PyObject_VAR_HEAD
  digit ob_digit[1];
};

 From this I mean to see, that the string data header is longer 
containing an additional long and an int compared to long integer data 
header. The long integer seem to be an array of unsigned short values 
and the string an array of characters. My MSDN help tells me, that:
"Type short int (or simply short) is an integral type that is larger 
than or equal to the size of type char, and shorter than or equal to the 
size of type int." In Microsoft Visual C++ short is 2 bytes long, and 
because I am on an Intel processor the sequence of bytes will differ 
from what I will get when creating a binary string out of a long integer 
due to swapping of byte order. Am I right here?

My conclusion is, that beeing on e.g. Motorola 68000 based processors 
the actual data behind the string and the long integer types will be the 
same, but beeing on an Intel 386 series processor based systems the 
actual data stored in memory behind a string and long integer will differ.

So finally the idea of converting binary strings forth and back to long 
integers without looping over the data content does for sure not work on 
Intel processor based systems, but may work on processors which don't 
need swapped bytes for integer operations ( I found this page 
http://www.cs.princeton.edu/~kazad/resources/cs/endian.htm gives a good 
overview about the problems with byte swapping, is there any better 
reference related to this subject online? ).

As Python tries to be multiplatform, I don't understand why it stores 
long integers as an array of shorts and not as an array of chars? 
Historical reasons? Speed reasons? Any other reasons?

Wouldn't it be more intuitive to have in Python a data type beeing an 
object with binary data which can be easily 'casted' to both string and 
long integer depending how it should be processed or used?
Do you see any advantage to have such data type, or am I quite alone 
with this idea?

Claudio



More information about the Python-list mailing list