Dumb python questions

Bengt Richter bokr at accessone.com
Wed Aug 22 18:53:05 EDT 2001


On Wed, 22 Aug 2001 02:16:21 -0400, "Tim Peters" <tim.one at home.com> wrote:

>[Bengt Richter, on the marshal format for longs]
>> ...
>>  (a little-endian string made from the internal int chunk count
>> and the following n chunks of two bytes each holding a little-endian
>> short 15-bit number)
>> ...
>> BTW, the marshaled long format is apparently platform-dependent,
>
>That's not apparent to me <wink>, and it's intended to be
>platform-independent.  marshal is what Python uses internally to create .pyc
>and .pyo files, and they're routinely transported across platforms.
>
>"15" is a constant across all platforms, and endianess is forced by
>reading/writing one byte at a time.
>
Ok, I misspoke ;-). (Especially since the very purpose of 'marshaling' is to generate
a platform-independent representation *<;-)

I should have said, "might be version dependent." The format just struck me as
very implementation-specific. As it turns out, on my little endian machine, it's
a straight image of the data of the long data representation, prefixed with a type
identifier character 'l'.

I'd guess 15 was chosen so that unsigned addition could be done in C without overflow,
and not all platforms could handle 31 (or products of 31). But sheesh, a little assembler
code would get you faster arithmetic and the possibility of a more platform independent,
packed format (e.g., type code + 32-bit count of bytes following + the n bytes).
The same format would then be usable in 16 or 32 (and maybe even 64 for add) bit chunks.

Of course, one volunteer can write the C, and it would take one per platform to write
assembler ;-/

OTOH, one intel32 volunteer doing assembly for GNU and MS tools could benefit a lot of
folks. If the marshal format were packed for compilation purposes, other platforms would
take the hit importing it into their internal long representations, and their interpreters
would have to have that capability. Maybe type code 'L' instead of 'l' ? (Or more generally,
see below...)

Hm, I wonder how much compiled code gets generated on one platform and executed on another
as a percentage of the total. Every time anyone does anything interactively, it's all local
pretty much 99+% I would think.

I'm leading to the notion of local marshalling for code to be used efficiently on the same
machine, vs. a generic export marshalling hub format. (I.e., hub in the sense that all formats
should be translatable to and from the hub format, but not necessarily directly to each other).

The idea would be that a system would recognize a foreign format and convert it for local
use, and that the hub format would be used for general distribution, when people wanted to
be politically correct.

Again, the concept behind the concept is a unified idea with multiple possible representations.
Marshalled representations could have an optional flavor prefix, starting with a character
not used for current types, e.g., 'M' for special-flavor marshaling, followed if present with
one or two characters to identify the flavor. Each platform could then easily recognize its own
flavor and presumbably do a memmove to stuff a simple object, instead of doing endian-compensating
serial packing one chunk's worth of bytes at a time.

If the flavor was not local, it wouldn't hurt, it would just slow things down, until a local
version got cached or written to file.





More information about the Python-list mailing list