[Python-Dev] methods on the bytes object

"Martin v. Löwis" martin at v.loewis.de
Mon May 1 21:24:30 CEST 2006


Josiah Carlson wrote:
>>> Certainly that is the case.  But how would you propose embedded bytes
>>> data be represented? (I talk more extensively about this particular
>>> issue later).
>> Can't answer: I don't know what "embedded bytes data" are.

OK. I think I would use base64, possibly of compressed content. It's
more compact than your representation, as it uses only about 1.33
characters per byte, instead of the up to four characters per byte
(\xNN escapes) that img2py uses.
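
As a rough illustration of the base64 suggestion, here is a minimal
sketch written for Python 3. The payload and the name get_data are
made up for the example and are not taken from img2py.

```python
import base64
import zlib

# Hypothetical payload standing in for image data (not from img2py).
raw = bytes(range(256)) * 4

# Compress first, then base64-encode so the result is plain ASCII text
# that can sit in a source file as an ordinary string literal.
compressed = zlib.compress(raw)
encoded = base64.b64encode(compressed).decode("ascii")


def get_data(text=encoded):
    """Reverse the two steps: base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(text))


# base64 emits 4 characters for every 3 input bytes (plus padding),
# so this ratio comes out slightly above 4/3.
chars_per_byte = len(encoded) / len(compressed)
```

Here chars_per_byte measures the base64 overhead on the compressed
bytes, which is the figure being compared with the escaped literal.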

If ease of porting is an issue, img2py should just append
.encode("latin-1") to the string literal.
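
A small sketch of why latin-1 works here, assuming Python 3 semantics
where a plain string literal is text: latin-1 maps code points 0..255
one-to-one onto byte values, so the \xNN escapes survive unchanged.
The literal below is just the eight-byte PNG signature, chosen for
illustration.

```python
# A text literal containing \xNN escapes, as a generator might emit it.
png_signature_text = '\x89PNG\r\n\x1a\n'

# latin-1 maps each code point below 256 to the byte of the same value,
# so encoding recovers the intended binary data.
png_signature = png_signature_text.encode("latin-1")

# Decoding with latin-1 goes the other way without loss.
round_tripped = png_signature.decode("latin-1")
```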

>     return zlib.decompress(
> 'x\xda\x01\x14\x02\xeb\xfd\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00 \
[...]

> That data is non-textual.  It is bytes within a string literal.  And it
> is embedded (within a .py file).

In Python 2.x, it is that, yes. In Python 3, it is (meaningless)
text.

>>> I am apparently not communicating this particular idea effectively
>>> enough.  How would you propose that I store parsing literals for
>>> non-textual data, and how would you propose that I set up a dictionary
>>> to hold some non-trivial number of these parsing literals?
> 
> An operationX literal is a symbol that describes how to interpret the
> subsequent or previous data.  For an example of this, see the pickle
> module (portions of which I include below).

I don't think there can be, or should be, a general solution for
all operationX literals, because the different applications of
operationX all have different requirements with respect to their
literals.

In binary data, integers are the most obvious choice for
operationX literals. In text data, string literals are.

> I described before how you would use this kind of thing to perform
> operationX on structured information.  It turns out that pickle (in
> Python) uses a dictionary of operationX symbols/literals -> unbound
> instance methods to perform operationX on the pickled representation of
> Python objects (literals where XXXX = '...' are defined, and symbols
> using the XXXX names). The relevant code for unpickling is the while 1:
> section of the following.

Right. I would convert the top of pickle.py to read

MARK            = ord('(')
STOP            = ord('.')
...

> 
>     def load(self):
>         """Read a pickled object representation from the open file.
> 
>         Return the reconstituted object hierarchy specified in the file.
>         """
>         self.mark = object() # any new unique object
>         self.stack = []
>         self.append = self.stack.append
>         read = self.read
>         dispatch = self.dispatch
>         try:
>             while 1:
>                 key = read(1)

and then change this line to
                  key = ord(read(1))

>                 dispatch[key](self)
>         except _Stop, stopinst:
>             return stopinst.value
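
To make the integer-keyed dispatch concrete, here is a stripped-down
sketch in Python 3 syntax. It is not the real unpickler: TinyUnpickler
and the PUSH opcode are invented for the example, and only the shape of
the loop follows pickle.py.

```python
import io

# Opcodes as integers, following the ord(...) convention.
MARK = ord('(')
STOP = ord('.')
PUSH = ord('P')  # hypothetical opcode, invented for this sketch


class _Stop(Exception):
    def __init__(self, value):
        self.value = value


class TinyUnpickler:
    dispatch = {}  # integer opcode -> plain function taking self

    def __init__(self, file):
        self.read = file.read
        self.stack = []

    def load(self):
        try:
            while True:
                key = ord(self.read(1))  # one byte -> integer opcode
                self.dispatch[key](self)
        except _Stop as stopinst:
            return stopinst.value

    def load_push(self):
        # Push the value of the next byte onto the stack.
        self.stack.append(ord(self.read(1)))
    dispatch[PUSH] = load_push

    def load_stop(self):
        raise _Stop(self.stack.pop())
    dispatch[STOP] = load_stop


# Stream: PUSH 0x2a, then STOP.
result = TinyUnpickler(io.BytesIO(b"P\x2a.")).load()
```

The dispatch table never needs to know whether its keys came from
character literals or hex literals; they are all small integers.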

> For an example of where people use '...' to represent non-textual
> information in a literal, see the '# Protocol 2' section of pickle.py ...

Right.

> # Protocol 2
> 
> PROTO           = '\x80'  # identify pickle protocol

This should be changed to

PROTO           = 0x80  # identify pickle protocol
etc.
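
A quick check of how integer opcodes line up with an actual pickle,
assuming Python 3, where indexing a bytes object yields an integer.
The pickled list is arbitrary example data.

```python
import pickle

PROTO = 0x80      # identify pickle protocol
STOP = ord('.')   # every pickle ends with STOP

# Arbitrary example object, pickled with protocol 2.
data = pickle.dumps([1, 2, 3], protocol=2)

# Indexing bytes gives integers, so they compare directly with the
# integer opcode constants above.
first_opcode = data[0]
protocol_version = data[1]
last_opcode = data[-1]
```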

> The point of this example was to show that operationX isn't necessarily
> the processing of text, but may in fact be the interpretation of binary
> data. It was also supposed to show how one may need to define symbols
> for such interpretation via literals of some kind.  In the pickle module,
> this is done in two parts: XXX = <literal>; dispatch[XXX] = fcn.  I've
> also seen it as dispatch = {<literal>: fcn}

Yes. For pickle, the ordinals of the type code make good operationX
literals.

> See any line-based socket protocol for where .find() is useful.

Any line-based protocol is textual, usually based on ASCII.
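
Under that view, a line-based protocol could be handled by decoding the
received bytes as ASCII first and then using the ordinary text methods.
A minimal sketch, with made-up SMTP-like input:

```python
# Hypothetical data as it might arrive from a socket.
received = b"HELO example.org\r\nQUIT\r\n"

# Treat the protocol as text: decode first, then search with str.find().
text = received.decode("ascii")
end = text.find("\r\n")
first_line = text[:end]
rest = text[end + 2:]
```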

Regards,
Martin


