[IPython-dev] Extensible pretty-printing

Thu Oct 28 20:13:25 EDT 2010

On 10/28/10 6:11 PM, Fernando Perez wrote:
> On Thu, Oct 28, 2010 at 4:00 PM, Robert Kern <robert.kern at gmail.com> wrote:
>> It's fresh. Also note that we have local modifications not in the upstream to
>> support the registration of prettyprinters by the name of the type to avoid imports.
>
> OK.  Probably would be a good idea to make a little note in the file
> indicating this.
>
>>> One last question: we don't want anything actually *printing*, instead
>>> we want an interface that *returns* strings which we'll stuff on the
>>> pyout channel (an in-process version can simply take these values and
>>> print them, of course).
>>
>> pretty has a pformat()-equivalent. The original pull request had already made
>> that change.
>
> OK.
>
>>> Right now we only have a single
>>> representation stored in the 'data' field.  How do you think we should
>>> go about the multi-field option, within the context of pretty?
>>
>> pretty does not solve that problem.
>>
>> I recommend exactly what I did in ipwx. The DisplayTrap is configured with a
>> list of DisplayFormatters. Each DisplayFormatter gets a chance to decorate the
>> return message with an additional entry, keyed by the type of the
>> DisplayFormatter (probably something like 'string', 'html', 'image', etc. but
>> also perhaps 'repr', 'pretty', 'mathtext'; needs some more thought). pretty
>> would just be the implementation of the default string DisplayFormatter.
>
> OK, so how do you want to proceed: do you want to reopen your pull
> request (possibly rebasing it if necessary) as it was, or do you want
> to go ahead and implement the above approach right away?

I'd rather implement this approach right away. We just need to decide what the 
keys should be and what they should mean. I originally used the ID of the 
DisplayFormatter. This would allow a "normal" representation and an enhanced
one, both of the same type (plain text, HTML, PNG image), to coexist.
Then the frontend could pick which one to display and let the user flip back and 
forth as desired even for old Out[] entries without reexecuting code. This may 
be a case of YAGNI.

However, that means that the frontend needs to know about the IDs of the
DisplayFormatters. It needs to know, for instance, that the 'my-tweaked-html'
formatter produces HTML. I might propose this as the fully-general solution:

Each DisplayFormatter has a unique ID and a non-unique type. The type string 
determines how a frontend would actually interpret the data for display. If a 
frontend can display a particular type, it can display it for any 
DisplayFormatter of that type. There will be a few predefined type strings with 
meanings, but implementors can define new ones as long as they pick new names.

   text -- monospaced plain text (unicode)
   html -- snippet of HTML (anything one can slap inside of a <div>)
   image -- bytes of an image file (anything loadable by PIL, so no need to
            have different PNG and JPEG type strings)
   mathtext -- just the TeX-lite text (the frontend can render it itself)

When given an object for display, the DisplayHook will give it to each of the 
DisplayFormatters in turn. If the formatter can handle the object, it will 
return some JSONable object[1]. The DisplayHook will append a 3-tuple

   (formatter.id, formatter.type, data)

to a list. The DisplayHook will give this to whatever is forming the response 
message.
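
A minimal sketch of the collection step described above, under a few
assumptions that are not settled in this thread: a formatter is a callable
with `id` and `type` attributes, it returns None when it cannot handle an
object, and the `to_html` method and the 'html-demo' ID are hypothetical names
made up for the example.

```python
import json


class ReprFormatter:
    """The (id='default', type='text') formatter; plain repr() for the sketch."""
    id = 'default'
    type = 'text'

    def __call__(self, obj):
        return repr(obj)


class HTMLFormatter:
    """Hypothetical HTML formatter; only handles objects with a to_html() method."""
    id = 'html-demo'
    type = 'html'

    def __call__(self, obj):
        to_html = getattr(obj, 'to_html', None)  # assumed protocol, not a real API
        if to_html is None:
            return None  # assumption: None means "cannot handle this object"
        return to_html()


class DisplayHook:
    def __init__(self, formatters):
        self.formatters = list(formatters)

    def format(self, obj):
        """Ask each formatter in turn; collect (id, type, data) 3-tuples."""
        results = []
        for formatter in self.formatters:
            data = formatter(obj)
            if data is not None:
                results.append((formatter.id, formatter.type, data))
        return results


class Tagged:
    """Stand-in object that offers an HTML representation."""
    def __repr__(self):
        return 'Tagged()'

    def to_html(self):
        return '<b>Tagged</b>'


hook = DisplayHook([ReprFormatter(), HTMLFormatter()])
payload = hook.format(Tagged())
# The list must survive JSON serialization to go into the response message.
encoded = json.dumps(payload)
```

The formatters never see the list being built; the DisplayHook owns it, so the
storage structure can change without touching any formatter.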

Most likely, there won't be too many of these formatters for the same type 
active at any time and there should always be the (id='default', type='text') 
formatter. A simple frontend can just look for that. A more complicated GUI
frontend may prefer a type='html' response and fall back to type='text' only
when no HTML is available. It may keep an ordered list of formatter IDs and
display the first one present in the response. It might allow the user to flip through the
different representations for each cell. For example, if I have a 
type='mathtext' formatter showing sympy expressions, I might wish to go back to 
a simple repr so I know what to type to reproduce the expression.
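
One way a frontend might choose among the 3-tuples, sketched under
assumptions: the function name, its parameters, and the 'sympy-tex' ID are
all hypothetical. It tries preferred formatter IDs first, then preferred
types, in order.

```python
def pick_representation(entries, preferred_ids=(), preferred_types=('text',)):
    """Return the first (id, type, data) entry matching the frontend's preferences.

    entries is the list of 3-tuples received in the response message.
    Specific formatter IDs are tried first, then type strings, each in order.
    Returns None if nothing matches.
    """
    for wanted_id in preferred_ids:
        for entry in entries:
            if entry[0] == wanted_id:
                return entry
    for wanted_type in preferred_types:
        for entry in entries:
            if entry[1] == wanted_type:
                return entry
    return None


entries = [
    ('default', 'text', 'x**2'),
    ('sympy-tex', 'mathtext', 'x^2'),
    ('my-tweaked-html', 'html', '<i>x</i><sup>2</sup>'),
]

# A simple terminal frontend only looks for text.
simple = pick_representation(entries)
# A GUI frontend prefers HTML and falls back to text.
gui = pick_representation(entries, preferred_types=('html', 'text'))
# A frontend that knows a specific formatter can ask for it by ID.
by_id = pick_representation(entries, preferred_ids=('sympy-tex',))
```

Because every entry stays in the message, flipping between representations
for an old Out[] entry is just another call with different preferences.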

I'm certain this is overengineered, but I think we have use cases for all of the 
features in it. I think most of the complexity is optional. The basic in-process 
terminal frontend doesn't even bother with most of this and just uses the 
default formatter to get the text and prints it.

[1] Why a general JSONable object instead of just bytes? It would be nice to be 
able to define a formatter that could give some structured information about the 
object. For example, we could define an ArrayMetadataFormatter that gives a dict 
with shape, dtype, etc. A GUI frontend could display this information nicely 
formatted along with one of the other representations.
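
A rough sketch of such an ArrayMetadataFormatter. The 'array-metadata' ID and
type string are placeholders, and the formatter duck-types on `shape` and
`dtype` attributes instead of importing numpy, so `FakeArray` below is only a
stand-in for a real array.

```python
import json


class ArrayMetadataFormatter:
    """Hypothetical formatter returning structured data instead of a string."""
    id = 'array-metadata'      # placeholder ID
    type = 'array-metadata'    # placeholder type string; a new name, as required

    def __call__(self, obj):
        shape = getattr(obj, 'shape', None)
        dtype = getattr(obj, 'dtype', None)
        if shape is None or dtype is None:
            return None  # assumption: None means "cannot handle this object"
        # Convert to plain JSON types: a list of ints and a string.
        return {'shape': [int(n) for n in shape], 'dtype': str(dtype)}


class FakeArray:
    """Stand-in exposing the two attributes the formatter looks at."""
    shape = (3, 4)
    dtype = 'float64'


formatter = ArrayMetadataFormatter()
metadata = formatter(FakeArray())
encoded = json.dumps(metadata, sort_keys=True)
```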

> If the latter, I'm not sure I like the approach of passing a dict
> through and letting each formatter modify it.  State that mutates
> as-it-goes tends to produce harder to understand code, at least in my
> experience.  Instead, we can call all the formatters in sequence and
> get from each a pair of key, value.  We can then insert the keys into
> a dict as they come on our side (so if the storage structure ever
> changes from a dict to anything else, likely the formatters can stay
> unmodified).  Does that sound reasonable to you?

That's actually how I would have implemented it [my original ipwx code 
notwithstanding ;-)].

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco