Proposed new PEP: print to expand generators

Sat Jun 3 21:36:14 EDT 2006

I would like to champion a proposed enhancement to Python.  I describe the 
basic idea below in order to gauge community interest.  Right now, it's only 
an idea, and I'm sure there's room for improvement.  And of course it's 
possible there's some serious "gotcha" I've overlooked.  Thus I welcome any 
and all comments.

If there's some agreement that this proposal is worth further consideration, 
then I'll re-submit a formal document in official PEP format.

Regards

--jb

PEP -- EXTEND PRINT TO EXPAND GENERATORS

NUTSHELL

I propose that we extend the semantics of "print" such that, if the object 
to be printed is a generator, print would iterate over the resulting 
sequence of sub-objects and recursively print each of the items in order.

E.g.,

	print obj

under the proposal would behave something like

	import types

	if isinstance( obj, types.GeneratorType ):
		for item in obj:
			print item,	# recursive call
		print			# trailing newline
	else:
		print obj		# existing print behavior

I know this isn't precisely how print would work, but I intentionally 
simplified the illustration to emphasize the intended change.  Nevertheless, 
several points above are expressly part of this proposal (subject to 
discussion and possible revision):

	Print behavior does not change EXCEPT in the case
	that the object being printed is a generator.

	Enumerated items are printed with intervening spaces
	[alternatively: "" or "\n"].

	An enumerated sequence ends with a newline
	[alternatively: "" or " "].
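The semantics listed above can be emulated today with an ordinary function.  The sketch below is written for Python 3 (where print is a function, so the sketch writes to a stream directly); the name print_expanded and its sep/end parameters are hypothetical, with sep and end standing in for the bracketed alternatives.  For brevity it expands only one level: a nested generator item is written with str(), as print does now.

```python
import io
import sys
import types

def print_expanded(obj, stream=None, sep=" ", end="\n"):
    # Hypothetical emulation of the proposed print semantics, one level deep.
    if stream is None:
        stream = sys.stdout
    if isinstance(obj, types.GeneratorType):
        # Expand the generator: items joined by `sep`, sequence closed by `end`.
        for i, item in enumerate(obj):
            if i:
                stream.write(sep)
            stream.write(str(item))
        stream.write(end)
    else:
        # Every other object keeps today's behavior: str(obj) plus a newline.
        stream.write(str(obj) + "\n")

buf = io.StringIO()
print_expanded((x * x for x in range(4)), stream=buf)
squares_line = buf.getvalue()   # "0 1 4 9\n"
```

Only generators are affected: a list passed to print_expanded() is written exactly as print writes it now.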

Generators could themselves yield other generators as elements, and the 
proposed change to print would recursively serialize any arbitrary "tree" of 
generators.
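A tree of generators can be flattened into its leaves, in order, with a short recursive generator.  This is a sketch under the same Python 3 assumption (it uses yield from, available since Python 3.3); leaves() and the example generator numbered() are hypothetical names for illustration.

```python
import types

def leaves(obj):
    # Walk an arbitrary tree of nested generators depth-first, yielding the
    # leaf objects in order; a generator that yields generators forms a tree.
    if isinstance(obj, types.GeneratorType):
        for child in obj:
            yield from leaves(child)   # recurse into the subtree
    else:
        yield obj

def numbered(n):
    # Hypothetical example tree: each level yields a label, then a subtree.
    yield "level%d" % n
    if n > 0:
        yield numbered(n - 1)

tree_items = list(leaves(numbered(2)))   # ["level2", "level1", "level0"]
```

Under this reading, printing a tree means writing the leaves in this order, e.g. " ".join(str(x) for x in leaves(tree)) for the space-separated variant.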

__str__() for complex user-defined objects could then return generators, 
and arbitrarily complex structures could be printed out without glomming 
everything into one huge string, only to throw it away in the end.

I expect we would also want to modify str() itself to embody this 
serialization behavior.  This additional change would support those cases 
where one actually does want the single large string in the end, say, to 
store into a UI widget.  Still, the string would be constructed once at the 
end, much more efficiently than by building a bunch of smaller, intermediate 
strings.
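The proposed str() behavior amounts to collecting the leaf strings of the tree and concatenating them exactly once.  The sketch below (Python 3; expanded_str() and bold() are hypothetical names) walks the tree with an explicit stack of iterators instead of recursion, so deep nesting does not consume Python stack frames; that is a design choice of the sketch, not part of the proposal.

```python
import types

def expanded_str(obj):
    # Hypothetical emulation of the proposed str(): walk a tree of nested
    # generators with an explicit stack and join all leaf strings once.
    pieces = []                      # leaf strings, in output order
    stack = [iter([obj])]            # iterators still being consumed
    while stack:
        try:
            item = next(stack[-1])
        except StopIteration:
            stack.pop()              # this (sub)generator is exhausted
            continue
        if isinstance(item, types.GeneratorType):
            stack.append(item)       # descend into the nested generator
        else:
            pieces.append(str(item))
    return "".join(pieces)           # the only concatenation performed

def bold(text):
    # Hypothetical example: markup around a nested generator of characters.
    yield "<b>"
    yield (ch for ch in text)
    yield "</b>"

html = expanded_str(bold("hi"))      # "<b>hi</b>"
```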

Then, in an abstract sense, we would not be changing print at all: the new 
semantics would be embodied in the change to str().  However, in practice, 
we'd also want to modify print, as an important optimization for a more 
common use case.

The present behavior (displaying, e.g., "<generator object at 0x016BA288>") 
would still be available via

	print repr( generator )

Note that this behavior presently results from all three of:

	print generator
	print str( generator )
	print repr( generator )

So, this proposal merely ascribes useful new semantics to the first two of 
three redundant language constructs.
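The redundancy is easy to confirm: a generator object defines no __str__ of its own, so str() falls back to repr().  A quick check in Python 3 follows; countdown() is just an illustrative generator.  Newer interpreters also include the function name in the text, e.g. "<generator object countdown at 0x...>".

```python
def countdown(n):
    # Any generator function will do; this one is only for illustration.
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
same = str(g) == repr(g)                               # True
looks_default = repr(g).startswith("<generator object")  # True
```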

MOTIVATION

With increasingly complex objects, the print representation naturally becomes 
more complex.  In particular, when an object consists of a collection of 
sub-objects, it's natural for its string representation to be defined 
recursively in terms of the sub-components' string representations, with some 
further indication of how they're held together.

This is possible to do with the __str__ overload and the existing print 
semantics.  However, the existing semantics require constructing many 
otherwise unnecessary intermediate strings, which is grossly inefficient. 
Worse, each intermediate string is generally the concatenation of several 
previous intermediaries, so the volume of intermediate results steadily 
increases throughout the conversion.  Finally, the cost of a string operation 
is proportional to the length of the strings involved, and every level of 
nesting copies all of the text beneath it once more, so I expect the overall 
cost to grow significantly faster than the size of the output (i.e., it's 
non-linear).
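A rough model makes the cost argument concrete.  It assumes, as a simplification, that each node's string is built with a single "".join over its children (ignoring the additional copying done by chains of +) and counts the characters copied; copies_by_concat() and nested() are hypothetical helpers.  For a balanced tree of depth 3 with 10 children per node and 10-character leaves, the output is 10,000 characters and the joins copy 30,000 characters, i.e. output size times nesting depth, whereas writing the leaves out directly copies none of them into intermediates.

```python
def copies_by_concat(node):
    # Simplified cost model: a node is a leaf string or a list of child nodes.
    # Building a node's string with "".join copies every child character once.
    # Returns (length of the node's string, total characters copied).
    if isinstance(node, str):
        return len(node), 0                  # leaves already exist; no copy
    total_len, total_copies = 0, 0
    for child in node:
        child_len, child_copies = copies_by_concat(child)
        total_len += child_len
        total_copies += child_copies
    return total_len, total_copies + total_len   # this level's join

def nested(depth, width, leaf):
    # Hypothetical balanced tree: `width` children per node, `depth` levels.
    if depth == 0:
        return leaf
    return [nested(depth - 1, width, leaf) for _ in range(width)]

size, copied = copies_by_concat(nested(3, 10, "x" * 10))   # (10000, 30000)
```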

E.g., instances of the following classes can become arbitrarily expensive to 
print out:

	class HtmlTable( object ):
		# ...
		def __str__( self ):
			return ( "<table"
				+ str( self.attr )
				+ ">\n"
				+ "".join([ str( row ) for row in self.head ])
				+ "".join([ str( row ) for row in self.rows ])
				+ "</table>\n" )

	class HtmlRow( object ):
		# ...
		def __str__( self ):
			return ( "<tr"
				+ str( self.attr )
				+ ">\n"
				+ "".join([ str( cell ) for cell in self.cells ])
				+ "</tr>\n" )

	class HtmlCell( object ):
		# ...
		def __str__( self ):
			return ( "<td"
				+ str( self.attr )
				+ ">\n"
				+ "".join([ str( datum ) for datum in self.data ])
				+ "</td>\n" )

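Clearly, printing a large HtmlTable this way requires a LOT of unnecessary 
string manipulation: each datum is copied into its cell's string, that string 
is copied again into its row's string, and the row's string is copied yet 
again into the table's string.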

Using the proposed extension, the above example could be implemented instead 
as something like:

	class HtmlTable( object ):
		# ...
		def __str__( self ):
			yield "<table"
			yield str( self.attr )
			yield ">\n"
			for row in self.head:
				yield str( row )
			for row in self.rows:
				yield str( row )
			yield "</table>\n"

	class HtmlRow( object ):
		# ...
		def __str__( self ):
			yield "<tr"
			yield str( self.attr )
			yield ">\n"
			for cell in self.cells:
				yield str( cell )
			yield "</tr>\n"

	class HtmlCell( object ):
		# ...
		def __str__( self ):
			yield "<td"
			yield str( self.attr )
			yield ">\n"
			for datum in self.data:
				yield str( datum )
			yield "</td>\n"

With the new extension, the individual bits of data are simply output in the 
proper order, virtually eliminating unnecessary string operations, which I 
expect would yield a large performance improvement.  In fact, in the common 
case where all of the leaf nodes are literal strings, the entire HTML table 
(or page!) could be written out without any string manipulation: the existing 
strings are simply written out from their present locations in memory!
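The generator version can be tried in current Python with one adjustment: str() there insists that __str__ return a string, so this sketch uses a hypothetical method name, chunks(), in place of __str__, and a hypothetical write_tree() function plays the part of the extended print.  Unlike the listing above, each container yields its children's chunks() generators rather than str() of them, so the writer recurses into them and no intermediate strings are built for subtrees.  The constructors and the attr strings are assumptions added so the example runs (Python 3).

```python
import io
import types

def write_tree(obj, stream):
    # Write a tree of nested generators leaf by leaf, in order; subtrees are
    # streamed straight to `stream` instead of being joined into strings first.
    if isinstance(obj, types.GeneratorType):
        for item in obj:
            write_tree(item, stream)
    else:
        stream.write(str(obj))

class HtmlCell(object):
    def __init__(self, data, attr=""):
        self.data, self.attr = data, attr

    def chunks(self):  # hypothetical stand-in for a generator-valued __str__
        yield "<td"
        yield str(self.attr)
        yield ">\n"
        for datum in self.data:
            yield str(datum)
        yield "</td>\n"

class HtmlRow(object):
    def __init__(self, cells, attr=""):
        self.cells, self.attr = cells, attr

    def chunks(self):
        yield "<tr"
        yield str(self.attr)
        yield ">\n"
        for cell in self.cells:
            yield cell.chunks()    # a nested generator, expanded by write_tree
        yield "</tr>\n"

class HtmlTable(object):
    def __init__(self, head, rows, attr=""):
        self.head, self.rows, self.attr = head, rows, attr

    def chunks(self):
        yield "<table"
        yield str(self.attr)
        yield ">\n"
        for row in self.head:
            yield row.chunks()
        for row in self.rows:
            yield row.chunks()
        yield "</table>\n"

table = HtmlTable(head=[], rows=[HtmlRow([HtmlCell(["a"]), HtmlCell(["b"])])],
                  attr=' border="1"')
buf = io.StringIO()
write_tree(table.chunks(), buf)
html = buf.getvalue()
```

Since str() of a str returns the same object, the literal leaf strings are handed to stream.write() as they are.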

Furthermore, there's greater clarity and economy of expression in the 
proposed new method.

The primary motivation behind this proposal is to eliminate unnecessary 
overhead, while retaining all the convenience of the existing semantics of 
string representations of custom objects.

While it's not 100% backwards compatible, it only assigns a new meaning to one 
of several redundant, little-used existing language constructs.

ALTERNATIVES

In lieu of the proposed change, users can define their own auxiliary function 
to generate the output.  E.g.:

	import sys

	class HtmlTable( object ):
		# ...
		def pr( self, stream=sys.stdout ):
			stream.write( "<table" )
			stream.write( str( self.attr ) )
			stream.write( ">\n" )
			for row in self.head:
				row.pr( stream )	# rows must follow the same convention
			for row in self.rows:
				row.pr( stream )
			stream.write( "</table>\n" )

I myself have successfully used this technique in a variety of applications.
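Carrying the pr() convention down to rows and cells shows what maintaining it throughout the class hierarchy involves: every class needs its own pr(), and every container must call its children's pr() rather than print them.  This is a Python 3 sketch; the constructors are assumptions added so it runs, and it defaults stream to None rather than sys.stdout so that a later redirection of sys.stdout is honored.

```python
import io
import sys

class HtmlCell(object):
    def __init__(self, data, attr=""):
        self.data, self.attr = data, attr

    def pr(self, stream=None):
        if stream is None:
            stream = sys.stdout
        stream.write("<td")
        stream.write(str(self.attr))
        stream.write(">\n")
        for datum in self.data:
            stream.write(str(datum))
        stream.write("</td>\n")

class HtmlRow(object):
    def __init__(self, cells, attr=""):
        self.cells, self.attr = cells, attr

    def pr(self, stream=None):
        if stream is None:
            stream = sys.stdout
        stream.write("<tr")
        stream.write(str(self.attr))
        stream.write(">\n")
        for cell in self.cells:
            cell.pr(stream)        # the convention must hold at every level
        stream.write("</tr>\n")

row = HtmlRow([HtmlCell(["a"]), HtmlCell([1, 2])])
buf = io.StringIO()
row.pr(buf)
row_html = buf.getvalue()
```

An object from another author that lacks pr() cannot be placed in such a row: row.pr() would raise AttributeError, which is the interchangeability problem listed among the cons.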

Pro:
	Requires no changes to Python

Con:
	The solution has to be "hand crafted" in each case,
	subject to user errors.

	The solution only works if the user expressly maintains
	the convention throughout the class hierarchy.

	The solution is not interchangeable with objects
	from other authors.

///