Writev

Adam DePrince adam at cognitcorp.com
Mon Dec 20 01:05:50 EST 2004


On Mon, 2004-12-20 at 00:30, Steven Bethard wrote:
> Adam DePrince wrote:
> > Many other programmers have faced a similar issue; cStringIO,
> > ''.join([mydata]), map( file.write, [mydata]) are but some attempts at
> > making this process more efficient by jamming the components to be
> > written into a sequence.
> 
> I'm obviously misunderstanding something because I can't figure out why 
> you would write:
> 
>      map(file.write, [mydata])
> 
> instead of
> 
>      file.write(mydata)

No, you misunderstand.  mydata is a metavariable standing in for the
long list of things that would be inside those brackets.

I meant map( file.write, mydata ), where mydata is some really long
list or iterator.


> 
> Is your task to write a sequence/iterator of items into a file?  I would 
> expect your example to look like:
> 
>      map(file.write, mydata)
> 
> which I would write as:
> 
>      file.writelines(mydata)
> 
> Could you explain a little more what your intent is here?

file.writelines( seq ) and map( file.write, seq ) have the same effect;
the former is essentially shorthand for the latter.
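A quick check, in modern Python 3 with an in-memory io.StringIO (an assumption for illustration, not part of the original thread), that the two spellings produce identical output. One caveat that did not exist in 2004: in Python 3, map() is lazy, so map( file.write, seq ) writes nothing until the map object is consumed.

```python
import io

def via_writelines(seq):
    # writelines() writes every item in order; no separators are added.
    buf = io.StringIO()
    buf.writelines(seq)
    return buf.getvalue()

def via_map(seq):
    buf = io.StringIO()
    # Python 3's map() is lazy: list() forces each write() call to happen.
    list(map(buf.write, seq))
    return buf.getvalue()
```

Either way, at the Python level each item is still handed to the file one at a time; neither form tells the OS about the whole batch.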

Writev is a neat abstraction in POSIX that lets the operating system,
rather than the application, handle gathering the data to be written,
just in case something in the hardware is smart enough to support
scatter-gather I/O.  With a dumb I/O device, either the user application
(write) or the OS (writev) has the chore of gathering the data to be
written, concatenating it and sending it on its way.  A lot of devices
are smart enough to be handed a list of pointers and lengths and told
to "write this" to the device.  In a sense, writev is a pretty close
approximation of the API that many high-end disk and network
controllers provide to the OS.
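Python did not expose writev when this was written; Python 3.3 later added os.writev(fd, buffers), which returns the number of bytes written. A minimal sketch of one gathering call on a POSIX pipe (the buffer contents and the helper name gather_write_demo are made up for illustration):

```python
import os

def gather_write_demo():
    """Hand three separate buffers to the kernel in a single writev call."""
    read_fd, write_fd = os.pipe()
    try:
        pieces = [b"header|", b"body|", b"trailer\n"]
        # One system call carrying a list of (pointer, length) pairs;
        # the application never concatenates the pieces itself.
        written = os.writev(write_fd, pieces)
        os.close(write_fd)
        write_fd = -1
        data = os.read(read_fd, 1024)
    finally:
        os.close(read_fd)
        if write_fd != -1:
            os.close(write_fd)
    return written, data
```

Calling gather_write_demo() should report 20 bytes written, with the reader seeing the three pieces back to back.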

The benefit is that you don't have to choose between these two evils:

1) Copying your strings so they sit contiguously in memory
2) Context switching into your OS a lot (one system call per string).
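To make the two evils concrete, a sketch using plain os.write on a POSIX file descriptor. The helper names (write_all, evil_copy, evil_syscalls) are illustrative only; evil_copy pays for a concatenating copy, evil_syscalls pays at least one system call per string.

```python
import os

def write_all(fd, data):
    # os.write may accept fewer bytes than offered; retry the remainder.
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]

def evil_copy(fd, pieces):
    # Evil 1: copy every piece into one contiguous buffer, then write once.
    write_all(fd, b"".join(pieces))

def evil_syscalls(fd, pieces):
    # Evil 2: no extra copy, but each piece costs at least one system call.
    pieces_written = 0
    for piece in pieces:
        write_all(fd, piece)
        pieces_written += 1
    return pieces_written
```

writev avoids both costs: the pieces stay where they are, and a single call covers many of them.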

Let us consider this straw man; the sequence that would be generated by:

def numbers():
	for x in range( 10000000 ):
		yield str( x )

You really don't want to say:

write( ''.join( numbers() ) )

Nor do you want to say:

map( write, numbers() )


Wouldn't it be nice to peel off the first 1000 items, tell the OS to
write them, peel off the next 1000 items, etc. ... and not even have
to absorb the memcpy cost of putting them all together?  Think of it as
the latter, without quite as much harassment of the underlying operating
system.
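A sketch of the peel-off-1000-at-a-time idea using os.writev (Python 3.3+, POSIX). writev_batched is a hypothetical helper, not a standard function. It clamps the batch to the system's IOV_MAX and retries after short writes. Items must be bytes, so the straw-man generator's str values would need encoding first.

```python
import os
from itertools import islice

def _iov_max():
    # Most buffers one writev call accepts; 16 is the POSIX minimum.
    try:
        limit = os.sysconf("SC_IOV_MAX")
    except (ValueError, OSError):
        return 16
    return limit if limit > 0 else 16

def writev_batched(fd, iterable, batch=1000):
    """Write byte strings to fd, gathering up to `batch` per writev call.

    Returns the number of writev calls made.
    """
    batch = max(1, min(batch, _iov_max()))
    it = iter(iterable)
    calls = 0
    while True:
        # Peel off the next `batch` items; stop when the iterator runs dry.
        chunk = list(islice(it, batch))
        if not chunk:
            return calls
        bufs = [memoryview(b) for b in chunk if len(b)]
        while bufs:
            n = os.writev(fd, bufs)
            calls += 1
            # Discard fully written buffers and trim a partially written one.
            while n and bufs:
                if n >= len(bufs[0]):
                    n -= len(bufs[0])
                    bufs.pop(0)
                else:
                    bufs[0] = bufs[0][n:]
                    n = 0
```

For example, writev_batched(fd, (str(x).encode() for x in range(2500))) would typically make three writev calls (1000, 1000 and 500 buffers) instead of 2500 write calls, and never builds one big joined string.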

Lastly, my intent is to expose the writev system call simply because:

* It is there.
* It is sometimes useful.
* If we get used to sharing our intent with the OS, the OS author might
get used to doing something useful with this knowledge.

Now you are probably thinking, "But isn't this premature optimization?"
Yeah, it is, which is why we are up to version 2.4 without it.  But I
don't think it is premature anymore.

There is one more case where writev would be beneficial ... perhaps you
want to write a never-ending sequence with a minimum of overhead?

def camera():
	while 1:
		yield extract_entropy( grab_frame() )

open( "/tmp/entropy_daemon_pipe", "w+" ).writev( camera(), 5 ) 
# 1/5 of the OS overhead, while still getting a fresh update every 5/30
# of a second (1/6 s), assuming 30 FPS
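The .writev( seq, n ) method above is the proposal under discussion; no such method exists on Python file objects. With that caveat, a sketch of the intended behaviour built on os.writev (Python 3.3+). writev_groups is hypothetical, and because grab_frame and extract_entropy are placeholders, any check of it has to use a fake frame generator.

```python
import os
from itertools import islice

def writev_groups(fd, frames, n=5):
    """Hypothetical stand-in for the proposed file.writev(seq, n).

    Hands the OS n items per system call; runs forever if `frames`
    never ends. Returns the number of writev calls once it does end.
    """
    it = iter(frames)
    calls = 0
    while True:
        group = list(islice(it, n))
        if not group:
            return calls
        total = sum(len(b) for b in group)
        done = os.writev(fd, group)
        calls += 1
        if done < total:
            # Short write (rare for small pipe writes): finish the tail by hand.
            rest = memoryview(b"".join(group))[done:]
            while rest:
                rest = rest[os.write(fd, rest):]
```

With a file object from open(), one would call flush() first and pass its fileno(), along with an unbounded generator such as camera().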

Adam DePrince 