[Tutor] reading an input stream

James Chapman james at uplinkzero.com
Thu Jan 7 10:48:12 EST 2016


Hi Richard

There are a number of considerations you need to take into account here.

Raw sockets are almost never the right solution. While a basic socket-to-socket
connection is easy enough to program, handling failure and concurrency can
very quickly make the solution a lot more complex than it needs to be, so
perhaps you could supply more information? (I realise I'm venturing outside
the realm of learning Python, but I'm a pedant for doing things right.)

You said you need to read XML in from a socket connection, but you haven't
mentioned what's generating the data. Is the data sent over HTTP, in which
case is this part of a SOAP or REST API? Is the data being generated by
something you've written, or by a 3rd-party software package? Is REST an
option? Is there a reason to serialise to XML? (If I were performing the
serialisation I would go with JSON if being human-readable was a
requirement.)

If you're free to choose how the data is received, have you considered using
something like AMQP (RabbitMQ), which would eliminate your need to support
concurrency? It would also handle failure well.
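
To give a rough idea, here is a minimal consumer sketch. It assumes the
pika client library (version 1.x) and a RabbitMQ broker running locally,
and 'xml_messages' is just a hypothetical queue name:

#############################################
import pika

# Connect to a RabbitMQ broker assumed to be running on localhost.
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Hypothetical queue name; declaring it is harmless if it already exists.
channel.queue_declare(queue='xml_messages')

def on_message(channel, method, properties, body):
    # Each delivery arrives as one complete message, so there is no need
    # to hunt for message boundaries in a byte stream yourself.
    print(body)

channel.basic_consume(queue='xml_messages',
                      on_message_callback=on_message,
                      auto_ack=True)
channel.start_consuming()
#############################################

The broker takes care of buffering and concurrent producers, which is the
point: your code only ever sees whole messages.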

James


--
James

On 29 December 2015 at 20:14, richard kappler <richkappler at gmail.com> wrote:

> Sorry it took so long to respond, just getting back from the holidays. You
> all have given me much to think about. I've read all the messages through
> once; now I need to go through them again and try to apply the ideas. I'll
> be posting other questions as I run into problems. BTW, Danny, best
> explanation of generators I've heard, well done and thank you.
>
> regards, Richard
>
> On Thu, Dec 24, 2015 at 4:54 PM, Danny Yoo <dyoo at hashcollision.org> wrote:
>
> > > I think what I need to do would be analogous to (pardon if I'm using
> > > the wrong terminology, at this point in the discussion I am officially
> > > out of my depth) sending the input stream to a buffer(s) until the ETX
> > > for that message comes in, shoot the buffer contents to the parser while
> > > accepting the next STX + message fragment into the buffer, or something
> > > analogous.
> >
> > Yes, I agree.  It sounds like you have one process read the socket and
> > collect chunks of bytes delimited by the STX markers.  It can then
> > send those chunks to the XML parser.
> >
> >
> > We can imagine one process that reads the socket and spits out a list
> > of byte chunks:
> >
> >     chunks = readDelimitedChunks(socket)
> >
> > and another process that parses those chunks and does something with
> > them:
> >
> >     for chunk in chunks:
> >         ....
> >
> >
> > It would be nice if we could organize the program like this.  But one
> > problem is that chunks might not be finite!  The socket might keep on
> > returning bytes.  If it keeps returning bytes, we can't possibly
> > return a finite list of the chunked bytes.
> >
> >
> > What we really want is something like:
> >
> >     chunkStream = readDelimitedChunks(socket)
> >     for chunk in chunkStream:
> >         ....
> >
> > where chunkStream is itself like a socket: it should be something that
> > we can repeatedly read from as if it were potentially infinite.
> >
> >
> > We can actually do this, and it isn't too bad.  There's a mechanism in
> > Python called a generator that allows us to write function-like things
> > that consume streams of input and produce streams of output.  Here's a
> > brief introduction to them.
> >
> > For example, here's a generator that knows how to produce an infinite
> > stream of numbers:
> >
> > ##############
> > def nums():
> >     n = 0
> >     while True:
> >         yield n
> >         n += 1
> > ##############
> >
> > What distinguishes a generator from a regular function?  The use of
> > "yield".  A "yield" is like a return, but rather than completely
> > escape out of the function with the return value, this generator will
> > remember what it was doing at that time.  Why?  Because it can
> > *resume* itself when we try to get another value out of the generator.
> >
> > Let's try it out:
> >
> > #####################
> >
> > >>> numStream = nums()
> > >>> next(numStream)
> > 0
> > >>> next(numStream)
> > 1
> > >>> next(numStream)
> > 2
> > >>> next(numStream)
> > 3
> > >>> next(numStream)
> > 4
> > #####################
> >
> > Every next() we call on a generator will restart it from where it left
> > off, until it reaches its next "yield".  That's how we get this
> > generator to return an infinite sequence of things.
> >
> >
> > That's how we produce infinite sequences.  And we can write another
> > generator that knows how to take a stream of numbers, and square each
> > one.
> >
> > ########################
> > def squaring(stream):
> >     for n in stream:
> >         yield n * n
> > ########################
> >
> >
> > Let's try it.
> >
> >
> > ########################
> >
> > >>> numStream = nums()
> > >>> squaredNums = squaring(numStream)
> > >>> next(squaredNums)
> > 0
> > >>> next(squaredNums)
> > 1
> > >>> next(squaredNums)
> > 4
> > >>> next(squaredNums)
> > 9
> > >>> next(squaredNums)
> > 16
> > ########################
> >
> >
> > If you have experience with other programming languages, you may have
> > heard of the term "co-routine".  What we're doing with this should be
> > reminiscent of coroutine-style programming.  We have one generator
> > feeding input into the other, with program control bouncing back and
> > forth between the generators as necessary.
> >
> >
> > So that's a basic idea of generators.  It lets us write processes that
> > can deal with and produce streams of data.  In the context of sockets,
> > this is particularly helpful, because sockets can be considered a
> > stream of bytes.
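> >
> > To tie that back to your problem: the readDelimitedChunks idea from
> > earlier can be written as a generator in exactly this style.  Here's a
> > rough sketch (it assumes each message is framed by the ASCII STX (0x02)
> > and ETX (0x03) bytes, and that recv() returns an empty string once the
> > connection closes):
> >
> > ###########################################
> > STX = b'\x02'
> > ETX = b'\x03'
> >
> > def readDelimitedChunks(sock):
> >     """Yield the bytes between each STX/ETX pair read from sock."""
> >     buffer = b''
> >     while True:
> >         data = sock.recv(4096)
> >         if not data:           # connection closed
> >             break
> >         buffer += data
> >         # Emit every complete STX ... ETX message we have so far.
> >         while True:
> >             start = buffer.find(STX)
> >             end = buffer.find(ETX, start + 1)
> >             if start == -1 or end == -1:
> >                 break
> >             yield buffer[start + 1:end]
> >             buffer = buffer[end + 1:]
> > ###########################################
> >
> > Each chunk it yields is one complete message, which you could hand
> > straight to an XML parser such as xml.etree.ElementTree.fromstring().
> > The toy example below walks through the same pattern on a simpler
> > problem.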
> >
> >
> > Here's another toy example that's closer to the problem you're trying
> > to solve.  Let's say that we're working on a program to alphabetize
> > the words of a sentence.  Very useless, of course.  :P  We might pass
> > it in the input:
> >
> >     this
> >     is
> >     a
> >     test
> >     of
> >     the
> >     emergency
> >     broadcast
> >     system
> >
> > and expect to get back the following sentence:
> >
> >      hist
> >      is
> >      a
> >      estt
> >      fo
> >      eht
> >      ceeegmnry
> >      aabcdorst
> >      emssty
> >
> > We can imagine one process doing chunking, going from a sequence of
> > characters to a sequence of words:
> >
> > ###########################################
> > def extract_words(seq):
> >     """Yield the words in a sequence of characters."""
> >     buffer = []
> >     for ch in seq:
> >         if ch.isalpha():
> >             buffer.append(ch)
> >         elif buffer:
> >             yield ''.join(buffer)
> >             del buffer[:]
> >     # If we hit the end of the sequence, we still might
> >     # need to yield one more word.
> >     if buffer:
> >         yield ''.join(buffer)
> > ###########################################
> >
> >
> > and a function that transforms words to their munged counterpart:
> >
> > #########################
> > def transform(word):
> >     """"Munges a word into its alphabetized form."""
> >     chars = list(word)
> >     chars.sort()
> >     return ''.join(chars)
> > #########################
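> >
> > Quick check at the prompt:
> >
> > #########################
> >
> > >>> transform('emergency')
> > 'ceeegmnry'
> > #########################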
> >
> > This forms the major components of a program that can do the munging
> > on a file... or a socket!
> >
> >
> > Here's the complete example:
> >
> >
> > #############################################
> > import sys
> >
> > def extract_words(seq):
> >     """Yield the words in a sequence of characters."""
> >     buffer = []
> >     for ch in seq:
> >         if ch.isalpha():
> >             buffer.append(ch)
> >         elif buffer:
> >             yield ''.join(buffer)
> >             del buffer[:]
> >     # If we hit the end of the sequence, we still might
> >     # need to yield one more word.
> >     if buffer:
> >         yield ''.join(buffer)
> >
> > def transform(word):
> >     """"Munges a word into its alphabetized form."""
> >    chars = list(word)
> >     chars.sort()
> >     return ''.join(chars)
> >
> >
> > def as_byte_seq(f):
> >     """Return the bytes of the file-like object f as a
> >     sequence."""
> >     while True:
> >         ch = f.read(1)
> >         if not ch: break
> >         yield ch
> >
> >
> > if __name__ == '__main__':
> >     for word in extract_words(as_byte_seq(sys.stdin)):
> >         print(transform(word))
> > ############################################
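> >
> > To run the same pipeline over a socket instead of stdin, one option
> > (just a sketch; the host and port here are made up) is to wrap the
> > socket in a file-like object with makefile(), since as_byte_seq only
> > needs something with a read() method:
> >
> > ############################################
> > import socket
> >
> > sock = socket.create_connection(('localhost', 9000))  # made-up address
> > f = sock.makefile('r')
> > for word in extract_words(as_byte_seq(f)):
> >     print(transform(word))
> > ############################################
> >
> > For your XML case you'd swap extract_words for something like the
> > readDelimitedChunks sketch above.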
> >
> >
> >
> > If you have questions, please feel free to ask.  Good luck!
> >
>
>
>
> --
>
> All internal models of the world are approximate. ~ Sebastian Thrun
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
>

