(Maybe off topic) Can someone explain what a finite state machine is?

rusi rustompmody at gmail.com
Wed Jul 20 13:27:37 EDT 2011


On Jul 19, 6:32 pm, Matty Sarro <msa... at gmail.com> wrote:
> Hey everyone. I am currently reading through an RFC, and it mentions
> that a client and server half of a transaction are embodied by finite
> state machines. I am reading through the wikipedia article for finite
> state machines, and sadly it's going a bit above my head. I don't
> really have a background in engineering, and would really love to
> understand what is being said. Does anyone have a simple way to
> explain it?

Yes... "background in engineering" is what a layperson would assume
from the words "finite state machine" but this is not really so
(unless you think math is the same as engineering) because the words
were invented by mathematicians (no normal engineer would call a
machine an 'automaton' -- the original name!)

So assuming and staying within python a natural example would be what
is called lexical analysis or tokenizing.

Say you have a line of python like:

def foo(count,   x  ="if nothing is given") # catch a comment anyone?

the python interpreter in its lexical analysis phase breaks this up as
|def|foo|(|count|,|x||=|"if nothing is given"|)|<Comment DISCARDED>

Some of the things it does are:
1. Separating between all these tokens or lexemes
2. Distinguishing identifiers like foo, count, x (there's no
difference at this stage) from keywords like def
3. Discarding spaces but after using them to separate 'def' from 'foo'
4. Identifying special symbols like ( , = etc
5. Throwing away the comment

Now how does python know that the catch in the comment (or the if in
the string) are not the python keywords?

Well this is where the notion of state comes in:
It starts in a default or normal state but when it sees the " it goes
to (or is it gotos?) a state which we may call InsideStringState. In
this state everything is swallowed up indiscriminately until the next
"  Likewise an InCommentState would swallow up everything from # to
newline.

So to tokenize python it would be natural to think with 3 states:
Normal, InString and InComment

Of course in real programming languages there would be many more
states than 3.
eg What happens when the machine is in InString state and you
encounter a backslash?

Or more hairy example: Say you have a language like C whose comments
are /*... */
Now when you see a / is it a divide symbol or a comment?
All such problems get solved by introducing suitable states like
MaybeComment


In short, dont focus your attention on 'machine' but on 'state' and
you should be fine



More information about the Python-list mailing list