2. The Art of the State

2.1 Introduction to State Transition Diagrams

An important tool in program design and implementation is the state transition diagram.  We use state transitions to help us manage complicated logic and to build involved conditional statements.  Recognizing whether a string of characters could be interpreted as an integer is an example of a problem that can be represented by a state transition diagram.  A state transition diagram is made of nodes called states and labeled arcs (arrows) called transitions.  We move from one state to another as directed by values read from some input sequence.  We choose the arc whose label matches the current input value.

Example Problem: We wish to create a boolean function that returns true if its string argument is a valid integer.  We get to define what we mean by valid integer.

A string is a valid integer if,

1.  it is comprised of only the digits 0..9 with zero or more leading blanks
2.  it has at most one leading + or - sign followed by at least one digit 0..9
3.  it has no internal symbols other  than 0..9
We will look at a few examples.
 
 
12345 +12345 12345- + 1 2 3
-12345 123+45 123.45    123 12345.

The green strings are valid integers (by our arbitrary rules) while the red strings violate one or more of our rules. We will scan the string input one character (or symbol) at a time as we test the string.  Using the state transition diagram shown below, we begin in state S0. Starting with the leftmost symbol we look for a blank (b), a + or - sign (s),  or a digit (d).  We stay in state S0 if we encounter a blank, we transition to state S1 if we encounter a + or a -, and we transition to state S2 if we detect a digit 0 through 9.  If any other symbol is detected then we reject the string immediately (i.e. go to state S3).


b - blank symbol

s - a plus or a minus sign

d - a digit 0,1,2,...,9

else - any other symbol

all - every symbol

S0 - the start state

S2 - the accept state

Definitions for Integer Recognizer

State Transition Diagram - Integer Recognizer

The transition labeled else represents any symbol not explicitly referenced by another transition.  The transtion labeled all makes S3 a trap state.  The state S2 (with bold border) is the accept state.  Check the final state, when you have finished scanning the string.  If you are in S2 then you accept the string as a valid integer otherwise you reject it as a non-integer string.

Let's test the state transition diagram on some of the sample strings.  Our diagram should accept green strings since they are all valid integers.  After scanning a red string our state transition diagram shold be in a non-accept state.

Let's test the string "12345".  We start in state S0 so reading the 1, (a digit) causes a transition to state S2.  For each of the remaining symbols 2, 3, 4 and 5 (digits) the transitions keep us in S2.  Since we are in the accept state after all symbols have been scanned we accept the string "12345" as an integer.

Now let's try the strng "123+45".  Again we start in state S0.  Reading the 1, causes a transition to state S2.  For the symbols 2 and 3 we stay in S2. The next symbol is  + (a sign) which has no labeled transition from state S2 so we transition to state S3 (else).  For the 4 and 5 we stay in S3. S3 is not an accept state so we reject the string "123+45" as a non-integer.



Homework 2_1:  Assume that we wish to accept integer strings with trailing blanks.  Modify the state transition diagram above to accept  integer strings that have one or more trailing blanks such as "12345    ". Your diagram will have more than one accept state.  All other rules for accepting and rejecting strings should still apply.  Hint:  Make sure that your state transition diagram rejects any string that has internal blanks such as "123 45".

To create a boolean function for recognizing integer strings we need  to encode the operations of the state transition diagram above into source code.  Now we convert the state transition diagram into an equivalent state transition table. This table will show the new state given our current state and input character. The table will have a row for each state and a column for each kind of symbol.
 


State Transition Table

The columns labels are for the various symbol types of interest,  blanks = b , + and - signs = s , digits 0..9 =  d and any other symbol = o. We will also need to maintain our current state, set the current state to the start state and know which states are accept states.

We will now define a data structure to hold the contents of the state transition table.

type table_type is array(0..3,0..3) of integer;

table : table_type :=((0,1,2,3),
                      (3,3,2,3),
                      (3,3,2,3),
                      (3,3,3,3));

We will also need a way to convert each symbol read from the input string into a column index in the state transition table.  We can use a case statement to do this.
case symbol is
  when ' '      => sym_num:=0;
  when '+'|'-'  => sym_num:=1;
  when '0'..'9' => sym_num:=2;
  when others   => sym_num:=3;
end case;
If we use a case statement such as this to find the column index that corresponds to each  input character (symbol) we can obtain the new state directly from our table.
new_state:=table(cur_state,sym_num);
Where the first index (cur_state) is the row number of the table and the second index (sym_num) is the column number. (Don't forget that our table indices start with 0 rather than 1.  For example if we were in state 1 (second row of table) and we were currently reading a digit '5', the value of sym_num would be set to 2 (third column of table) and the new_state would be set to 2.  In practice we don't have to declare two different identifiers to keep up with the new state and the current state. We can rewrite the assignment above as,
state:=table(state,sym_num);
and our state will be updated each time we execute this assignment. review the test program below to see how the state transition table can be used in the boolean function is_integer( ).
with ada.text_io, ada.integer_text_io;
use  ada.text_io, ada.integer_text_io;
procedure state is
   subtype wordtype is string(1..20);
   str : wordtype;
   num : integer;

   function is_integer(str: wordtype; n : integer) return boolean is
      type table_type is array(0..3,0..3) of integer;

      table : table_type :=((0,1,2,3),
                            (3,3,2,3),
                            (3,3,2,3),
                            (3,3,3,3));
      state   : integer:=0;
      sym_num : integer;
      sym     : character;

   begin
      for i in 1..n loop
         sym:=str(i);
       case sym is
          when ' '      => sym_num:=0;
          when '+'|'-'  => sym_num:=1;
          when '0'..'9' => sym_num:=2;
          whenothers   => sym_num:=3;
       end case;
         state:=table(state,sym_num);
      end loop;
      return (state=2);
   end is_integer;

begin

   loop
      put("Enter string for integer test (or press enter to quit)... ");
      get_line(str,num);
      exit when num=0;

      new_line;
      if is_integer(str,num) then
         put(" this is an integer");
      else
         put(" this is not an integer");
      end if;
      new_line; new_line;
   end loop;

end state;

The main program asks the user to enter a string to be tested.  It calls the function is_integer and displays the appropriate message indicating whether the input string is an integer according the our definition.  Notice that some strings that could be considered integers are rejected by our function.  If we wish to permit trailing blanks or a blank between a sign and the first digit, we will have to alter the state transition table.

2.2 Finite State Automata

Formal language theory in computer science defines a finite state automaton (FSA) as an abstraction of a machine that can solve a particular class of problems.  Language recognition is an example of the class of problems solvable with FSAs.  In this context, a language is just a set of strings of symbols from an alphabet of valid symbols.

If we let our alphabet be {0,1} then we could define a language as (1) the set of strings of 0s and 1s that contained no more that two consecutive 1s.  Another language could be (2) the set of strings that contained an even number of 0s.  Examples of FSAs to recognize each of these languages are shown below:


Recognizer for binary strings with no more than two consecutive 1's

The start state is labeled and is usually defined as state 0. We read the string being evaluated one symbol at a time and move from state to state as indicated by the transitions (arrows).  For example, starting in state-0, if we read a 0 symbol we stay in state 0.  If we read a 1 we transition to state 1.  The value of the symbol currently being read tells us which transition to use.  If the final state is an accept state (double circle) then we accept or recognize the string as a member of the language.
 
 


Recognizer for binary strings with an even number of 0's

The language of binary strings with an even number of 0's can be recognized with the two state FSA shown above.  Even though our candidate strings can be arbitrarily long we only need a finite number of states to determine whether the string contains an even or odd number of 0's.

Some languages cannot be recognized using FSAs.  For example consider the language of binary strings with at least as many 0's as 1's.  When we attempt to build an FSA that recognizes this language we discover that we need to keep a count of the number of 0's, N0 and the number of 1's, N1 to ensure that N0>=N1.  We could build an FSA that maintained a running relative count such as,

As we read the symbols of a string we would move to the left for 0's and to the right for 1's.  We recognize the string as an element of the language if we are in an accept state after scanning the entire string.  Since the sample strings can be arbitrarily long we would have to have an infinite number of states to ensure an accurate count of 1's and 0's.  But a finite state machine must have a finite number of states.

Homework 2.2:  Assume that the set of valid symbols is {0,1} as you build FSAs to recognize each of the languages described below:

a. The set of strings with at least three 1's (not necessarily consecutive)

b. The set of strings with an odd number of 0's and an even number of 1's.

c. The set of strings that contain the sequence 1101.

d.  The set of strings which, when interpreted as a binary value, are divisible by 4.

You might want to try a few more of these problems as practice.

This course material has been converted to electronic format for use at
KCVU, Murray State University, and
Dept. of Computer Science and Information Systems

Use of this material for educational purposes only is governed by
the definition of Fair Use (Section 107) of the U.S. Copyright Act.
Course material is the property of  R. A. Pilgrim
All Rights Reserved