This chapter discusses the internal representation of data. Collections of bits are arranged in standard formats to represent characters and numbers. Access to memory in machine determined units of fixed length called words.
CHAPTER OUTLINE
2.0 Introduction
2.1 Bits and Character Codes
2.1.1 Bytes and Characters
2.1.2 BCD
2.1.3 EBCDIC
2.1.4 ASCII
2.1.5 Other Codes
2.2 Number Systems
2.2.1 Positional Notation and Base 10 - Decimal
2.2.2 Base 2 - Binary
2.2.2.1 Binary Integer Numbers
2.2.2.2 Numeric Data - Packed and Unpacked Decimal
2.2.2.3 Binary Real Numbers
2.2.3 Base 2 Derivatives - Octal and Hexadecimal
2.2.4 Integer Conversions
2.2.4.1 Conversion from Binary to Decimal
2.2.4.2 Conversion from Decimal to Binary
2.2.4.3 Conversion from Binary to Hexadecimal
2.2.4.4 Conversion from Hexadecimal to Binary
2.2.5 Binary Addition
2.2.6 Negative Binary Integers
2.2.6.1 Sign - Magnitude Convention
2.2.6.2 Bias or Excess Convention
2.2.6.3 Complements
2.2.7 Binary Subtraction
2.2.8 Binary multiplication
2.2.9 Floating Point Real Numbers
2.2.9.1 IBM Mainframe Floating Point Convention
2.2.9.2 IEEE Floating Point Standard
2.2.10 Range and Accuracy
2.2.10.1 The Range and Accuracy of Integers
2.2.10.2 The Range and Accuracy of Reals
2.3 Words
2.4 Parity
Objectives
When you complete this chapter you will be able to:
4. Describe the internal binary representation of real numbers by several different conventions.
5. Discuss bus organization.
6. Discuss byte and word concepts.
7. Discuss numeric accurancy in representation and cumputation.
As we move up the evolutionary chain, we find the more advanced the
species, the more complex their means of communicating and storing data.
A beaver signals danger by slapping its tail on the water. Many scientists
think porpoises have complex language structures. Our species has developed
the most complex systems of storing data. The hieroglyphics of the ancient
Egyptians were a mystery until the key was discovered in the Rosetta stone.
The Chinese and Japanese have spoken languages that are as different as
French and Spanish, but use essentially the same set of symbols in their
written language. The Semitic peoples of the Middle East developed a phonetic
alphabet that provided much more versatility than the picture based written
languages. There are also different ways that have been devised to represent
numeric information such as Roman numerals or the familiar decimal number
system.We are used to the Indo-European phonetic alphabet. It is interesting
that all languages represent essentially the same ideas and data.
Because of the electronic digital computer, we are now at the onset
of what has been called the information age. This necessitates some way
of electronically representing the wide range of different ways of storing
data. Unique strings of binary 1's and 0's can represent virtually all
possible symbols. The question is often asked "Why use binary digits
instead of the more natural decimal digits, 0, 1, 2, 3, 4, 5, 6, 7, 8,
9?" Although some early attempts were made to build decimal based computers,
such as the IBM 650, it was found, that because of the simplicity of representation
the binary system was vastly superior. It is relatively easy to distinguish
between a positive or a negative voltage, or between a magnetic pole lined
up north-south or south-north, or the presence of a hole or no hole in
a punched card. It is also easy to change the state very fast if there
are only two possibilities. It is much more difficult, if not technically
impossible, to achieve the same speed and accuracy with other number systems.
However, there may be something much more fundamental and potentially exciting
about using a binary system to represent information. In our quest for
extraterrestrial life we are transmitting binary patterns representing
numbers. The Voyager space probe also has binary messages to potential
intelligent life forms that receive its messages. Perhaps if we ever make
contact with other life forms, our first means of communication will be
through some sort of binary codes.
In today's complex world of computers and data communications it is
helpful to have an understanding of the fundamental building blocks of
the information explosion. In Chapter 1 we briefly discussed how strings
of bits could represent addresses or locations in memory as well as instructions
to be executed by the computer. The picture, however, is not yet complete.
A similar coding scheme is used to represent both numeric and character
data. In this chapter, we will inspect several of the standard conventions
that have been developed to represent character data. We will also
discuss the representation of numeric data and how it can be manipulated.
The final topic will deal with the issue of insuring data integrity.
The early character code used six bits or binary digits for each
symbol. This was sufficient for the digits (0, 1, 2, 3, 4, 5 ,6 ,7 ,8 9,)
capital letters (A, B, C,..., X, Y, Z) and a few special symbols. As computing
and data processing became more complex it was necessary to increase the
number of bits to seven or eight to represent both upper and lower case
letters and more special symbols. There are similar concerns with numeric
data. The more bits that are used, the more accurate and wider range of
numbers that can be represented. This is one of the important differences
between today's personal computers and the very large supercomputers. The
larger systems can more easily represent and manipulate much larger numbers
much more accurately. In this chapter we will first look at the character
codes that have been used to represent data. Next we will investigate how
numeric data is stored and manipulated in today's modern digital computers.
The chapter will conclude with a brief discussion on how data integrity
can be improved in the binary system by including some extra information.
2.1 Bits and Character codes
2.1.1 Bytes and Characters
Various schemes have been developed to represent data in the computer
and to conveniently transmit it between systems. There is a need to represent
both numeric and alphabetic information. There is also a need to represent
special characters (e.g., ?/<>{}{} etc.) and control information (e.g.,
backspace (BS), delete (DEL), new line (NL), etc.). The three codes we
examine below have the capability to represent all types of character data.
We commonly refer to this data as alphanumeric data. In most computing
systems, the hardware has been designed to conveniently store and manipulate
one character of data. This unit of storage is usually referred to as one
byte.
2.1.2 BCD Characters
The first coding scheme that became widely accepted was called BCD
or Binary Coded Decimal. It was the dominant representation used
on second generation computing systems. Six bits are used to represent
the different characters. As shown in Figure 2.1, the four right most bits
(low order bits) are called the numeric bits, while the left two bits (high
order bits) are called the zone bits. The zone bits are designated A and
B. In Section 2.1.3.1 we shall see how the numeric bits are given their
associated decimal value. The check bit, sometimes called parity, is explained
in Section 2.4. When the zone bits are 00, the numeric bits are interpreted
as the numeric digits 0 through 9. In Section 2.1.3.1, we will see that
the code is a direct use of the binary number system. When the zone bits
are any of the other three configurations, 01, 10, 11, the interpretation
is then a letter or special symbol.
Figure 2.2 shows the entire set of characters and their representation.
With the six bit restriction, only 64 different characters are possible.
As computing systems became larger and more powerful, there was a need
for more characters including the lower case letters and special symbols.
This led to the development of the next two codes, EBCDIC and ASCII.
2.1.3 EBCDIC Characters
Extended Binary Coded Decimal Interchange Code or EBCDIC
(pronounced Eb-see-dic) uses eight bits to represent each character, thus
expanding the possible character configurations to 256. The extra characters
are used for lower case letters, non-printing characters that are used
to send control information, and other special printing characters. There
are still unused codes for future growth. It also uses four numeric bits
but has added two zone bits. In an attempt to maintain compatibility with
the older BCD code, the meaning of the numeric bits was maintained for
the digits and alphabetic letters.
The EBCDIC character codes are shown in Figure 2.3. Several properties
are interesting from a data processing perspective. Notice that the only
difference between the upper and lower case letters is the second zone
bit, so a translation from upper to lower case or vice versa can be accomplished
by simply flipping one bit. Notice also that the zone bits can be used
to test for special characters, letters or digits. This property is important
when testing for the validity of input data. It would be relatively easy
to make sure that a user entered a percent as 0.05 or 5% depending on what
the computer was expecting.
One additional property is also important. If the eight bits used to
represent the letters are interpreted as binary numbers, then the letters
follow a sequence with "A" having the lowest value and "Z" having the largest
value. This property is critical when sorting data alphabetically. The
computer can interpret the bit patterns as binary integers, perform the
sort, and then reinterpret the bit patterns as characters. The order used
in arranging characters is called the collating sequence and may
differ for different character codes.
2.1.4 ASCII
ASCII or the American Standard Code for Information Interchange
(pronounced Ass-kee) was developed through the cooperation of the computing
and communications industry in an attempt to simplify and standardize machine-to-machine
and system-to-system communication. The set consists of letters, digits,
special characters, control characters and graphic characters. It is shown
in Figure 2.4. ASCII has many of the properties of the EBCDIC code discussed
above but there are several important differences. ASCII uses only seven
bits. As we will see in Section 5, when we add one bit to help ensure data
integrity, all the ASCII characters conveniently fit into one byte. This
fact has significant benefits for communication. Another important difference
is the sequence of characters. Notice that there are no gaps in the numeric
values of the codes. This permits easy checking for data validity. ASCII
has become the defacto standard on micro and mini computing systems and
the standard for communications. EBCDIC is still found extensively on the
large IBM and IBM compatable computing systems. The majority of both large
and small systems provide translation utility programs to convert from
one system to the other.
2.1.5 Other Codes
The three codes we have discussed so far all have a fixed number of
bits. It is also possible to have variable length codes. The advantage
of a variable length code permits a significant saving in space. This is
obvious if we were to assign
A 0
B 1
C 00
D 01
E 10
F 11
G 000
H 001
...
Since there is a variable length, it is then necessary to have some
technique to indicate when one character stops and another starts. An additional
saving in space may be realized by assigning the shortest codes to the
most frequently occurring characters. One common method is called Hoffman
Coding. It should be obvious that there are trade offs in using variable
length codes in place of the fixed length methods. The saving in space
comes at the cost of increased processing time to decode the variable length
characters.
2.2 Number Systems
In Chapter 1 we mentioned how strings of bits could be interpreted as
addresses and instructions. In the previous section we saw how strings
of bits could be interpreted as characters. In this section we will continue
the discussion and show how numeric information may be stored. As we develop
the material in this section, it will become clear that the character representation
of digits is fundamentally different from a numeric value. A given string
of bits can take on different meaning depending on the context of its use.
It was Ada, Countess of Lovelace who first suggested using the binary
number system to represent data inside the computer. John von Neumann is
credited with reintroducing the idea almost a hundred years later in the
early computing machines. In order to understand the number schemes or
bases important in computing, we first will review some fundamental concepts
using the more familiar base 10 or decimal system.
2.2.1 Positional Notation and Base 10 - Decimal
Certainly the number system we are all most familiar with is decimal
or base 10. It is called base 10 because each position in a number has
a value ten times as much as the position to its right and there are ten
different digits 0,1,2,3,4,5,6,7,8,9. For example, in the number 354, the
4 represents 4 "ones", the 5 represents 5 "tens" and the 3 represents 3
"hundreds". We can see that each digit has two meanings; one is its own
numeric value, and the second its position. As we count, when we reach
9, we know the next number starts with 1 in the next column. We say we
have carried into the next position. If we are dealing with real numbers
with a decimal point the same principle applies, except now as you move
to the right of the decimal point you divide by ten for each successive
position.
To reinforce these ideas let's look at an example: The number 354.286
can be represented as
3x100 + 5x10 + 4 + 2/10 + 8/100 +6/1000
Using exponential notation this can also be represented as
3x102 + 5x101 + 4x100 + 2x10-1
+ 8x10-2 +6x10-3.
As we shall see, using this last form will help us understand other
number bases.
2.2.2 Base 2 - Binary
Modern computing systems, as we have discussed, do not use the decimal
but rather the binary number system. The only digits in this system are
0 and 1. These two digits are often called bits for Binary
digITS. Using the concepts discussed for base ten, we can also count
in binary: zero - 0, one - 1; but now when we reach two, since we only
have two digits, we carry over to the next position and have 10. This not
the number ten in decimal, but the number two in binary. Very often, to
avoid ambiguity, a subscript is used, 102. Figure 2.5 shows
how to count in several different number systems.
2.2.2.1 Binary Integer Numbers
To better understand these concepts, consider the following number in
base two:
1011012
Just as we did in base ten, as we move one position to the left each
digit has a value increased by one more multiplication of the base:
1x25 + 0x24 + 1x23 + 1x22
+ 0x21 1x20
This can also be written as
1x32 + 0x16 + 1x8 + 1x4 + 0x2 + 1
We thus see that
10110122 = 451010.
An important property of the binary system is its simplicity: because
we only have the digits 0 and 1, a position either contributes a power
of two or nothing since it is multiplied by 1 or zero. This property simplifies
the arithmetic electronic circuits and thus helps make them more reliable.
2.2.2.2 Numeric Data - Packed and Unpacked Decimal
In many applications the accuracy available using fixed length stings
of bits for numeric data is not sufficient. (The range and accuracy of
binary numbers is discussed in Section 2.2.10.1.) This often happens when
we are considering government or large corporate budgets. An alternate
scheme may be used to store numbers to an arbitrary degree of accuracy.
The formats used are called packed and unpacked decimal.
Frequently, when numeric data is read into the computer in unpacked decimal form. This a simple variant on EBCDIC. From Figure 2.3 we observe that the EBCDIC representation of the digits 0 through 9 consists of the bits 1111 followed by the binary value for that digit. For example, "1" is 11110001 and "9" is 11111001. The left four bits are called the zone portion and the right four bits are the numeric portion. This is shown in Figure 2.6. All the digits in the number are stored in this manner except the right most digit. For the right most digit, the zone portion is used to represent the algebraic sign of the number. A positive sign is 1100 (C16) and a negative sign is 1101 (D16). Thus, the number
+134 becomes 11110001 11110101 11000100
1 3 + 4
and
-97 becomes 11111001 11010111.
9 - 7
From the above discussion, it should be obvious that the information
in the zone portion is unnecessary except for the right most digit.
This leads to the packed decimal format where all the zone bits are removed except for the right most digit. In this case the zone or sign is placed in the right most position. If there are not a complete number bytes, then a leading zero is added (see Figure 2.8). To clarify this discussion, consider the packed decimal form of the numbers used in the previous example.
+134 becomes 00010101 01001100
1 3 4 +
and
-97 becomes 00001001 01111101
0 9 7 -
Many computer systems will have hardware designed to operate on numbers
in packed decimal directly. The loss in speed is balanced against increased
accuracy.
2.2.2.3 Binary Real Numbers
A real number consists of both an integer and fractional part. To represent
real numbers in binary, we introduce a concept similar to the decimal point,
but now we call it a binary point. The number
101.0122
is interpreted as
1x22 + 0x21 + 1x20 + 0x2-1
+ 1x2-2
or
4 + 0 + 1 + 0 + 1/4 = 5.2510
Computing systems may use different conventions to store negative numbers
or real numbers but the concepts are all similar.
There are some problems with this simple approach, you may have noticed
that there is no way to represent all possible decimal (fractional) numbers
using this system. How would you represent the value 0.66; there is no
combination of negative powers of two that could be used to represent this
value. The best we can do is to form an approximation of this number. We
can continue to add ever smaller powers of two until we get close enough.
We will have to decide upon some tolerance factor and when we get within
that value we will stop.
In general terms, we might represent the value as
dn2n + dn-12n-1 + ... d020
+ c12-1 + c22-2 + ... + cm-12m-1
<= X
where X is the decimal number to be represented.
We will have to choose some tolerance to satisfy this inequality. You
have probably noticed when using real numbers with a high level language
that you never seem to get a zero (0) value when dealing with these numbers.
The tolerance is causing the problem. Simply increasing the precision (the
number of digits used) will not solve the problem. More precision will
simply move the problem to the end of a long line of 0s with a few 1 bits
near the end. Most systems pick a fixed number of bits to represent the
number. This is set in the hardware and cannot be changed. Actually, several
options are usually available given different degrees of accuracy called
standard precision, double precision and sometimes extended precision.
We will consider real numbers and negative numbers in more detail in
Sections 2.2.6 and 2.2.9.
2.2.3 Base 2 derivatives - Octal and Hexadecimal
Two other number bases are important because of there relationship to
base 2. They are base eight or octal and base sixteen or hexadecimal
(hex). Counting in these bases is shown in Figure 2.5. In hexadecimal
we have 16 digits so we use the ten regular digits and the first 6 letters
of the alphabet.
The importance of octal and hex is based on the property that they can
serve as convenient abbreviations for binary numbers. From Figure 2.5,
we can see that each octal digit can represent three binary digits in a
unique manner and each hexadecimal digit can represent four binary digits.
Early computers that used the 6 bit BCD code could abbreviate each character
using two octal digits. In these machines 6 bits were called a byte.
More modern machines are almost exclusively 8 bit oriented to conveniently
handle EBCDIC and ASCII. Each 8 bits can be abbreviated using two hexadecimal
digits. The term byte is now commonly accepted to be 8 bits. Since
one byte can be used to store one character, the amount of memory is sometimes
given as the number of characters. In this case character and byte are
synonymous. As shown in Figure 2.5, instead of using 8 bits we can use
just two hexadecimal digits.
2.2.4 Integer Conversions
With the four different number systems we have introduced there are
a large number of conversions we could consider; however we will restrict
our attention only to integer numbers and to the most important systems:
binary to decimal
decimal to binary
binary to hexadecimal
hexadecimal to binary
If it is necessary to convert between decimal and hexadecimal, it is
always possible to pass through binary such as
base 10 <--> base 2 <--> base 16
Sometimes when there are system problems, it is necessary to inspect
the contents of memory. A utility program is usually provided with the
operating system which will perform the task of "dumping" the contents
of memory onto a printer. This is called a core dump or memory
dump. The output is usually in hexadecimal to conserve space on the
printed page. A typical example is shown in Figure 2.7. To interpret this
output, it is often necessary to convert hexadecimal to binary or decimal,
or perform the reverse operation. These conversions are not difficult,
but as with most mathematical operations, it takes practice to do them
rapidly. Most professionals in the information and data processing fields
rarely have to make use of these techniques. Consequently, for those few
instances when it is necessary, a general acquaintance with the technique
is adequate.
2.2.4.1 Conversion from Binary to Decimal
To convert from binary to decimal recall that each position represents
a certain power of two. Consider the following base 2 number:
10011011
this can be represented as
1x27+0x26+0x25+1x24+1x23+0x22+1x21+1x20
Using the powers of two from Figure 2.8 we have
128+16+8+2+1=155
2.2.4.2 Conversion from Decimal to Binary
To convert from decimal to binary we use successive subtraction of powers
of two. First find the largest power of two that can be subtracted from
the decimal number. That becomes the first binary digit on the left or
high order bit. As an example, we will use the same number as before, 155.
From Figure 2.8, we see that 128 is the largest power of two that can be
subtracted from 155 leaving a remainder of 27; hence we have
155 - 128 = 27
or
1x27 + 27
We now repeat the procedure with 27. The largest power of two that can
be subtracted from 27 is 16 (24) leaving a remainder of 11.
1x27 + 1x24 + 11
Again, repeating for 11
1x27 + 1x24 + 1x23 + 3
Finally for the last two contributions
1x27 + 1x24 + 1x23 + 1x21
+ 1x20
After we have filled in the powers that do not contribute we have
1x27+0x26+0x25+1x24+1x23+0x22+1x21+1x20
Leaving of the powers of two we have
10011011
In tabular form this can be summarized as
number power contribution
of
two
155
-128 27 1
27
26 0
25 0
-16 24 1
11
- 8 23 1
3
22 0
- 2 21 1
1 20 1
Now the binary number can be constructed by reading the third column
from bottom to top which gives 10011011.
2.2.4.3 Conversion from Binary to Hexadecimal
Conversion from binary to hexadecimal is quite simple. First check to
see if the number of bits is divisible by four. If it is not add the necessary
number of 0's in front. Leading zeros do not change the value. For example
0101011111
becomes
00101011111
Now divide the binary number into groups of four bits
0010 1010 1111
We can now use Figure 2.8 to complete the process by substituting for
each group of four binary digits one hexadecimal digit. The result is
2 A F
2.2.4.4 Conversion from Hexadecimal to Binary
Conversion from hexadecimal to binary is just as simple. Consider the
hexadecimal number
E19C
Using Figure 2.8 this becomes
E 1 9 C = 1110 0001 1001 1100
or
1100000110011100
2.2.5 Binary Addition
Most arithmetic logic units support all of the standard integer arithmetic
operations directly in hardware. The more powerful machines will also include
floating point or real number operations in hardware. Floating point numbers
will be discussed in the next section.
Standard mathematics functions include addition, subtraction multiplication
and division. Rather than a complete treatment of these operations, we
will concentrate on the two most important operations, those of addition
and subtraction with a brief discussion of multiplication. When analyzing
the details of a memory dump, addition and subtraction are the only necessary
operations.
If you were to closely analyze the way you perform addition in the decimal
number system, each time you add a column of two numbers you work with
three inputs; the first digit, the second digit and the carry from the
previous column. For example
76
+59
135
When you are adding the 5 and 7, there is a 1 to carry from the previous
column. To find the result, all of us have memorized a 10 by 10 table called
the addition table because in decimal each digit can have one of ten different
values (see Figure 2.9). The result actually consists of two values; a
3 from the addition, and a 1 to carry into the next column.
The analogous operations are done in binary, except each input can have
only two different values, 0 or 1 (see Figure 2.10). If we have three inputs,
there are 8 possible combinations. We also have two outputs; the value
of the addition for that particular column, and the carry value to the
next column. Figure 2.11 is a tabular representation of the different possibilities.
This probably sounds more confusing than it really is. We will consider
a few examples to clarify the operations. First consider the problem discussed
above:
76
+59
135
In binary this would be
1001100
+0111011
The inputs from the first column are
bit 1 = 0
bit 2 = 1
carry-in bit = 0 (no carry possible in first column)
Therefore, from Figure 2.9, the outputs are
result bit = 1
carry-out bit = 0
Each column is done in the same way with carry-out bit from the previous column becoming the carry-in bit. The final result is
1001100
+0111011
10000111
which is 135 in decimal.
2.2.6 Negative Binary Integers
In the Section 2.2.2 we discussed how positive integers were stored
in binary, but no mention was made of negative numbers. Several different
techniques are possible. Each is used in different situations in today's
computing systems.
2.2.6.1 Sign-Magnitude Convention
Perhaps the most obvious technique for storing negative numbers is to
reserve the first bit as a sign bit. A "0" would be positive (+) and a
"1" would be negative (-). If we were to consider 8 bit numbers then the
number +21 would be
00010101
while the number -21 would be
10010101
Since the lead bit is used to represent the algebraic sign, the largest
number in absolute value has seven bits or
1111111
In decimal this is 127. Note that there are two values for zero
10000000 and 00000000
The most common use for the sign magnitude convention is to store the
mantissa for floating point numbers which we discuss in Section 2.2.9
2.2.6.2 Bias or Excess Convention
Bias or excess notation uses a simple transformation to
convert a range of positive and negative numbers to all positive values.
The hardware logic then does not have to consider negative numbers. In
this convention a fixed amount is added to all numbers in the range shifting
them on the real axis so that they are stored as positive integers. This
convention requires the range of integers considered to be symmetric about
zero. For example, we could consider the numbers from -128 to +127 where
zero is considered a positive number. These could be written in binary
as
-128 = -10000000 and +127 = +01111111
If we add 128 or 10000000 to all the numbers the range becomes
0 = 00000000 to +255 = +11111111
The bias is always a power of two and is consistent with the size of
integers in a particular hardware configuration. The bias convention is
commonly used to represent the characteristic or exponent in floating point
reals which is discussed in Section 2.2.9
2.2.6.3 Complements
The final convention we will consider, two's complement, is probably
the most commonly use to represent positive and negative integers. Rather
than dwell on the mathematical foundations; we will simply look at how
to form a two's complement number and some of the important properties.
This technique works because we are dealing with a fixed number of bits
to represent the number. If we have a positive integer in binary, to represent
its negative counterpart in two's complement, we change all the bits (flip
the bits) and add 1. For example, the decimal number 117 can be represented
in an 8 bit pattern as
01101010
To find its two's complement we first flip all the bits
10010101
and add one
10010101 + 1 = 10010110
This is now interpreted as -117.
The way one determines whether you have a negative number in two's complement
is based on the value of the first bit. If it is 0 then the value of the
number positive and given by the binary value. If the first bit is 1, the
number is negative. To find its absolute value, you take the two's complement
and use its binary value.
As another example, consider the number
11001011
Since the first bit is 1, we have a negative number in two's complement.
To find its algebraic value we take the two's complement flip bits
00110100
add one
00110101
Then using the techniques we have learned we find the value is
-53
A careful consideration of the previous examples leads to the conclusion
that the largest positive number that can be stored in eight bits is
01111111
since, if the first bit was 1, we would interpret it as a negative number
in two's complement. This number is
27-1 or 127
and the largest negative number is
-27 = 10000000 or -128.
Thus, using two's complement, the range of integers that can fit in
one byte is -128 to +127 or exactly 256 different values. This is consistent
with the statement made in Section 2.1.3 in the discussion of EBCDIC, where
we stated that there were a possible 256 different characters that could
be represented in an eight bit code. It is also true that the complement
of the complement is the original number.
2.2.7 Binary Subtraction
Subtraction is similar to addition. We make use of the property that
subtraction of a number is the same as the addition of its two's complement.
This is one of the most important properties of two's complements. An example
should be sufficient to explain the principle. We will assume we have eight
bit numbers. Recall that the range is -128 to + 127 and that a number is
negative if the first bit is 1. Using the same numbers as the addition
example we have
7 76
-59
+17
In binary this is
01001100
-00111011
where we have added a lead 0 to complete the 8 bits.
Forming the two's complement of
00111011
gives
11000101
Using this in the problem leads to a new problem statement
01001100
+11000101
Now using Figure 2.8 and ignoring the last carry gives
01001100
+11000101
00010001
Since the lead bit is zero we interpret the answer as a positive number
whose value is +17.
As a final example we will consider what happens when the answer is
negative. Reversing the numbers from the previous example we have
59
-76
-17
In binary this is
00111011
-01001100
Forming the two's complement of
01001100
gives
00111011
+10110100
Now performing the addition we have
00111011
+10110100
11101111
Now, since the lead bit is 1, the answer is interpreted as a negative
number in two's complement form. To find the value we take the two's complement
and assign a minus sign
-00010001 or -17
2.2.8 Binary multiplication
We will briefly consider integer binary multiplication. Multiplication
of two numbers in the binary system is particularly simple. Division is
more complex and we will defer that to more detailed and advanced texts.
Binary multiplication is analogous to the more familiar decimal multiplication.
Consider the multiplication of 13 x 19
19 line 1
x 13 line 2
57 line 3
19 line 4
246 line 5
For a digit in line 2, we multiply the number in line 1 and enter the
result below shifted by the appropriate number of positions. Line 3 is
the result of 3 x 19 and line 4 is the result of 1 x 19 with a shift of
1 position. Line 5 represents the sum of the individual multiplications.
If line 2 had more digits, the sum would have been over more lines.
In binary we perform the same functions, but since the multiplier can
only be a 0 or 1, we simply add a 0 or add the multiplicand shifted by
the appropriate number of positions. (Recall multiplication by one is the
identity operation for multiplication.) The only difference is that we
add after each digits multiplication forming a series of partial sums.
To help clarify the procedure we will consider the same problem in binary.
multiplying partial
bit sum
10011 line 1
x 01101 line 2
10001 1 sum 1 line 3
0 0 line 4
10011 sum 2 line 5
10011 1 line 6
1011111 sum 3 line 7
10011 1 line 8
11110111 sum 4 line 9
0 0 line 10
11110111 sum 5 line 11
Converting 11110111 back to decimal gives the same answer of 247.
To better understand this process we will discuss each line separately;
line 1 multiplicand
line 2 multiplier
line 3 multiplication by the 20 bit with value 1 giving the first partial sum
line 4 multiplication by the 21 bit with value 0 giving zero result
line 5 second partial sum (line 3 + line 4)
line 6 multiplication by the 22 bit with value 1
line 7 third partial sum (line 5 + line 6)
line 8 multiplication by the 23 bit with value 1
line 9 fourth partial sum (line 7 + line 8)
line 10 multiplication by the 24 bit with value 0 giving zero result
line 11 fifth partial sum (line 9 + line 10). This is the final answer
A closer inspection of the binary multiplication process shows that
lines 1, 6 and 8 are all the same bit pattern which is simply the multiplicand
shifted the appropriate number of positions to the left. Lines 4 and 10
are fillers for the zero bits in the multiplier.
The entire binary multiplication process is simply a series of (1) shift
(2) boolean test to see if the multiplier bit is 0 or 1 and (3) add to
partial sum if the bit is 1. As you might expect, this logic is easy to
implement in hardware.
2.2.9 Floating Point Real Numbers
In Section 2.2.2.2 we briefly discussed the representation of numbers
that had a fractional part such as 19.75 and how the familiar decimal point
had a similar counterpart in the binary system called a binary point.
In fact we could represent
19.75
as
10011.11
in the binary system. The digits to the right of the binary point represent
2-1 = 1/2 = 0.5 and 2-2 = 1/4 = 0.25
respectively. These numbers are sometimes referred to as real numbers.
Integer numbers are a special case of reals with no fractional part. In
many applications it is important to be able to represent real numbers
in the computer. Many different schemes are possible, but virtually all
use a concept of "floating" the binary point to a specific position.
It is easiest to understand this concept by example. Consider the numbers
+ 345.75 and - 0.03125
These two numbers could be written in an equivalent form using scientific
notation as
+ 0.34575 x 10+3 and - 0.31250 x 10-1
where we have chosen to have the first nonzero digit to the right of
the decimal point. Another form could be
+ 3.4575 x 10+2 - 3.1250 x 10-2
where we have chosen to have the first nonzero digit to the left of
the decimal point. These two conventions are two examples of normalization"
where we have fixed the representation of the mantissa and floated
the decimal point. The digits
34575 and 31250
are know as the mantissa or fractional part and the powers
of ten, in the first case,
+3 and -1
are known as the characteristic or exponent parts. It
is important to observe that there are four distinct pieces of information
in each number;
1. the sign of the mantissa
2. the digits that make up the mantissa
3. the sign of the characteristic
4. the digits that make up the characteristic.
Several observations should be made. In all systems, a fixed number
of bits are used to represent the real number. The more bits used in the
mantissa, the more accurate the number may be represented. On the other
hand, the more bits used for the exponent or characteristic, the larger
the range of numbers from very small to very large that can be represented.
With a fixed number of bits there is an obvious tradeoff between accuracy
and range.
When we represent these numbers in the computer we must first adopt
a convention. Unlike character data and integers, there is no uniform standard.
Each convention has its advantages and disadvantages and trade offs must
be made when designing the computer. We will briefly look at two different
techniques, one used by IBM mainframes and one IEEE standard which is used
by most PC's and by Hewlett/Packard minicomputers.
2.2.9.1 IBM Mainframe Floating Point Convention
The first convention known as excess (or bias) 64 base 16 is used by
the IBM mainframe series of computers such as the IBM 370 and IBM 4341.
The standard word size is 32 bits and is divided into three parts
- ------- ------------------------
0 1234567 890123456789012345678901
1 2 3
where the bits are numbered from 0 to 31. The first bit (bit 0) represents
the sign of the mantissa. Bits 8 through 31 represent the digits of the
mantissa. Bits 1 through 7 represent both the sign and digits of the characteristic.
A value of 0; (i.e., no shifts in the characteristic) is stored as
100000002 or 2+7 = 64.
If there is a positive shift, it is added to this value and a negative
shift is subtracted. In this manner the characteristic is always stored
as a positive number. This can be important in designing the electronic
circuitry of the arithmetic logic unit.
There is one more convention that is followed as indicated by the term
"base 16". Each value in the characteristic indicates a shift of 4 positions
or
2+4 = 16.
Since the shifts are always in units of four, the normalization is to
place the first nonzero binary digit in one of the first four positions
to the right of the binary point. An equivalent statement is to say that
the first hexadecimal digit to the right of the point is nonzero.
To clarify these ideas we shall consider conversion of the two real
decimal numbers
+345.75 and -0.03125
to excess 64 base 16 notation. Using the techniques discussed earlier
in this chapter, in binary we have
+345.75 = +101011001.11
and
-0.03125 = -0.00001
In the first case the normalization would be
0.0001010100111
where we have shifted 12 positions to the left or 3 units of 4.
In the second case we have
0.1
where we have shifted 4 positions to the right or one unit of four.
For +345.75 the characteristic becomes
1000011 = 67
indicating 3 shifts positive. For -0.03125 the characteristic becomes
0111111 = 63
indicating 1 shift negative.
We are now in a position to combine the different parts giving us
345.75 = 0100 0011 1010 1100 1110 0000 0000 0000
= 4 3 A C E 0 0 016
and
-0.03125 = 1011 1111 1000 0000 0000 0000 0000 0000
= B F 8 0 0 0 0 016
Note that for clarity we grouped the bits in units of four.
2.2.9.2 IEEE Single Precision Floating Point Standard
The IEEE adopted a standard in 1985 for representing real numbers in
computer systems. They have standards for single precision, double precision,
and extended precision numbers.
The Single Precision standard calls for a 32 bit (4 bytes) number. In
this convention, the bits are numbered with the low order bit on the right
as the 0th (zeroth bit) and the left most bit as 31st.
- ------- ------------------------
1 0987654 321098765432109876543210
3 2 1
The first bit (bit 31) represents the sign of the mantissa. Bits 30
through 23 represent the exponent or characteristic. Bits 22 through 0
are used for the mantissa or fractional part. The normalization is to have
the first nonzero bit to the left of the binary point. This bit is not
stored and is referred to as the hidden bit. The bias is
27-1 = 0111 1112 = 12710
Conversion Steps:
The following steps are required to convert from decimal to binary IEEE
notation.
1. Convert the decimal number into positional binary notation.
2. Normalize the binary number to a leading 1 bit.
4. Store the fractional part (significand) in the significand field (bit location 22 to 0).
5. Store the computed exponent in the exponent field (bit locations 30 to 23).
6. Fill in the sign bit using the following convention:
An Example should help to clarify the discussion.
Convert the following number
-26.59375
into IEEE floating point representation.
1. Convert into positional binary notation. (Note: IGNORE SIGN)
26.5937510 => 11010.100112
2. Normalize the binary number to a leading 1 bit.
1.101010011 x 24
3. Compute the binary representation of the exponent using two's complement
notation and add the excess bias of 127.
Excess Value 1000 0011
4. Store the fractional part (significand) in the significand field
(bit location 22 to 0).
5. Store the computed exponent in the exponent field (bit locations
30 to 23).
6. Fill in the sign bit.
0100 0001 1101 0100 1100 0000 0000 0000
4 1 D 4 C 0 0 016
This is the IEEE Single Precision representation of 26.59375.
What happened to the leading 1 bit? As mentioned above, in this representation
scheme it is said to be HIDDEN or IMPLIED. In fact some refer to this method
as the "hidden bit" method. Since we always normalize to a leading
one bit, it is not necessary to store it because "everybody" knows that
it is there. REMEMBER, that when you are decoding a number in this format
you must restore the hidden bit to its proper place.
The tolerance or accuracy is determined by the number of bits used to
represent the fractional part.
As another example we will convert a floating point number stored in
the IEEE format back to decimal. If the number is stored in hex we start
by first
2. Mark off the binary fields:
b. Exponent (8 bits)
c. Significand (23 bits)
4. Replace "hidden bit" and attach the significand.
5. Shift binary point.
6. Convert binary real to decimal real.
7. Place correct sign.
1. Convert hex digits into binary digits.
C0 CC C0 00 hex => 1100 0000 1100 1100 1100 0000 0000 0000
2. Mark off the binary fields:
1 10000001 10011001 10000000 0000000
Sign Exponent Significand
3. Convert exponent field by subtracting 127.
100000012 => 12910
a. convert to decimal and subtract 127
b. add a "negative" 127 (use two's complement)
Bias Value -127 10000001
Power of two 2 00000010 => 2
Replaced Hidden Bit representation.
Shift binary point right two places
110.011001100000000000000
This is the binary real notation.
Convert the whole part:
Positional notation => 2-2 + 2-3 + 2-6
+ 2-7
Decimal value => .25 + 125 + .015625 + .0078125
=> .3984375
This is the decimal value represented by 60 66 60 00 hex.
The HP 3000 series of minicomputers used a similar technique except
the bias was 128 = 27 = 1000 0000.
2.2.10 Range and Accuracy
Obviously, the larger the number of bits, the more information we may
store. This applies to both integer and real numbers. In this section we
will summarize some of the previous statements on the accuracy and range
of binary numbers. We will first consider integers and then reals.
2.2.10.1 The Range and Accuracy of Integers
Using two's compliment we can develop,the following table:
-----------------------------------------------------------------
Number Largest Largest
of bits Positive Negative
Number Number
-----------------------------------------------------------------
8 0111 1111 = 127 1000 0000 = -128
16 0111 1111 1111 1111 = 1000 0000 0000 0000 =
32,768 -32,768
32 0111 1111 1111 1111 1111 1111 1111 1111 =
2,147,483,647
1000 0000 0000 0000 0000 0000 0000 0000 =
-2,147,483,648
-----------------------------------------------------------------
Obviously, the larger the number of bits, the larger the integers that
can be represented. Within the restriction of size, the integers are represented
exactly in the computer.
2.2.10.2 The Range and Accuracy of Reals
When we consider real numbers, we must consider both the characteristic
and mantissa. The characteristic determines the size or range of the numbers.
For the characteristic for the two conventions considered in the previous
sections we have
-----------------------------------------------------------------
convention largest number smallest number
-----------------------------------------------------------------
excess 64 base 16 1111111 = 16+63 0000000 = 16-65
(IBM) approx = 7.2 x 10+75 5.4 x 10-79
excess 127 base 2 1111111 = 2+63 0000000 = 2-65
(IEEE) approx = 9.2 x 10+18 2.7 x 10-20
-----------------------------------------------------------------
The range of numbers becomes more meaningful when we realize that each
power of ten represents a zero to the right or left of the decimal point.
This means that the largest number using the excess 64 convention is
10000000...(75 zeroes)...000
and the smallest number is
0.00000...(79 zeroes)...00001
Certainly, a very large and very small number.
The accuracy, or number of significant digits depends on the number
of bits in the mantissa. Although both conventions have 24 bits in the
mantissa, the excess 64 convention, because of the shifting convention,
may lose three of these giving only 21 bits of accuracy. We thus have the
largest number as excess 64 base 16
2+21-1 = 2,097,151
or approximately 7 decimal digits of accuracy.
For the excess 127 base 2 we have 23 bits but we also have the extra
hidden bit giving us 24 bits of accuracy or
2+24-1 = 16,777,215
or approximately 8 decimal digits.
Other conventions or formats are used. Most computers also have the
facility to use more bits if specified. This is usually called double precision.
The designers then decide whether to increase the range (characteristic)
or accuracy (mantissa) or both. On the IBM family of computers 64 bits
are used with the extra 32 bits used only for the mantissa.
The speed and accuracy of the computer are closely tied to the numeric
representations and built in logic circuits. The more bits used in a representation,
the more logic required to perform a given computation. Most personal computers
perform 16 or 32 bits at a time. Mainframes and supercomputers perform
logic on 32 to over 60 bits at a time. Obviously, the more bits that are
processed at one time, the faster the machine.
The differences may be much more dramatic since the smaller machines
may have to retrieve many bytes from memory one at a time, thus slowing
down the processing even more. There is also another important point when
considering the speed and power of different processors. Most machines
represent real numbers in one of the conventions given or something very
similar. However, many personal computers have only integer logic circuits
in the arithmetic logic unit. In order to perform standard arithmetic operations
such as addition and multiplication on real numbers, they must perform
relatively complex software programs.
On the other hand, larger machines, such as mainframes and supercomputers
have hardware logic that works directly on the real numbers. A special
enhancement for PC's is available, called a floating point processor
that performs these functions. It is possible for a modern supercomputer
to more than a million times faster and more powerful than a typical personal
computer. Our discussion in this section is far from complete. The purpose
has been to acquaint you with the variety of solutions available to store
numeric data and some of the design considerations that are necessary in
developing a computing system.
2.3 Words
In this chapter we have considered the representation of characters
and numbers inside the computer as specific bit patterns. Also, in Chapter
1 we briefly discussed representing instructions and memory addresses as
bit patterns. Thus we see that there are four different quantities that
may be represented by bit patterns; characters, numbers, instructions and
addresses. It is often the case that the same bit pattern may be used to
represent different things. For example the pattern
0100 0001
is the letter "A" in ASCII or the integer 65 in binary. It is important
to realize that a particular bit pattern only takes on meaning within the
context of its usage. With 0100 0001 stored in the computer all we have
is the bit pattern and without the usage context we do not know what it
means.
With the four different types of information stored in the computer's
memory, it is important to be able to access and manipulate an arbitrary
quantity efficiently. In order to accomplish this it is necessary to use
registers in the in the ALU that are of fixed size. This size should be
chosen to be consistent with the number of bits in an instruction and in
an address. The size of the registers also should be consistent with the
representation of both alphanumeric and numeric data.
A related consideration are the number of bits that can simultaneously
be transmitted between memory and the ALU. The transmission path is called
the bus and the number of bits that can be transmitted at one time
is called the bus width.
In most computing systems one set number of bits called a word
is chosen to satisfy these constraints. Typically, a word will consist
of several bytes. The bus width will allow one or an integral number of
words (two, three or four) to be transmitted at one time. Instruction size
will be consistent with word size. In some systems instruction size and
word size will be the same. Sometimes a word will hold an integral number
of instructions or an instruction will require an integral number of words.
The registers in the ALU will also be consistent with the word size. Finally,
the address space will also be consistent with the word size.
Word size used to be used to differentiate between different classes
of computers. Today, as PC's have become more powerful, this use has become
less meaningful. Typical word sizes for PC's are 16 and 32 bits. Mainframes
may have 32 or more bits for a word. Supercomputers often have word sizes
of 60 or more bits.
Instruction sets, address space and data formats are all related to
on word size. For this reason, it is important to that the word size be
chosen at the design time and cannot be changed.
2.4 Parity Checking
Many different techniques are used to insure data integrity. One method
used when transmitting data from one point to another is called parity
checking. A parity, or check bit, is added to each unit transmitted.
This bit is chosen so that there is always either an even number of on
bits (1's) transmitted (even parity) or an odd number of on bits
transmitted (odd parity). After the transmission the number of on
bits is checked, if the parity has changed, the system knows that at least
one bit has changed and can signal an error or take corrective action.
Usually a request for a retransmission is automatically requested and only
after several failures is the actual error message issued.
To see how the parity bits are formed we can refer to Figure 2.11. Consider
the seven bit ASCII representation of the letter C, 1000011. There are
three 1's, 1000011. Odd parity requires an odd number of
1's so the parity bit is a 0 giving an eight bit representation
10000110.
On the other hand, even parity requires an even number of 1's so the
parity bit is a 1 giving an eight bit representation
10000111.
Of course, this technique will not work if two bits are changed. However,
suppose the probability of a bit being changed in transmission is 0.000000001,
then the probability of two bits being changed is 0.000000000000000001.
Very small indeed!
LIST OF FIGURES
Figure 2.1 BCD Format
Figure 2.1 BCD Code Table
Figure 2.3 EBCDIC Code Table
Figure 2.4 ASCII Code Table
Figure 2.5 Counting in Different Bases
Figure 2.6 Packed and Unpacked Decimal
Figure 2.7 Core Dump
Figure 2.8 Powers of Two
Figure 2.9 Addition Table Base 2 and 10
Figure 2.10 Truth Table for Full Binary Addition
Figure 2.11 Parity
Use of this material for educational purposes only is governed by
the definition of Fair Use (Section
107) of the U.S. Copyright Act.
Course material is the property of R. A. Pilgrim
All Rights Reserved