CHAPTER 2 -- INTERNAL DATA REPRESENTATION

This chapter discusses the internal representation of data. Collections of bits are arranged in standard formats to represent characters and numbers. Access to memory in machine determined units of fixed length called words.

CHAPTER OUTLINE

2.0 Introduction

2.1 Bits and Character Codes

2.1.1 Bytes and Characters

2.1.2 BCD

2.1.3 EBCDIC

2.1.4 ASCII

2.1.5 Other Codes

2.2 Number Systems

2.2.1 Positional Notation and Base 10 - Decimal

2.2.2 Base 2 - Binary

2.2.2.1 Binary Integer Numbers

2.2.2.2 Numeric Data - Packed and Unpacked Decimal

2.2.2.3 Binary Real Numbers

2.2.3 Base 2 Derivatives - Octal and Hexadecimal

2.2.4 Integer Conversions

2.2.4.1 Conversion from Binary to Decimal

2.2.4.2 Conversion from Decimal to Binary

2.2.4.3 Conversion from Binary to Hexadecimal

2.2.4.4 Conversion from Hexadecimal to Binary

2.2.6 Negative Binary Integers

2.2.6.1 Sign - Magnitude Convention

2.2.6.2 Bias or Excess Convention

2.2.6.3 Complements

2.2.7 Binary Subtraction

2.2.8 Binary multiplication

2.2.9 Floating Point Real Numbers

2.2.9.1 IBM Mainframe Floating Point Convention

2.2.9.2 IEEE Floating Point Standard

2.2.10 Range and Accuracy

2.2.10.1 The Range and Accuracy of Integers

2.2.10.2 The Range and Accuracy of Reals

2.3 Words

2.4 Parity

Objectives

When you complete this chapter you will be able to:

1. Discuss methods for storing character and numeric data within a computer and its peripheral devices. 2. Describe and perform binary arithmetic operations. 3. Convert between the number bases 2, 8, 10, 16.

4. Describe the internal binary representation of real numbers by several different conventions.

5. Discuss bus organization.

6. Discuss byte and word concepts.

7. Discuss numeric accurancy in representation and cumputation.

2.0 Introduction

As we move up the evolutionary chain, we find the more advanced the species, the more complex their means of communicating and storing data. A beaver signals danger by slapping its tail on the water. Many scientists think porpoises have complex language structures. Our species has developed the most complex systems of storing data. The hieroglyphics of the ancient Egyptians were a mystery until the key was discovered in the Rosetta stone. The Chinese and Japanese have spoken languages that are as different as French and Spanish, but use essentially the same set of symbols in their written language. The Semitic peoples of the Middle East developed a phonetic alphabet that provided much more versatility than the picture based written languages. There are also different ways that have been devised to represent numeric information such as Roman numerals or the familiar decimal number system.We are used to the Indo-European phonetic alphabet. It is interesting that all languages represent essentially the same ideas and data.

Because of the electronic digital computer, we are now at the onset of what has been called the information age. This necessitates some way of electronically representing the wide range of different ways of storing data. Unique strings of binary 1's and 0's can represent virtually all possible symbols. The question is often asked "Why use binary digits instead of the more natural decimal digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9?" Although some early attempts were made to build decimal based computers, such as the IBM 650, it was found, that because of the simplicity of representation the binary system was vastly superior. It is relatively easy to distinguish between a positive or a negative voltage, or between a magnetic pole lined up north-south or south-north, or the presence of a hole or no hole in a punched card. It is also easy to change the state very fast if there are only two possibilities. It is much more difficult, if not technically impossible, to achieve the same speed and accuracy with other number systems. However, there may be something much more fundamental and potentially exciting about using a binary system to represent information. In our quest for extraterrestrial life we are transmitting binary patterns representing numbers. The Voyager space probe also has binary messages to potential intelligent life forms that receive its messages. Perhaps if we ever make contact with other life forms, our first means of communication will be through some sort of binary codes.

In today's complex world of computers and data communications it is helpful to have an understanding of the fundamental building blocks of the information explosion. In Chapter 1 we briefly discussed how strings of bits could represent addresses or locations in memory as well as instructions to be executed by the computer. The picture, however, is not yet complete. A similar coding scheme is used to represent both numeric and character data. In this chapter, we will inspect several of the standard conventions that have been developed to represent character data. We will also discuss the representation of numeric data and how it can be manipulated. The final topic will deal with the issue of insuring data integrity.

The early character code used six bits or binary digits for each symbol. This was sufficient for the digits (0, 1, 2, 3, 4, 5 ,6 ,7 ,8 9,) capital letters (A, B, C,..., X, Y, Z) and a few special symbols. As computing and data processing became more complex it was necessary to increase the number of bits to seven or eight to represent both upper and lower case letters and more special symbols. There are similar concerns with numeric data. The more bits that are used, the more accurate and wider range of numbers that can be represented. This is one of the important differences between today's personal computers and the very large supercomputers. The larger systems can more easily represent and manipulate much larger numbers much more accurately. In this chapter we will first look at the character codes that have been used to represent data. Next we will investigate how numeric data is stored and manipulated in today's modern digital computers. The chapter will conclude with a brief discussion on how data integrity can be improved in the binary system by including some extra information.

2.1 Bits and Character codes

2.1.1 Bytes and Characters

Various schemes have been developed to represent data in the computer and to conveniently transmit it between systems. There is a need to represent both numeric and alphabetic information. There is also a need to represent special characters (e.g., ?/<>{}{} etc.) and control information (e.g., backspace (BS), delete (DEL), new line (NL), etc.). The three codes we examine below have the capability to represent all types of character data. We commonly refer to this data as alphanumeric data. In most computing systems, the hardware has been designed to conveniently store and manipulate one character of data. This unit of storage is usually referred to as one byte.

2.1.2 BCD Characters

The first coding scheme that became widely accepted was called BCD or Binary Coded Decimal. It was the dominant representation used on second generation computing systems. Six bits are used to represent the different characters. As shown in Figure 2.1, the four right most bits (low order bits) are called the numeric bits, while the left two bits (high order bits) are called the zone bits. The zone bits are designated A and B. In Section 2.1.3.1 we shall see how the numeric bits are given their associated decimal value. The check bit, sometimes called parity, is explained in Section 2.4. When the zone bits are 00, the numeric bits are interpreted as the numeric digits 0 through 9. In Section 2.1.3.1, we will see that the code is a direct use of the binary number system. When the zone bits are any of the other three configurations, 01, 10, 11, the interpretation is then a letter or special symbol.

Figure 2.2 shows the entire set of characters and their representation. With the six bit restriction, only 64 different characters are possible. As computing systems became larger and more powerful, there was a need for more characters including the lower case letters and special symbols. This led to the development of the next two codes, EBCDIC and ASCII.

2.1.3 EBCDIC Characters

Extended Binary Coded Decimal Interchange Code or EBCDIC (pronounced Eb-see-dic) uses eight bits to represent each character, thus expanding the possible character configurations to 256. The extra characters are used for lower case letters, non-printing characters that are used to send control information, and other special printing characters. There are still unused codes for future growth. It also uses four numeric bits but has added two zone bits. In an attempt to maintain compatibility with the older BCD code, the meaning of the numeric bits was maintained for the digits and alphabetic letters.

The EBCDIC character codes are shown in Figure 2.3. Several properties are interesting from a data processing perspective. Notice that the only difference between the upper and lower case letters is the second zone bit, so a translation from upper to lower case or vice versa can be accomplished by simply flipping one bit. Notice also that the zone bits can be used to test for special characters, letters or digits. This property is important when testing for the validity of input data. It would be relatively easy to make sure that a user entered a percent as 0.05 or 5% depending on what the computer was expecting.

One additional property is also important. If the eight bits used to represent the letters are interpreted as binary numbers, then the letters follow a sequence with "A" having the lowest value and "Z" having the largest value. This property is critical when sorting data alphabetically. The computer can interpret the bit patterns as binary integers, perform the sort, and then reinterpret the bit patterns as characters. The order used in arranging characters is called the collating sequence and may differ for different character codes.

2.1.4 ASCII

ASCII or the American Standard Code for Information Interchange (pronounced Ass-kee) was developed through the cooperation of the computing and communications industry in an attempt to simplify and standardize machine-to-machine and system-to-system communication. The set consists of letters, digits, special characters, control characters and graphic characters. It is shown in Figure 2.4. ASCII has many of the properties of the EBCDIC code discussed above but there are several important differences. ASCII uses only seven bits. As we will see in Section 5, when we add one bit to help ensure data integrity, all the ASCII characters conveniently fit into one byte. This fact has significant benefits for communication. Another important difference is the sequence of characters. Notice that there are no gaps in the numeric values of the codes. This permits easy checking for data validity. ASCII has become the defacto standard on micro and mini computing systems and the standard for communications. EBCDIC is still found extensively on the large IBM and IBM compatable computing systems. The majority of both large and small systems provide translation utility programs to convert from one system to the other.

2.1.5 Other Codes

The three codes we have discussed so far all have a fixed number of bits. It is also possible to have variable length codes. The advantage of a variable length code permits a significant saving in space. This is obvious if we were to assign

A 0

B 1

C 00

D 01

E 10

F 11

G 000

H 001

...

Since there is a variable length, it is then necessary to have some technique to indicate when one character stops and another starts. An additional saving in space may be realized by assigning the shortest codes to the most frequently occurring characters. One common method is called Hoffman Coding. It should be obvious that there are trade offs in using variable length codes in place of the fixed length methods. The saving in space comes at the cost of increased processing time to decode the variable length characters.

2.2 Number Systems

In Chapter 1 we mentioned how strings of bits could be interpreted as addresses and instructions. In the previous section we saw how strings of bits could be interpreted as characters. In this section we will continue the discussion and show how numeric information may be stored. As we develop the material in this section, it will become clear that the character representation of digits is fundamentally different from a numeric value. A given string of bits can take on different meaning depending on the context of its use.

It was Ada, Countess of Lovelace who first suggested using the binary number system to represent data inside the computer. John von Neumann is credited with reintroducing the idea almost a hundred years later in the early computing machines. In order to understand the number schemes or bases important in computing, we first will review some fundamental concepts using the more familiar base 10 or decimal system.

2.2.1 Positional Notation and Base 10 - Decimal

Certainly the number system we are all most familiar with is decimal or base 10. It is called base 10 because each position in a number has a value ten times as much as the position to its right and there are ten different digits 0,1,2,3,4,5,6,7,8,9. For example, in the number 354, the 4 represents 4 "ones", the 5 represents 5 "tens" and the 3 represents 3 "hundreds". We can see that each digit has two meanings; one is its own numeric value, and the second its position. As we count, when we reach 9, we know the next number starts with 1 in the next column. We say we have carried into the next position. If we are dealing with real numbers with a decimal point the same principle applies, except now as you move to the right of the decimal point you divide by ten for each successive position.

To reinforce these ideas let's look at an example: The number 354.286 can be represented as

3x100 + 5x10 + 4 + 2/10 + 8/100 +6/1000

Using exponential notation this can also be represented as

3x102 + 5x101 + 4x100 + 2x10-1 + 8x10-2 +6x10-3.

As we shall see, using this last form will help us understand other number bases.

2.2.2 Base 2 - Binary

Modern computing systems, as we have discussed, do not use the decimal but rather the binary number system. The only digits in this system are 0 and 1. These two digits are often called bits for Binary digITS. Using the concepts discussed for base ten, we can also count in binary: zero - 0, one - 1; but now when we reach two, since we only have two digits, we carry over to the next position and have 10. This not the number ten in decimal, but the number two in binary. Very often, to avoid ambiguity, a subscript is used, 102. Figure 2.5 shows how to count in several different number systems.

2.2.2.1 Binary Integer Numbers

To better understand these concepts, consider the following number in base two:

1011012

Just as we did in base ten, as we move one position to the left each digit has a value increased by one more multiplication of the base:

1x25 + 0x24 + 1x23 + 1x22 + 0x21 1x20

This can also be written as

1x32 + 0x16 + 1x8 + 1x4 + 0x2 + 1

We thus see that

10110122 = 451010.

An important property of the binary system is its simplicity: because we only have the digits 0 and 1, a position either contributes a power of two or nothing since it is multiplied by 1 or zero. This property simplifies the arithmetic electronic circuits and thus helps make them more reliable.

2.2.2.2 Numeric Data - Packed and Unpacked Decimal

In many applications the accuracy available using fixed length stings of bits for numeric data is not sufficient. (The range and accuracy of binary numbers is discussed in Section 2.2.10.1.) This often happens when we are considering government or large corporate budgets. An alternate scheme may be used to store numbers to an arbitrary degree of accuracy. The formats used are called packed and unpacked decimal.

Frequently, when numeric data is read into the computer in unpacked decimal form. This a simple variant on EBCDIC. From Figure 2.3 we observe that the EBCDIC representation of the digits 0 through 9 consists of the bits 1111 followed by the binary value for that digit. For example, "1" is 11110001 and "9" is 11111001. The left four bits are called the zone portion and the right four bits are the numeric portion. This is shown in Figure 2.6. All the digits in the number are stored in this manner except the right most digit. For the right most digit, the zone portion is used to represent the algebraic sign of the number. A positive sign is 1100 (C16) and a negative sign is 1101 (D16). Thus, the number

+134 becomes 11110001 11110101 11000100

1 3 + 4

and

-97 becomes 11111001 11010111.

9 - 7

From the above discussion, it should be obvious that the information in the zone portion is unnecessary except for the right most digit.

This leads to the packed decimal format where all the zone bits are removed except for the right most digit. In this case the zone or sign is placed in the right most position. If there are not a complete number bytes, then a leading zero is added (see Figure 2.8). To clarify this discussion, consider the packed decimal form of the numbers used in the previous example.

+134 becomes 00010101 01001100

1 3 4 +

and

-97 becomes 00001001 01111101

0 9 7 -

Many computer systems will have hardware designed to operate on numbers in packed decimal directly. The loss in speed is balanced against increased accuracy.

2.2.2.3 Binary Real Numbers

A real number consists of both an integer and fractional part. To represent real numbers in binary, we introduce a concept similar to the decimal point, but now we call it a binary point. The number

101.0122

is interpreted as

1x22 + 0x21 + 1x20 + 0x2-1 + 1x2-2

or

4 + 0 + 1 + 0 + 1/4 = 5.2510

Computing systems may use different conventions to store negative numbers or real numbers but the concepts are all similar.

There are some problems with this simple approach, you may have noticed that there is no way to represent all possible decimal (fractional) numbers using this system. How would you represent the value 0.66; there is no combination of negative powers of two that could be used to represent this value. The best we can do is to form an approximation of this number. We can continue to add ever smaller powers of two until we get close enough. We will have to decide upon some tolerance factor and when we get within that value we will stop.

In general terms, we might represent the value as

dn2n + dn-12n-1 + ... d020 + c12-1 + c22-2 + ... + cm-12m-1 <= X

where X is the decimal number to be represented.

We will have to choose some tolerance to satisfy this inequality. You have probably noticed when using real numbers with a high level language that you never seem to get a zero (0) value when dealing with these numbers. The tolerance is causing the problem. Simply increasing the precision (the number of digits used) will not solve the problem. More precision will simply move the problem to the end of a long line of 0s with a few 1 bits near the end. Most systems pick a fixed number of bits to represent the number. This is set in the hardware and cannot be changed. Actually, several options are usually available given different degrees of accuracy called standard precision, double precision and sometimes extended precision.

We will consider real numbers and negative numbers in more detail in Sections 2.2.6 and 2.2.9.

2.2.3 Base 2 derivatives - Octal and Hexadecimal

Two other number bases are important because of there relationship to base 2. They are base eight or octal and base sixteen or hexadecimal (hex). Counting in these bases is shown in Figure 2.5. In hexadecimal we have 16 digits so we use the ten regular digits and the first 6 letters of the alphabet.

The importance of octal and hex is based on the property that they can serve as convenient abbreviations for binary numbers. From Figure 2.5, we can see that each octal digit can represent three binary digits in a unique manner and each hexadecimal digit can represent four binary digits. Early computers that used the 6 bit BCD code could abbreviate each character using two octal digits. In these machines 6 bits were called a byte.

More modern machines are almost exclusively 8 bit oriented to conveniently handle EBCDIC and ASCII. Each 8 bits can be abbreviated using two hexadecimal digits. The term byte is now commonly accepted to be 8 bits. Since one byte can be used to store one character, the amount of memory is sometimes given as the number of characters. In this case character and byte are synonymous. As shown in Figure 2.5, instead of using 8 bits we can use just two hexadecimal digits.

2.2.4 Integer Conversions

With the four different number systems we have introduced there are a large number of conversions we could consider; however we will restrict our attention only to integer numbers and to the most important systems:

binary to decimal

decimal to binary

If it is necessary to convert between decimal and hexadecimal, it is always possible to pass through binary such as

base 10 <--> base 2 <--> base 16

Sometimes when there are system problems, it is necessary to inspect the contents of memory. A utility program is usually provided with the operating system which will perform the task of "dumping" the contents of memory onto a printer. This is called a core dump or memory dump. The output is usually in hexadecimal to conserve space on the printed page. A typical example is shown in Figure 2.7. To interpret this output, it is often necessary to convert hexadecimal to binary or decimal, or perform the reverse operation. These conversions are not difficult, but as with most mathematical operations, it takes practice to do them rapidly. Most professionals in the information and data processing fields rarely have to make use of these techniques. Consequently, for those few instances when it is necessary, a general acquaintance with the technique is adequate.

2.2.4.1 Conversion from Binary to Decimal

To convert from binary to decimal recall that each position represents a certain power of two. Consider the following base 2 number:

10011011

this can be represented as

1x27+0x26+0x25+1x24+1x23+0x22+1x21+1x20

Using the powers of two from Figure 2.8 we have

128+16+8+2+1=155

2.2.4.2 Conversion from Decimal to Binary

To convert from decimal to binary we use successive subtraction of powers of two. First find the largest power of two that can be subtracted from the decimal number. That becomes the first binary digit on the left or high order bit. As an example, we will use the same number as before, 155. From Figure 2.8, we see that 128 is the largest power of two that can be subtracted from 155 leaving a remainder of 27; hence we have

155 - 128 = 27

or

1x27 + 27

We now repeat the procedure with 27. The largest power of two that can be subtracted from 27 is 16 (24) leaving a remainder of 11.

1x27 + 1x24 + 11

Again, repeating for 11

1x27 + 1x24 + 1x23 + 3

Finally for the last two contributions

1x27 + 1x24 + 1x23 + 1x21 + 1x20

After we have filled in the powers that do not contribute we have

1x27+0x26+0x25+1x24+1x23+0x22+1x21+1x20

Leaving of the powers of two we have

10011011

In tabular form this can be summarized as

number power contribution

of

two

155

-128 27 1

27

26 0

25 0

-16 24 1

11

- 8 23 1

3

22 0

- 2 21 1

1 20 1

Now the binary number can be constructed by reading the third column from bottom to top which gives 10011011.

2.2.4.3 Conversion from Binary to Hexadecimal

Conversion from binary to hexadecimal is quite simple. First check to see if the number of bits is divisible by four. If it is not add the necessary number of 0's in front. Leading zeros do not change the value. For example

0101011111

becomes

00101011111

Now divide the binary number into groups of four bits

0010 1010 1111

We can now use Figure 2.8 to complete the process by substituting for each group of four binary digits one hexadecimal digit. The result is

2 A F

2.2.4.4 Conversion from Hexadecimal to Binary

Conversion from hexadecimal to binary is just as simple. Consider the hexadecimal number

E19C

Using Figure 2.8 this becomes

E 1 9 C = 1110 0001 1001 1100

or

1100000110011100

Most arithmetic logic units support all of the standard integer arithmetic operations directly in hardware. The more powerful machines will also include floating point or real number operations in hardware. Floating point numbers will be discussed in the next section.

Standard mathematics functions include addition, subtraction multiplication and division. Rather than a complete treatment of these operations, we will concentrate on the two most important operations, those of addition and subtraction with a brief discussion of multiplication. When analyzing the details of a memory dump, addition and subtraction are the only necessary operations.

If you were to closely analyze the way you perform addition in the decimal number system, each time you add a column of two numbers you work with three inputs; the first digit, the second digit and the carry from the previous column. For example

76

+59

135

When you are adding the 5 and 7, there is a 1 to carry from the previous column. To find the result, all of us have memorized a 10 by 10 table called the addition table because in decimal each digit can have one of ten different values (see Figure 2.9). The result actually consists of two values; a 3 from the addition, and a 1 to carry into the next column.

The analogous operations are done in binary, except each input can have only two different values, 0 or 1 (see Figure 2.10). If we have three inputs, there are 8 possible combinations. We also have two outputs; the value of the addition for that particular column, and the carry value to the next column. Figure 2.11 is a tabular representation of the different possibilities.

This probably sounds more confusing than it really is. We will consider a few examples to clarify the operations. First consider the problem discussed above:

76

+59

135

In binary this would be

1001100

+0111011

The inputs from the first column are

bit 1 = 0

bit 2 = 1

carry-in bit = 0 (no carry possible in first column)

Therefore, from Figure 2.9, the outputs are

result bit = 1

carry-out bit = 0

Each column is done in the same way with carry-out bit from the previous column becoming the carry-in bit. The final result is

1001100

+0111011

10000111

which is 135 in decimal.

2.2.6 Negative Binary Integers

In the Section 2.2.2 we discussed how positive integers were stored in binary, but no mention was made of negative numbers. Several different techniques are possible. Each is used in different situations in today's computing systems.

2.2.6.1 Sign-Magnitude Convention

Perhaps the most obvious technique for storing negative numbers is to reserve the first bit as a sign bit. A "0" would be positive (+) and a "1" would be negative (-). If we were to consider 8 bit numbers then the number +21 would be

00010101

while the number -21 would be

10010101

Since the lead bit is used to represent the algebraic sign, the largest number in absolute value has seven bits or

1111111

In decimal this is 127. Note that there are two values for zero

10000000 and 00000000

The most common use for the sign magnitude convention is to store the mantissa for floating point numbers which we discuss in Section 2.2.9

2.2.6.2 Bias or Excess Convention

Bias or excess notation uses a simple transformation to convert a range of positive and negative numbers to all positive values. The hardware logic then does not have to consider negative numbers. In this convention a fixed amount is added to all numbers in the range shifting them on the real axis so that they are stored as positive integers. This convention requires the range of integers considered to be symmetric about zero. For example, we could consider the numbers from -128 to +127 where zero is considered a positive number. These could be written in binary as

-128 = -10000000 and +127 = +01111111

If we add 128 or 10000000 to all the numbers the range becomes

0 = 00000000 to +255 = +11111111

The bias is always a power of two and is consistent with the size of integers in a particular hardware configuration. The bias convention is commonly used to represent the characteristic or exponent in floating point reals which is discussed in Section 2.2.9

2.2.6.3 Complements

The final convention we will consider, two's complement, is probably the most commonly use to represent positive and negative integers. Rather than dwell on the mathematical foundations; we will simply look at how to form a two's complement number and some of the important properties.

This technique works because we are dealing with a fixed number of bits to represent the number. If we have a positive integer in binary, to represent its negative counterpart in two's complement, we change all the bits (flip the bits) and add 1. For example, the decimal number 117 can be represented in an 8 bit pattern as

01101010

To find its two's complement we first flip all the bits

10010101

10010101 + 1 = 10010110

This is now interpreted as -117.

The way one determines whether you have a negative number in two's complement is based on the value of the first bit. If it is 0 then the value of the number positive and given by the binary value. If the first bit is 1, the number is negative. To find its absolute value, you take the two's complement and use its binary value.

As another example, consider the number

11001011

Since the first bit is 1, we have a negative number in two's complement. To find its algebraic value we take the two's complement flip bits

00110100

00110101

Then using the techniques we have learned we find the value is

-53

A careful consideration of the previous examples leads to the conclusion that the largest positive number that can be stored in eight bits is

01111111

since, if the first bit was 1, we would interpret it as a negative number in two's complement. This number is

27-1 or 127

and the largest negative number is

-27 = 10000000 or -128.

Thus, using two's complement, the range of integers that can fit in one byte is -128 to +127 or exactly 256 different values. This is consistent with the statement made in Section 2.1.3 in the discussion of EBCDIC, where we stated that there were a possible 256 different characters that could be represented in an eight bit code. It is also true that the complement of the complement is the original number.

2.2.7 Binary Subtraction

Subtraction is similar to addition. We make use of the property that subtraction of a number is the same as the addition of its two's complement. This is one of the most important properties of two's complements. An example should be sufficient to explain the principle. We will assume we have eight bit numbers. Recall that the range is -128 to + 127 and that a number is negative if the first bit is 1. Using the same numbers as the addition example we have

7 76

-59

+17

In binary this is

01001100

-00111011

where we have added a lead 0 to complete the 8 bits.

Forming the two's complement of

00111011

gives

11000101

Using this in the problem leads to a new problem statement

01001100

+11000101

Now using Figure 2.8 and ignoring the last carry gives

01001100

+11000101

00010001

Since the lead bit is zero we interpret the answer as a positive number whose value is +17.

As a final example we will consider what happens when the answer is negative. Reversing the numbers from the previous example we have

59

-76

-17

In binary this is

00111011

-01001100

Forming the two's complement of

01001100

gives

00111011

+10110100

Now performing the addition we have

00111011

+10110100

11101111

Now, since the lead bit is 1, the answer is interpreted as a negative number in two's complement form. To find the value we take the two's complement and assign a minus sign

-00010001 or -17

2.2.8 Binary multiplication

We will briefly consider integer binary multiplication. Multiplication of two numbers in the binary system is particularly simple. Division is more complex and we will defer that to more detailed and advanced texts.

Binary multiplication is analogous to the more familiar decimal multiplication. Consider the multiplication of 13 x 19

19 line 1

x 13 line 2

57 line 3

19 line 4

246 line 5

For a digit in line 2, we multiply the number in line 1 and enter the result below shifted by the appropriate number of positions. Line 3 is the result of 3 x 19 and line 4 is the result of 1 x 19 with a shift of 1 position. Line 5 represents the sum of the individual multiplications. If line 2 had more digits, the sum would have been over more lines.

In binary we perform the same functions, but since the multiplier can only be a 0 or 1, we simply add a 0 or add the multiplicand shifted by the appropriate number of positions. (Recall multiplication by one is the identity operation for multiplication.) The only difference is that we add after each digits multiplication forming a series of partial sums.

To help clarify the procedure we will consider the same problem in binary.

multiplying partial

bit sum

10011 line 1

x 01101 line 2

10001 1 sum 1 line 3

0 0 line 4

10011 sum 2 line 5

10011 1 line 6

1011111 sum 3 line 7

10011 1 line 8

11110111 sum 4 line 9

0 0 line 10

11110111 sum 5 line 11

Converting 11110111 back to decimal gives the same answer of 247.

To better understand this process we will discuss each line separately;

line 1 multiplicand

line 2 multiplier

line 3 multiplication by the 20 bit with value 1 giving the first partial sum

line 4 multiplication by the 21 bit with value 0 giving zero result

line 5 second partial sum (line 3 + line 4)

line 6 multiplication by the 22 bit with value 1

line 7 third partial sum (line 5 + line 6)

line 8 multiplication by the 23 bit with value 1

line 9 fourth partial sum (line 7 + line 8)

line 10 multiplication by the 24 bit with value 0 giving zero result

line 11 fifth partial sum (line 9 + line 10). This is the final answer

A closer inspection of the binary multiplication process shows that lines 1, 6 and 8 are all the same bit pattern which is simply the multiplicand shifted the appropriate number of positions to the left. Lines 4 and 10 are fillers for the zero bits in the multiplier.

The entire binary multiplication process is simply a series of (1) shift (2) boolean test to see if the multiplier bit is 0 or 1 and (3) add to partial sum if the bit is 1. As you might expect, this logic is easy to implement in hardware.

2.2.9 Floating Point Real Numbers

In Section 2.2.2.2 we briefly discussed the representation of numbers that had a fractional part such as 19.75 and how the familiar decimal point had a similar counterpart in the binary system called a binary point. In fact we could represent

19.75

as

10011.11

in the binary system. The digits to the right of the binary point represent

2-1 = 1/2 = 0.5 and 2-2 = 1/4 = 0.25

respectively. These numbers are sometimes referred to as real numbers. Integer numbers are a special case of reals with no fractional part. In many applications it is important to be able to represent real numbers in the computer. Many different schemes are possible, but virtually all use a concept of "floating" the binary point to a specific position.

It is easiest to understand this concept by example. Consider the numbers

+ 345.75 and - 0.03125

These two numbers could be written in an equivalent form using scientific notation as

+ 0.34575 x 10+3 and - 0.31250 x 10-1

where we have chosen to have the first nonzero digit to the right of the decimal point. Another form could be

+ 3.4575 x 10+2 - 3.1250 x 10-2

where we have chosen to have the first nonzero digit to the left of the decimal point. These two conventions are two examples of normalization" where we have fixed the representation of the mantissa and floated the decimal point. The digits

34575 and 31250

are know as the mantissa or fractional part and the powers of ten, in the first case,

+3 and -1

are known as the characteristic or exponent parts. It is important to observe that there are four distinct pieces of information in each number;

1. the sign of the mantissa

2. the digits that make up the mantissa

3. the sign of the characteristic

4. the digits that make up the characteristic.

Several observations should be made. In all systems, a fixed number of bits are used to represent the real number. The more bits used in the mantissa, the more accurate the number may be represented. On the other hand, the more bits used for the exponent or characteristic, the larger the range of numbers from very small to very large that can be represented. With a fixed number of bits there is an obvious tradeoff between accuracy and range.

When we represent these numbers in the computer we must first adopt a convention. Unlike character data and integers, there is no uniform standard. Each convention has its advantages and disadvantages and trade offs must be made when designing the computer. We will briefly look at two different techniques, one used by IBM mainframes and one IEEE standard which is used by most PC's and by Hewlett/Packard minicomputers.

2.2.9.1 IBM Mainframe Floating Point Convention

The first convention known as excess (or bias) 64 base 16 is used by the IBM mainframe series of computers such as the IBM 370 and IBM 4341. The standard word size is 32 bits and is divided into three parts

- ------- ------------------------

0 1234567 890123456789012345678901

1 2 3

where the bits are numbered from 0 to 31. The first bit (bit 0) represents the sign of the mantissa. Bits 8 through 31 represent the digits of the mantissa. Bits 1 through 7 represent both the sign and digits of the characteristic. A value of 0; (i.e., no shifts in the characteristic) is stored as

100000002 or 2+7 = 64.

If there is a positive shift, it is added to this value and a negative shift is subtracted. In this manner the characteristic is always stored as a positive number. This can be important in designing the electronic circuitry of the arithmetic logic unit.

There is one more convention that is followed as indicated by the term "base 16". Each value in the characteristic indicates a shift of 4 positions or

2+4 = 16.

Since the shifts are always in units of four, the normalization is to place the first nonzero binary digit in one of the first four positions to the right of the binary point. An equivalent statement is to say that the first hexadecimal digit to the right of the point is nonzero.

To clarify these ideas we shall consider conversion of the two real decimal numbers

+345.75 and -0.03125

to excess 64 base 16 notation. Using the techniques discussed earlier in this chapter, in binary we have

+345.75 = +101011001.11

and

-0.03125 = -0.00001

In the first case the normalization would be

0.0001010100111

where we have shifted 12 positions to the left or 3 units of 4.

In the second case we have

0.1

where we have shifted 4 positions to the right or one unit of four. For +345.75 the characteristic becomes

1000011 = 67

indicating 3 shifts positive. For -0.03125 the characteristic becomes

0111111 = 63

indicating 1 shift negative.

We are now in a position to combine the different parts giving us

345.75 = 0100 0011 1010 1100 1110 0000 0000 0000

= 4 3 A C E 0 0 016

and

-0.03125 = 1011 1111 1000 0000 0000 0000 0000 0000

= B F 8 0 0 0 0 016

Note that for clarity we grouped the bits in units of four.

2.2.9.2 IEEE Single Precision Floating Point Standard

The IEEE adopted a standard in 1985 for representing real numbers in computer systems. They have standards for single precision, double precision, and extended precision numbers.

The Single Precision standard calls for a 32 bit (4 bytes) number. In this convention, the bits are numbered with the low order bit on the right as the 0th (zeroth bit) and the left most bit as 31st.

- ------- ------------------------

1 0987654 321098765432109876543210

3 2 1

The first bit (bit 31) represents the sign of the mantissa. Bits 30 through 23 represent the exponent or characteristic. Bits 22 through 0 are used for the mantissa or fractional part. The normalization is to have the first nonzero bit to the left of the binary point. This bit is not stored and is referred to as the hidden bit. The bias is

27-1 = 0111 1112 = 12710

Conversion Steps:

The following steps are required to convert from decimal to binary IEEE notation.

1. Convert the decimal number into positional binary notation.

2. Normalize the binary number to a leading 1 bit.

3. Compute the binary representation of the exponent using two's complement notation and add the excess bias of 127.

4. Store the fractional part (significand) in the significand field (bit location 22 to 0).

5. Store the computed exponent in the exponent field (bit locations 30 to 23).

6. Fill in the sign bit using the following convention:

1 => Negative Number
0 => Positive Number

An Example should help to clarify the discussion.

Convert the following number

-26.59375

into IEEE floating point representation.

1. Convert into positional binary notation. (Note: IGNORE SIGN)

26.5937510 => 11010.100112

2. Normalize the binary number to a leading 1 bit.

1.101010011 x 24

3. Compute the binary representation of the exponent using two's complement notation and add the excess bias of 127.

The power is positive 4 In 8 bit binary the value is

0000 0100

Excess Notation: To eliminate the problems representing negative exponents we will use excess (or bias) notation. Simply stated we will add a fixed constant (or BIAS) to the value and store that value. In this format the bias is 127 decimal or 0111 1111 binary.

Exponent Value 0000 0100
Bias Value +0111 1111

Excess Value 1000 0011

4. Store the fractional part (significand) in the significand field (bit location 22 to 0).

5. Store the computed exponent in the exponent field (bit locations 30 to 23).

6. Fill in the sign bit.

The number is positive, therefore, the sign bit is 0.

The final result is

0100 0001 1101 0100 1100 0000 0000 0000

4 1 D 4 C 0 0 016

This is the IEEE Single Precision representation of 26.59375.

What happened to the leading 1 bit? As mentioned above, in this representation scheme it is said to be HIDDEN or IMPLIED. In fact some refer to this method as the "hidden bit" method. Since we always normalize to a leading one bit, it is not necessary to store it because "everybody" knows that it is there. REMEMBER, that when you are decoding a number in this format you must restore the hidden bit to its proper place.

The tolerance or accuracy is determined by the number of bits used to represent the fractional part.

As another example we will convert a floating point number stored in the IEEE format back to decimal. If the number is stored in hex we start by first

1. Convert hex digits into binary digits.

2. Mark off the binary fields:

a. Sign (1 bit)

b. Exponent (8 bits)

c. Significand (23 bits)

3. Convert exponent field by subtracting 127.

4. Replace "hidden bit" and attach the significand.

5. Shift binary point.

6. Convert binary real to decimal real.

7. Place correct sign.

Example convert C0 CC C0 00 hex into its decimal equivalent IEEE Single precision format.

1. Convert hex digits into binary digits.

C0 CC C0 00 hex => 1100 0000 1100 1100 1100 0000 0000 0000

2. Mark off the binary fields:

1 10000001 10011001 10000000 0000000

Sign Exponent Significand

3. Convert exponent field by subtracting 127.

100000012 => 12910

You can do this in two easy ways:

a. convert to decimal and subtract 127

b. add a "negative" 127 (use two's complement)

Exponent Field 129 10000001

Bias Value -127 10000001

Power of two 2 00000010 => 2

4. Replace "hidden bit" and attach the significand.

1.10011001100000000000000

Replaced Hidden Bit representation.

5. Shift binary point.

1.10011001100000000000000 x 22

Shift binary point right two places

110.011001100000000000000

This is the binary real notation.

6. Convert binary real to decimal real.

Convert the whole part:

110 => 6

Convert the fractional part: 011001100000000000000 =>

Positional notation => 2-2 + 2-3 + 2-6 + 2-7

Decimal value => .25 + 125 + .015625 + .0078125

=> .3984375

7. Place correct sign.

Combine parts and place sign

- 6.3984375

This is the decimal value represented by 60 66 60 00 hex.

The HP 3000 series of minicomputers used a similar technique except the bias was 128 = 27 = 1000 0000.

2.2.10 Range and Accuracy

Obviously, the larger the number of bits, the more information we may store. This applies to both integer and real numbers. In this section we will summarize some of the previous statements on the accuracy and range of binary numbers. We will first consider integers and then reals.

2.2.10.1 The Range and Accuracy of Integers

Using two's compliment we can develop,the following table:

-----------------------------------------------------------------

Number Largest Largest

of bits Positive Negative

Number Number

-----------------------------------------------------------------

8 0111 1111 = 127 1000 0000 = -128

16 0111 1111 1111 1111 = 1000 0000 0000 0000 =

32,768 -32,768

32 0111 1111 1111 1111 1111 1111 1111 1111 =

2,147,483,647

1000 0000 0000 0000 0000 0000 0000 0000 =

-2,147,483,648

-----------------------------------------------------------------

Obviously, the larger the number of bits, the larger the integers that can be represented. Within the restriction of size, the integers are represented exactly in the computer.

2.2.10.2 The Range and Accuracy of Reals

When we consider real numbers, we must consider both the characteristic and mantissa. The characteristic determines the size or range of the numbers. For the characteristic for the two conventions considered in the previous sections we have

-----------------------------------------------------------------

convention largest number smallest number

-----------------------------------------------------------------

excess 64 base 16 1111111 = 16+63 0000000 = 16-65

(IBM) approx = 7.2 x 10+75 5.4 x 10-79

excess 127 base 2 1111111 = 2+63 0000000 = 2-65

(IEEE) approx = 9.2 x 10+18 2.7 x 10-20

-----------------------------------------------------------------

The range of numbers becomes more meaningful when we realize that each power of ten represents a zero to the right or left of the decimal point. This means that the largest number using the excess 64 convention is

10000000...(75 zeroes)...000

and the smallest number is

0.00000...(79 zeroes)...00001

Certainly, a very large and very small number.

The accuracy, or number of significant digits depends on the number of bits in the mantissa. Although both conventions have 24 bits in the mantissa, the excess 64 convention, because of the shifting convention, may lose three of these giving only 21 bits of accuracy. We thus have the largest number as excess 64 base 16

2+21-1 = 2,097,151

or approximately 7 decimal digits of accuracy.

For the excess 127 base 2 we have 23 bits but we also have the extra hidden bit giving us 24 bits of accuracy or

2+24-1 = 16,777,215

or approximately 8 decimal digits.

Other conventions or formats are used. Most computers also have the facility to use more bits if specified. This is usually called double precision. The designers then decide whether to increase the range (characteristic) or accuracy (mantissa) or both. On the IBM family of computers 64 bits are used with the extra 32 bits used only for the mantissa.

The speed and accuracy of the computer are closely tied to the numeric representations and built in logic circuits. The more bits used in a representation, the more logic required to perform a given computation. Most personal computers perform 16 or 32 bits at a time. Mainframes and supercomputers perform logic on 32 to over 60 bits at a time. Obviously, the more bits that are processed at one time, the faster the machine.

The differences may be much more dramatic since the smaller machines may have to retrieve many bytes from memory one at a time, thus slowing down the processing even more. There is also another important point when considering the speed and power of different processors. Most machines represent real numbers in one of the conventions given or something very similar. However, many personal computers have only integer logic circuits in the arithmetic logic unit. In order to perform standard arithmetic operations such as addition and multiplication on real numbers, they must perform relatively complex software programs.

On the other hand, larger machines, such as mainframes and supercomputers have hardware logic that works directly on the real numbers. A special enhancement for PC's is available, called a floating point processor that performs these functions. It is possible for a modern supercomputer to more than a million times faster and more powerful than a typical personal computer. Our discussion in this section is far from complete. The purpose has been to acquaint you with the variety of solutions available to store numeric data and some of the design considerations that are necessary in developing a computing system.

2.3 Words

In this chapter we have considered the representation of characters and numbers inside the computer as specific bit patterns. Also, in Chapter 1 we briefly discussed representing instructions and memory addresses as bit patterns. Thus we see that there are four different quantities that may be represented by bit patterns; characters, numbers, instructions and addresses. It is often the case that the same bit pattern may be used to represent different things. For example the pattern

0100 0001

is the letter "A" in ASCII or the integer 65 in binary. It is important to realize that a particular bit pattern only takes on meaning within the context of its usage. With 0100 0001 stored in the computer all we have is the bit pattern and without the usage context we do not know what it means.

With the four different types of information stored in the computer's memory, it is important to be able to access and manipulate an arbitrary quantity efficiently. In order to accomplish this it is necessary to use registers in the in the ALU that are of fixed size. This size should be chosen to be consistent with the number of bits in an instruction and in an address. The size of the registers also should be consistent with the representation of both alphanumeric and numeric data.

A related consideration are the number of bits that can simultaneously be transmitted between memory and the ALU. The transmission path is called the bus and the number of bits that can be transmitted at one time is called the bus width.

In most computing systems one set number of bits called a word is chosen to satisfy these constraints. Typically, a word will consist of several bytes. The bus width will allow one or an integral number of words (two, three or four) to be transmitted at one time. Instruction size will be consistent with word size. In some systems instruction size and word size will be the same. Sometimes a word will hold an integral number of instructions or an instruction will require an integral number of words. The registers in the ALU will also be consistent with the word size. Finally, the address space will also be consistent with the word size.

Word size used to be used to differentiate between different classes of computers. Today, as PC's have become more powerful, this use has become less meaningful. Typical word sizes for PC's are 16 and 32 bits. Mainframes may have 32 or more bits for a word. Supercomputers often have word sizes of 60 or more bits.

Instruction sets, address space and data formats are all related to on word size. For this reason, it is important to that the word size be chosen at the design time and cannot be changed.

2.4 Parity Checking

Many different techniques are used to insure data integrity. One method used when transmitting data from one point to another is called parity checking. A parity, or check bit, is added to each unit transmitted. This bit is chosen so that there is always either an even number of on bits (1's) transmitted (even parity) or an odd number of on bits transmitted (odd parity). After the transmission the number of on bits is checked, if the parity has changed, the system knows that at least one bit has changed and can signal an error or take corrective action. Usually a request for a retransmission is automatically requested and only after several failures is the actual error message issued.

To see how the parity bits are formed we can refer to Figure 2.11. Consider the seven bit ASCII representation of the letter C, 1000011. There are three 1's, 1000011. Odd parity requires an odd number of 1's so the parity bit is a 0 giving an eight bit representation

10000110.

On the other hand, even parity requires an even number of 1's so the parity bit is a 1 giving an eight bit representation

10000111.

Of course, this technique will not work if two bits are changed. However, suppose the probability of a bit being changed in transmission is 0.000000001, then the probability of two bits being changed is 0.000000000000000001. Very small indeed!

LIST OF FIGURES

Figure 2.1 BCD Format

Figure 2.1 BCD Code Table

Figure 2.3 EBCDIC Code Table

Figure 2.4 ASCII Code Table

Figure 2.5 Counting in Different Bases

Figure 2.6 Packed and Unpacked Decimal

Figure 2.7 Core Dump

Figure 2.8 Powers of Two

Figure 2.9 Addition Table Base 2 and 10

Figure 2.10 Truth Table for Full Binary Addition

Figure 2.11 Parity

This course material has been converted to electronic format for use at
Murray State University, and
Dept. of Computer Science and Information Systems

Use of this material for educational purposes only is governed by
the definition of Fair Use (Section 107) of the U.S. Copyright Act.
Course material is the property of  R. A. Pilgrim