Every push pop takes clock cycles read the opcode one byte

3. Instruction set design

Clearly the design of a new machine is not a smooth process; the designer of the architecture must be aware of the possible hardware limitations when setting up the instruction set, while the hardware designers must be aware of the consequences their decisions have for the software.

For many years the memory in a computer was very expensive; it was only

Virgil Bistriceanu	29	Illinois Institute of Technology

.................

DEC r1 # decrement r1 (r1 holds the loop counter)

The landmark for CISC architectures (Complex Instruction Set Computers) is the VAX family; introduced in 1977, the VAX architecture has more than 200 instructions, some 200 addressing modes and instructions with up to six operands. The instruction set is so powerful that a C program compiles to almost the same number of assembly language instructions as there are statements in the C source.

The main problem with CISC is that, because they differ in complexity, instructions require very different numbers of clock cycles to complete, which makes efficient implementations (like pipelining) very difficult; long-running instructions make interrupt handling difficult, while uneven instruction sizes make instruction decoding inefficient.

As compiler technology developed, people realized how difficult it is to figure out the best instruction (or sequence of instructions) for the compiler to generate: simply put, with a complex instruction set there are too many combinations of instructions that do the same job.

To summarize, remember that the CPU performance is given by:

$$CPU_{time} = IC \times CPI \times T_{ck}$$

However, the major point is that simple instructions allow pipelined implementations (all instructions execute in the same number of clock cycles), which dramatically decrease the CPI (the ideal CPI in a pipelined implementation is 1); moreover, pipelining also permits higher clock rates.

The disadvantage is an increase in the number of instructions (IC); the same program may have as many as twice the number of assembly language instructions as the version written for a CISC machine.


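The same program is compiled for a CISC machine (like a VAX) and for a pipelined RISC (like a SPARC based computer). The following data is available:

$$IC_{CISC} = 500000 \qquad IC_{RISC} = 1100000$$

$$CPI_{CISC} = 6.8 \qquad CPI_{RISC} = 1.4$$

$$T_{ck,CISC} = 25\ \text{ns}\ (25 \times 10^{-9}\ \text{s}) \qquad T_{ck,RISC} = 30\ \text{ns}$$

What is the relative performance of the two machines?

Answer:

$$\frac{CPU_{time,CISC}}{CPU_{time,RISC}} = \frac{IC_{CISC} \times CPI_{CISC} \times T_{ck,CISC}}{IC_{RISC} \times CPI_{RISC} \times T_{ck,RISC}} = \frac{500000 \times 6.8 \times 25}{1100000 \times 1.4 \times 30} = \frac{850}{462} \approx 1.84$$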
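The arithmetic of this example can be checked with a short script. This is only a minimal sketch: the function name cpu_time and the variable names are chosen here for illustration, and only the numbers come from the example.

```python
def cpu_time(ic, cpi, tck_ns):
    """Return the CPU time in milliseconds, computed as IC * CPI * Tck."""
    return ic * cpi * tck_ns * 1e-6  # nanoseconds -> milliseconds


# Data from the example: instruction count, CPI and clock cycle time.
cisc_ms = cpu_time(ic=500_000, cpi=6.8, tck_ns=25)
risc_ms = cpu_time(ic=1_100_000, cpi=1.4, tck_ns=30)
ratio = cisc_ms / risc_ms

print(f"CISC: {cisc_ms:.1f} ms, RISC: {risc_ms:.1f} ms")
print(f"CPU time ratio (CISC / RISC): {ratio:.2f}")
```

Although the RISC version executes more than twice as many instructions and has a longer clock cycle in this example, its much lower CPI still makes it the faster machine.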

Consider the following statement in a high level language (C for instance): a = a + b + a * c;

For anyone familiar with a programming language the meaning of this statement is clear: take the value of a and multiply it by c, then add a and b to that product; the result is assigned to the variable a.


For the sake of simplicity we require operations to be carried out in small steps; the result will be obtained after a sequence of simple steps. Incidentally, this eliminates the need for the machine to know about the grouping rules.

• how sequencing works: execute the first instruction, then the second and so on;

• what every operation does.


Throughout this lecture we shall use the convention that the destination is the first operand in the instruction. This is a commonly used convention, though not a universally accepted one. It is consistent with the assignment statements in high level languages. The other convention in use, listing the destination after the source operands, is consistent with our verbal description of operations.

3.2.1 Three-address machines

where:

ADD r2, r1, r0

is to add the value stored in register r1, with the value stored in register r0, and put the result in the register r2.

An instruction must also have a field to specify the operation to be performed, the opcode, probably a few bits wide depending on the number of instructions you want to have in the instruction set. We end up with an instruction that is 20 to 22 bits wide just to specify the operands and the opcode. As we shall see, there are also other things to be included in an instruction, like an offset, a displacement, or an immediate value, which may lead to an even larger instruction size.
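The width estimate can be sketched as a small calculation. The register count (32) and the opcode counts (32 to 128) used below are assumptions chosen because they reproduce the 20 to 22 bit range; the helper names are hypothetical.

```python
def field_bits(n_items):
    """Bits needed to name one of n_items distinct items."""
    return (n_items - 1).bit_length()


def instruction_bits(n_registers, n_opcodes, n_operands=3):
    """Width of an instruction that holds an opcode and register operands only."""
    return field_bits(n_opcodes) + n_operands * field_bits(n_registers)


# Assumed values: 32 registers (5 bits each) and 32, 64 or 128 opcodes.
for n_opcodes in (32, 64, 128):
    print(n_opcodes, "opcodes ->", instruction_bits(32, n_opcodes), "bits")
```

Any offset, displacement or immediate field would add to these figures.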

Should the machine then be a 24 bit machine (24 is the first multiple of eight after 20-22)? The answer is: not necessarily:




ADD

Example 3.3 shows another way to process operands, when they all reside in the memory. If all instructions specify only memory operands then we say we have a memory-memory machine. We will next explore the implications for hardware of a memory-memory architecture.

In the case of a memory-memory machine the CPU knows it has to get two operands from memory before executing the operation, and has to store the result in memory. There are several ways to specify the address of an operand: this is the topic of addressing modes; for the following example we will assume a very simple way to specify the address of an operand: the absolute address of every operand is given in the instruction (this addressing mode is called absolute).
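A memory-memory ADD with absolute addressing can be sketched as follows. The Memory class, the function add_mem, and the addresses and values are all hypothetical; the sketch counts only the accesses made for operands and result, not those needed to fetch the instruction itself.

```python
class Memory:
    """Main memory model that counts every read and write."""

    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def read(self, address):
        self.accesses += 1
        return self.cells.get(address, 0)

    def write(self, address, value):
        self.accesses += 1
        self.cells[address] = value


def add_mem(memory, dst, src1, src2):
    """ADD dst, src1, src2 where every operand is an absolute memory address."""
    result = memory.read(src1) + memory.read(src2)
    memory.write(dst, result)


mem = Memory()
mem.cells.update({100: 7, 104: 5})  # hypothetical addresses and values
add_mem(mem, dst=108, src1=100, src2=104)
print(mem.cells[108], mem.accesses)
```

Two reads for the source operands and one write for the result give three data accesses per instruction.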

Example 3.4

only to get the whole instruction. The source operands have to be read from memory and the result has to be stored into memory: this means 3 memory accesses. The instruction takes:

and there are fewer accesses to the memory. The picture, however, is not complete: to work within registers, operands must be brought in from memory and results written back to memory, which means extra instructions. On the other hand, intermediate results can be kept in registers, thus sparing memory accesses. We'll resume this discussion after introducing the addressing modes.

3.2.2 Two-address machines

Example 3.5

In this example two out of three instructions have only two distinct addresses; one of the operands is both a source operand and the destination. This situation occurs frequently enough that it may be worth defining a 2-address machine, as one in which instructions have only two addresses.

The general format of instructions is:


Example 3.6

a = a + b + a * c


MUL r4, r1, r3
ADD r1, r1, r2
ADD r1, r1, r4

The above example has introduced a new two-operand instruction:

In this section we discussed a register-register, 2-address machine. Nothing stops us from thinking about a 2-address, memory-memory machine, or even about a register-memory one.
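The MUL/ADD/ADD sequence listed under Example 3.6 can be traced with a tiny register-machine simulator. This is a sketch under assumptions: the register allocation (r1 = a, r2 = b, r3 = c, r4 a temporary) is inferred from the code rather than stated, and the function run and the sample values are made up for illustration.

```python
def run(program, regs):
    """Execute instructions of the form (op, dst, src1, src2) in order."""
    ops = {"ADD": lambda x, y: x + y, "MUL": lambda x, y: x * y}
    for op, dst, src1, src2 in program:
        regs[dst] = ops[op](regs[src1], regs[src2])
    return regs


# Assumed register allocation: r1 = a, r2 = b, r3 = c, r4 = temporary.
a, b, c = 2, 3, 4
regs = {"r1": a, "r2": b, "r3": c, "r4": 0}
program = [
    ("MUL", "r4", "r1", "r3"),  # r4 = a * c
    ("ADD", "r1", "r1", "r2"),  # r1 = a + b
    ("ADD", "r1", "r1", "r4"),  # r1 = a + b + a * c
]
run(program, regs)
print(regs["r1"] == a + b + a * c)
```

In both ADD instructions the destination coincides with the first source, which is exactly the pattern a 2-address format exploits.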

3.2.3 One-address machines (Accumulator machines)

• op is a source or a destination operand (in the case of a store, op denotes the destination); the other operand is implicitly the accumulator.

Thus the meaning of:


Show how to implement the statement

a = a + b + a * c

using an accumulator machine.


LOAD a	# a
MUL c	# a * c
ADD b	# a * c + b
ADD a	# a * c + b + a
STO a	# store the result in a
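The listing above can be traced with a minimal accumulator-machine simulator. The function run_accumulator and the sample values are hypothetical; the five instructions are the ones in the listing.

```python
def run_accumulator(program, memory):
    """Execute one-address instructions; the accumulator is the implicit operand."""
    acc = 0
    for op, name in program:
        if op == "LOAD":
            acc = memory[name]
        elif op == "ADD":
            acc += memory[name]
        elif op == "MUL":
            acc *= memory[name]
        elif op == "STO":
            memory[name] = acc
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {"a": 2, "b": 3, "c": 4}  # sample values
program = [
    ("LOAD", "a"),  # acc = a
    ("MUL", "c"),   # acc = a * c
    ("ADD", "b"),   # acc = a * c + b
    ("ADD", "a"),   # acc = a * c + b + a
    ("STO", "a"),   # a = acc
]
run_accumulator(program, memory)
print(memory["a"])
```

Every instruction names a single memory operand, so each one costs a memory access in addition to the instruction fetch.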

A stack is a memory (sometimes called LIFO, for Last In First Out) defined by two operations, PUSH and POP: PUSH moves a new item from the memory into the stack (you don't have to care where), while POP gets the last item that was pushed into the stack. The formats of operations on a stack machine are:

operation

• op is the address in the main memory where the value to be pushed/popped is located.

A stack machine has two memories: an unstructured one, we call it the main memory, where instructions and data are stored, and a structured one,

Example 3.8

a = a + b + a * c

Whenever an operation is performed, the source operands are popped from the stack, the operation is performed, and the result is pushed into the stack.
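The pop, operate, push behaviour can be sketched with a small simulator. The instruction sequence below is one possible ordering for a = a + b + a * c, chosen for illustration and not necessarily the listing of Example 3.8; the function run_stack and the sample values are hypothetical.

```python
def run_stack(program, memory):
    """Execute zero-address arithmetic with PUSH/POP transfers to main memory."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])  # main memory -> stack
        elif op == "POP":
            memory[instr[1]] = stack.pop()  # stack -> main memory
        elif op in ("ADD", "MUL"):
            y = stack.pop()                 # source operands are popped,
            x = stack.pop()
            stack.append(x + y if op == "ADD" else x * y)  # result is pushed
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {"a": 2, "b": 3, "c": 4}  # sample values
program = [
    ("PUSH", "a"), ("PUSH", "c"), ("MUL",),  # a * c
    ("PUSH", "b"), ("ADD",),                 # a * c + b
    ("PUSH", "a"), ("ADD",),                 # a * c + b + a
    ("POP", "a"),                            # a = result
]
run_stack(program, memory)
print(memory["a"])
```

The arithmetic instructions carry no operand at all, which is why they can be encoded so compactly.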

Example 3.9


The stack machine has the most compactly encoded instructions possible.

3.3 Register or memory?

In classifying architectures we used the number of operands the most common arithmetic instructions specify. As we saw the 3-address and 2-address machines may have operands in registers, memory or both. A


We have two 32 bit machines, a register-register one and a memory-memory one. Addresses are 32 bits wide and the register-register machine has 32 general purpose registers. A memory access takes two clock cycles and an arithmetic operation executes in two clock cycles (the execution phase). Addresses of operands are absolute. Show how to implement the statement:

a = b * c + a * d;

and compare the number of clock cycles and the memory traffic for the two machines. Variables a, b, c, d reside in memory and none of them may be destroyed except a. Assume also that instructions are 32 bits wide or a multiple of 32 bits.

The most valuable advantage of registers is their use in computing expression values and in storing variables. When variables are allocated to registers, the memory traffic is lower, the speed is higher (because registers are faster than memory) and the code length decreases (since a register can be named with fewer bits than a memory location). If the number of registers is sufficient, then local variables will be loaded into registers when the program enters a new scope; the remaining registers will be used as temporaries, i.e. they will be used for expression evaluation.

How many registers should a machine have? If their number is too small, then the compiler will reserve them all for expression evaluation and variables will be kept in memory, thus decreasing the effectiveness of a register-register machine. Too large a number of registers, on the other hand, may mean wasted resources that could be used for other purposes.

• accumulator architectures: one source operand and the destination are implicitly the accumulator; the second source operand has to be explicitly named

• general purpose register (GPR) architectures have only


• number of operands that may be memory addresses in an ALU operation. This number may vary from none to three.

Using these two parameters to differentiate GPR architectures, there are seven possible combinations, listed in the table below:

• register-register machines, also called load-store machines. They are defined by 2-0 or 3-0 (operands-number of memory addresses); in other words, all operands are located in registers;

• register-memory machines: defined by 2-1 (operands-number of memory addresses); one of the operands may be a memory address;
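The seven combinations can be enumerated mechanically. The sketch below assumes that ALU instructions name either two or three operands, and it labels only the classes defined in the bullets above; the function classify is hypothetical.

```python
def classify(n_operands, n_memory):
    """Name the class of a GPR architecture given as operands-memory addresses."""
    if n_memory == 0:
        return "register-register (load-store)"
    if (n_operands, n_memory) == (2, 1):
        return "register-memory"
    return "not classified in this sketch"


# Assumption: ALU instructions have 2 or 3 operands, of which 0..n may be
# memory addresses.
combinations = [(n, m) for n in (2, 3) for m in range(n + 1)]
for n, m in combinations:
    print(f"{n}-{m}: {classify(n, m)}")
print(len(combinations), "combinations")
```

Three combinations come from the two-operand format and four from the three-operand format, which accounts for the seven.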


register-register machines

Advantages:
• simple instruction encoding (fixed length)
• simple compiler code generation
• instructions may execute in the same number of clock cycles (for a carefully designed instruction set)
• lower memory traffic as compared with other architectures.

memory-memory machines

Advantages:
• compact code
• easy code generation

The instruction set is the collection of operations that define how data is transformed and moved in the machine. An architecture has a unique set of operations and addressing modes that form the instruction set.

The main problems the designer must solve in setting up an instruction set for an architecture are:

We obviously need arithmetic and logic operations as the purpose of most applications is to perform mathematical computation. We also need data transfer operations as data has to be moved from memory to CPU, from CPU to memory, between registers or between memory locations.

The instruction set must provide some instructions for the control flow of programs; in other words, we need instructions that allow us to construct loops, to skip sections of code if some condition holds, to call subroutines and to return from them.



While the IEEE 754 standard is currently adopted by all new designs, there are other formats still in use, like those of the IBM 360/370 or of DEC's machines.

• add, subtract, multiply, divide: these are sometimes provided in an additional instruction set; most new architectures have special registers to hold floating point numbers.

Decimal operations

Decimal add, subtract, multiply and divide are useful for machines running business applications (usually written in COBOL); they are sometimes primitives, as in the IBM-360 or the VAX, but in most cases they are simulated