MIPS assembly code assignment

Q1 (3 points) : For this problem, we will use the following loop:

for (k = n; k >= 0; k--) x[k] = y[k+1] - 2.0 * y[k];
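As a cross-check on what the loop computes, here is a minimal reference sketch in Python. The function name `compute_x` and the list-based representation are assumptions made for illustration; they are not part of the assignment. Note that the loop reads y[n+1], so y needs at least n + 2 elements.

```python
def compute_x(y, n):
    """Reference semantics for the loop: x[k] = y[k+1] - 2.0 * y[k], for k = n down to 0."""
    if len(y) < n + 2:
        # The first iteration (k = n) reads y[n+1].
        raise ValueError("y needs at least n + 2 elements")
    x = [0.0] * (n + 1)
    for k in range(n, -1, -1):  # k = n, n-1, ..., 0
        x[k] = y[k + 1] - 2.0 * y[k]
    return x


result = compute_x([1.0, 3.0, 4.0, 10.0], 2)
```

Each iteration is independent of the others, which is what makes unrolling and rescheduling legal for this loop.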

If we assume:

  • R1 contains the address of the nth element of y
  • R2 contains the address of the nth element of x
  • F0 contains 2.0

The above code could be written as the following MIPS assembly code:

Loop: LD     F2, 8(R1)
      LD     F4, 0(R1)
      MULTD  F6, F4, F0
      SUBD   F8, F2, F6
      SD     F8, 0(R2)
      SUBI   R1, R1, #8     #same as DADDUI R1, R1, #-8
      SUBI   R2, R2, #8
      BNEZ   R2, Loop

Use the following table of instruction latencies for the parts below:

Instruction/Operation Type    Latency in Clock Cycles
Double Load                   1
Double Store                  0
FP Multiply                   5
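One way to read a latency table like this is as the number of stall cycles needed between a producing instruction and a dependent consumer issued right after it. The sketch below builds a single-issue, in-order schedule under that reading. It makes several assumptions that the assignment does not state: latency depends only on the producer's type, operation types missing from the table (for example FP subtract and integer operations) default to 0, and the names `schedule`, `LATENCY`, and `demo` are hypothetical. The demo is a two-instruction fragment, not the full loop.

```python
def schedule(instrs, latency):
    """Single-issue, in-order schedule with stalls.

    instrs  : list of (text, op_type, dest, sources) tuples
    latency : op_type -> stall cycles between a producer and a dependent
              consumer (ASSUMPTION: depends only on the producer's type;
              unlisted types default to 0)
    Returns a list of (cycle, text) rows, with "stall" rows where needed.
    """
    ready = {}  # register -> earliest cycle at which a consumer may issue
    cycle = 0
    rows = []
    for text, op_type, dest, sources in instrs:
        earliest = cycle + 1
        for reg in sources:
            earliest = max(earliest, ready.get(reg, 0))
        # Fill the gap before this instruction with stall cycles.
        for c in range(cycle + 1, earliest):
            rows.append((c, "stall"))
        cycle = earliest
        rows.append((cycle, text))
        if dest is not None:
            ready[dest] = cycle + latency.get(op_type, 0) + 1
    return rows


# Values taken from the table above; the keys are hypothetical labels.
LATENCY = {"load": 1, "store": 0, "fp_mul": 5}

# Hypothetical fragment: a load followed immediately by a dependent multiply.
demo = [
    ("LD    F4, 0(R1)", "load", "F4", ["R1"]),
    ("MULTD F6, F4, F0", "fp_mul", "F6", ["F4", "F0"]),
]
rows = schedule(demo, LATENCY)
for cycle, text in rows:
    print(cycle, text)
```

Under these assumptions the fragment shows one stall between the load and the multiply that consumes its result. A complete schedule for the loop would also need latencies for the operation types the table leaves out.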

  1. Show the cycle-by-cycle schedule, including stalls, of the unmodified loop on a fully pipelined machine.

Cycle    Instruction/stall

  2. Unroll the loop 3 times but do not reschedule the instructions. Ignore the delay slot. Do not delete any instructions other than loop overhead instructions.

Cycle    Instruction/stall

  3. Unroll the loop 3 times and reschedule the instructions to reduce the number of stalls. Ignore the delay slot. Do not delete any instructions other than loop overhead instructions.

Cycle    Instruction/stall

  4. What is the speedup of the unrolled loop in (2) over the unmodified case in (1)? What is the speedup of the unrolled and scheduled loop in (3) over the unmodified case in (1)? Please show your calculation.
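Because one pass through an unrolled loop body does the work of several original iterations, the speedup comparison is usually made per original iteration. A minimal sketch of that arithmetic follows. The function name `speedup` and the cycle counts in the example are placeholders, not answers to the assignment.

```python
def speedup(cycles_original, cycles_unrolled, unroll_factor):
    """Speedup per original iteration.

    cycles_original : cycles for one iteration of the unmodified loop
    cycles_unrolled : cycles for one pass through the unrolled loop body
    unroll_factor   : number of original iterations covered by one unrolled pass
    """
    if min(cycles_original, cycles_unrolled, unroll_factor) <= 0:
        raise ValueError("all arguments must be positive")
    cycles_per_iteration = cycles_unrolled / unroll_factor
    return cycles_original / cycles_per_iteration


# Placeholder numbers for illustration only: 10 cycles per original iteration,
# 24 cycles for an unrolled body that covers 3 iterations.
example = speedup(10, 24, 3)
```

With these placeholder numbers the unrolled loop spends 8 cycles per original iteration, so the ratio is 10 / 8.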

Q2 (7 points) : Show the scheduling of the following code:

      L.D    F2, 0(R2)
      L.D    F4, 100(R3)
      ADD.D  F8, F2, F2
      MUL.D  F6, F4, F8
      SUB.D  F6, F2, F4

  1. (3 points) Using a scoreboard. Assume one integer ALU, two FP multipliers, one FP adder and one FP divider. The integer ALU takes one execution cycle, FP multipliers take 7 cycles, the FP adder takes 4 cycles and the FP divider takes 25 cycles.

  2. (3 points) Using Tomasulo’s algorithm. Assume two LOAD units, two FP multipliers and three FP adders. A load unit takes one execution cycle for address calculation and a second one for memory access, FP multipliers take 7 cycles and FP adders take 4 cycles.

  3. (1 point) Comment on which structure (scoreboard or Tomasulo’s) provides the shorter execution time and why (how are the sources of slowdown in one structure avoided by the “better” structure?).
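Before filling in either table, it can help to list the register dependences in the Q2 sequence, since they determine where each scheme has to wait. The sketch below compares every pair of instructions in program order and reports RAW, WAR and WAW dependences. The tuple encoding and the names `CODE` and `find_hazards` are assumptions made for illustration. The pairwise check ignores intervening redefinitions, which is adequate for this short sequence.

```python
from itertools import combinations

# (text, destination register, source registers) for the Q2 sequence
CODE = [
    ("L.D   F2, 0(R2)",   "F2", {"R2"}),
    ("L.D   F4, 100(R3)", "F4", {"R3"}),
    ("ADD.D F8, F2, F2",  "F8", {"F2"}),
    ("MUL.D F6, F4, F8",  "F6", {"F4", "F8"}),
    ("SUB.D F6, F2, F4",  "F6", {"F2", "F4"}),
]


def find_hazards(code):
    """Return (kind, earlier_index, later_index, register) for each dependent pair."""
    hazards = []
    for (i, (_, d1, s1)), (j, (_, d2, s2)) in combinations(enumerate(code), 2):
        if d1 in s2:   # later instruction reads what the earlier one writes
            hazards.append(("RAW", i, j, d1))
        if d2 in s1:   # later instruction writes what the earlier one reads
            hazards.append(("WAR", i, j, d2))
        if d1 == d2:   # both instructions write the same register
            hazards.append(("WAW", i, j, d1))
    return hazards


hazards = find_hazards(CODE)
for kind, i, j, reg in hazards:
    print(f"{kind} on {reg}: '{CODE[i][0]}' -> '{CODE[j][0]}'")
```

For this sequence the check reports RAW dependences on F2, F4 and F8, plus one WAW dependence on F6 between MUL.D and SUB.D. A classic scoreboard stalls issue on a WAW hazard, whereas Tomasulo's algorithm removes WAW and WAR hazards through register renaming in the reservation stations, which is relevant to item 3.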
