Cse231 - Midterm 1 - With Model Answers

The American University in Cairo
Department of Computer Science and Engineering
EENGR352 /CSCE 231
Computer Organization and Assembly Programming
First Midterm ‐ Fall 2010

Name:_______________________________________________
ID: __________________________________

Tips/Instructions:
‐ Total exam time is 70 minutes.
‐ No external aids are allowed; these include: friends, notes, books, text messages, or the
Internet.
‐ Some problems can be solved in multiple ways. Only one is required.
‐ Unclear answers or unclear hand writing will reduce your chances.
‐ Do your best! You will be graded for your understanding and the way you think of a problem.
Best of Luck!
October 13, 2010
CSCE 231 / EENGR352 – First Midterm

Question 1 (5 Points)
Just like we defined MIPS rating, we can also define something called the MFLOPS rating which stands
for Millions of Floating Point operations per Second. If Machine A has a higher MIPS rating than that of
Machine B, then does Machine A necessarily have a higher MFLOPS rating in comparison to Machine B?
ANSWER
A higher MIPS rating for machine A compared to machine B need not imply a higher MFLOPS rating for
that machine A. One reason for this can be the following: It is possible that the floating point
instructions form a fairly low proportion of the all the instructions in a given program. So if the floating
point operations of machine B are far more efficient than the floating point operations of machine A
while the other (integer, memory etc) instructions are more efficient on B, then machine B gets a higher
MFLOPS rating than A while A has a higher MIPS rating.

Page 2 of 7

October 13, 2010

Consider two different implementations, M1 and M2, of the same instruction set. There are three
classes of instructions (A, B, and C) in the instruction set. M1 has a clock rate of 80 MHz and M2 has a
clock rate of 100 MHz. The average number of cycles for each instruction class and their frequencies (for
a typical program) are as follows:
Instruction Class Machine M1 –

Instruction Class Machine M1 – Machine M2 – Frequency
Cycles/Instruction Class Cycles/Instruction Class
A 1 2 60%
B 2 3 30%
C 4 4 10%
Machine M2 –
(a) Calculate the average CPI for each machine, M1, and M2.
ANSWER
For Machine M1: Clocks per Instruction = (60/100)* 1 + (30/100)*2 + (10/100)*4 = 1.6
For Machine M2: Clocks per Instruction = (60/100)*2 + (30/100)*3 + (10/100)*4 = 2.5

(b) Calculate the average MIPS ratings for each machine, M1 and M2.

ANSWER
For Machine M1: Average MIPS rating = Clock Rate/(CPI * 106) = (80 * 106) / (1.6*106) = 50.0
For Machine M2: Average MIPS rating = Clock Rate/(CPI * 106) = (100 * 106) / (2.5*106) = 40.0

(c) Which machine has a smaller MIPS rating ? Which individual instruction class CPI do you need to
change, and by how much, to have this machine have the same or better performance as the
machine with the higher MIPS rating (you can only change the CPI for one of the instruction classes
on the slower machine)?
ANSWER
Machine M2 has a smaller MIPS rating. If we change the CPI of instruction class A for Machine M2 to 1,
we can have a better MIPS rating than M1 as follows:
Clocks per Instruction = (60/100)*1 + (30/100)*3 + (10/100)*4 = 1.9
Average MIPS rating = Clock Rate/(CPI * 106) = (100 * 106) / (1.9*106) = 52.6
Page 3 of 7

October 13, 2010

(Amdahl’s law question) Suppose you have a machine which executes a program consisting of 50%
floating point multiply, 20% floating point divide, and the remaining 30% are from other instructions.
(a) Management wants the machine to run 4 times faster. You can make the divide run at most 3 times
faster and the multiply run at most 8 times faster. Can you meet management’s goal by making only
one improvement, and which one?

ANSWER
Amdahl’s Law states: Execution time after improvement = (Execution time affected by
improvement)/(Amount of Improvement) + Execution time unaffected
Assuming initially that the floating point multiply, floating point divide and the other instructions had
the same CPI, Execution time after Improvement with Divide = (20)/3 + (50 + 30) = 86.67
Execution time after Improvement with Multiply = (50)/8 + (20 + 30) = 66.67
The management’s goal cannot be met by making the improvement with Multiply alone.

(b) Dogbert has now taken over the company removing all the previous managers. If you make both the
multiply and divide improvements, what is the speed of the improved machine relative to the
original machine?
ANSWER
If we make both the improvements,
Execution time after Improvement = (50)/8 + (20)/3 + (30) = 53.33
The speedup relative to the original machine = (100)/(53.33) = 1.88

Page 4 of 7

October 13, 2010

In MIPS assembly, write an assembly language version of the following C code segment:
int A[100], B[100];

for (i=1; i < 100; i++) {
A[i] = A[i-1] + B[i];
}

At the beginning of this code segment, the only values in registers are the base address of arrays A and B
in registers $a0 and $a1. Avoid the use of multiplication instructions–they are unnecessary.
ANSWER
The MIPS assembly sequence is as follows:
li $t0, 1 # Starting index of i

li $t5, 100 # Loop bound
loop:
lw $t1, 0($a1) # Load A[i-1]
lw $t2, 4($a2) # Load B[i]
add $t3, $t1, $t2 # A[i-1] + B[i]
sw $t3, 4($a1) # A[i] = A[i-1] + B[i]
addi $a1, 4 # Go to i+1
addi $a2, 4 # Go to i+1
addi $t0, 1 # Increment index variable
bne $t0, $t5, loop # Compare with Loop Bound
halt:

Page 5 of 7

October 13, 2010

Suppose that a new MIPS instruction, called bcp, was designed to copy a block of words from one
address to another. Assume that this instruction requires that the starting address of the source block
be in register $t1 and that the destination address be in $t2. The instruction also requires that the
number of words to copy be in $t3 (which is > 0). Furthermore, assume that the values of these registers
as well as register $t4 can be destroyed in executing this instruction (so that the registers can be used as
temporaries to execute the instruction).
Do the following: Write the MIPS assembly code to implement a block copy without this instruction.
Write the MIPS assembly code to implement a block copy with this instruction. Estimate the total cycles
necessary for each realization to copy 100‐words on the multicycle machine (assuming that any
instruction will take only one cycle to execute).
ANSWER
The MIPS code to implement block copy without the bcp instruction is as follows:
loop:
lw $t4, 0($t1)
sw $t4, 0($t2)
addi $t1, $t1, 4
addi $t2, $t2, 4
subi $t3, $t3, 1
bne $t3, $zero, loop
To implement block copy with this instruction:
li $t1, src
li $t2, dst
li $t3, count
bcp

Assuming each instruction in the MIPS code of ‘loop’ takes 1 cycle, for doing a 100‐word copy the total
number of cycles taken is 6*100 = 600 cycles.

Page 6 of 7

October 13, 2010

Question 6 (BONUS – 7 Points)
In MIPS assembly, write an assembly language version of the following C code segment:
for (i = 0; i < 98; i ++) {

C[i] = A[i + 1] - A[i] * B[i + 2]
}

Arrays A, B and C start at memory location A000hex, B000hex and C000hex respectively. Try to reduce

the total number of instructions and the number of expensive instructions such as multiplies.
ANSWER
The MIPS assembly sequence is as follows:
li $s0, 0xA000 # Load Address of A

li $s1, 0xB000 # Load Address of B
li $s2, 0xC000 # Load Address of C
li $t0, 0 # Starting index of i
li $t5, 98 # Loop bound
loop:
lw $t1, 0($s1) # Load A[i]
lw $t2, 8($s2) # Load B[i+2]
mul $t3, $t1, $t2 # A[i] * B[i+2]
lw $t1, 4($s1) # Load A[i+1]
add $t2, $t1, $t3 # A[i+1] + A[i]*B[i+2]
sw $t2, 4($s3) # C[i] = A[i+1] + A[i]*B[i+2]
addi $s1, 4 # Go to A[i+1]
addi $s2, 4 # Go to B[i+1]
addi $s3, 4 # Go to C[i+1]
addi $t0, 1 # Increment index variable
bne $t0, $t5, loop # Compare with Loop Bound
halt:
Page 7 of 7

Cse231 - Midterm 1 - With Model Answers

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Cse231 - Midterm 1 - With Model Answers

Uploaded by

Copyright:

Available Formats

The American University in Cairo

Instruction Class Machine M1 –

int A[100], B[100];

li $t0, 1 # Starting index of i

for (i = 0; i < 98; i ++) {

Arrays A, B and C start at memory location A000hex, B000hex and C000hex respectively. Try to reduce

li $s0, 0xA000 # Load Address of A

You might also like