
CS2100 (AY2013/2014 Semester 2) Assignment #4 Answers

ANSWERS For tutors only! Do not reveal to students.

You are to do this assignment on your own. (Students found copying will be penalised.) Fill in your name and tutorial group in the box above, and your answers in the space indicated below. Working is not required. Submit this assignment to the IVLE workbin before 5pm on Thursday, 17 April 2014. Late submissions will not be accepted. Name your file with your matriculation number (e.g. A0071234X.doc or A0071234X.pdf).

1. [15 marks] Fill in the timing charts below with the given processor characteristics. Each instruction is worth 1 mark. No partial marking, i.e. you need to get all stages for an instruction correct. The timing charts are independent of one another. Parts (b), (c), and (d) assume full data forwarding paths.

a. All data forwarding paths are implemented for RAW data hazards.
Cycle               1    2    3    4    5
add $1, $2, $3      F    D    E    M    W
sub $2, $3, $1           F    D    E    M

Cycle               1    2    3    4    5
lw $1, 16($2)       F    D    E    M    W
sub $2, $3, $1           F    D    STL  E
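The two charts in part (a) differ only in whether the producing instruction is a load. The rule can be sketched in Python as follows, assuming the standard 5-stage pipeline (F, D, E, M, W) with full forwarding. The names `raw_stalls` and `chart_row` are hypothetical helpers for illustration and are not part of the assignment.

```python
STAGES = ["F", "D", "E", "M", "W"]

def raw_stalls(producer_op, producer_dest, consumer_srcs):
    """Stall cycles for a consumer issued right after its producer.

    Assumption: with full forwarding, an ALU result is forwarded from the
    end of E to the next instruction's E stage (no stall), while a loaded
    value is only available at the end of M, so an instruction that uses
    it immediately after a lw waits one cycle.
    """
    if producer_dest not in consumer_srcs:
        return 0                      # no RAW dependency at all
    return 1 if producer_op == "lw" else 0

def chart_row(fetch_cycle, stalls):
    """Return {cycle: stage} for one instruction; stalls are shown after D."""
    row = {fetch_cycle: "F", fetch_cycle + 1: "D"}
    cycle = fetch_cycle + 2
    for _ in range(stalls):
        row[cycle] = "STL"
        cycle += 1
    for stage in STAGES[2:]:          # E, M, W
        row[cycle] = stage
        cycle += 1
    return row
```

For the second chart, `chart_row(2, raw_stalls("lw", "$1", ["$3", "$1"]))` places F, D, STL, E, M, W in cycles 2 to 7, which agrees with the F, D, STL, E entries shown for cycles 2 to 5.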

b. Early branching is implemented. Note: BTA refers to the instruction at the branch target address.
Cycle               1    2    3    4    5
beq $1, $2, there   F    D    E    M    W
<BTA>                    STL  F    D    E

Cycle               1    2    3    4    5    6
lw $3, 20($1)       F    D    E    M    W
beq $1, $2, there        F    D    E    M    W
<BTA>                         STL  F    D    E

Cycle               1    2    3    4    5    6    7
add $1, $2, $3      F    D    E    M    W
beq $1, $2, there        F    STL  D    E    M    W
<BTA>                              STL  F    D    E

c. Branch-Not-Taken prediction is used, but there is no early branching. Suppose the branch is actually taken; show the rest of the timing chart. You will need to fill in the right instruction(s) as well for this part. Use B+1, B+2, etc. to refer to instructions after the branch in program order. Use BTA to refer to the instruction at the branch target. Use FLS to indicate pipeline flushing.
Cycle               1    2    3    4    5
beq $1, $2, there   F    D    E    M    W
B+1                      F    D    E    FLS
B+2                           F    D    FLS
B+3                                F    FLS
BTA                                     F
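The chart for part (c) follows mechanically from where the branch is resolved. Below is a minimal Python sketch, assuming the branch is fetched in cycle 1 and its outcome is known at the end of the M stage when there is no early branching. The function name `predict_not_taken_chart` and its `resolve_stage` parameter are hypothetical and not part of the assignment.

```python
STAGES = ["F", "D", "E", "M", "W"]

def predict_not_taken_chart(resolve_stage="M"):
    """Timing chart for a taken branch under branch-not-taken prediction.

    Assumption: the branch is fetched in cycle 1 and resolved at the end
    of `resolve_stage`. Every instruction fetched before that point is
    flushed in the following cycle, which is also when BTA is fetched.
    Returns a list of (label, {cycle: stage}) pairs in program order.
    """
    resolve_cycle = STAGES.index(resolve_stage) + 1   # "M" -> cycle 4
    flush_cycle = resolve_cycle + 1
    rows = [("beq", {i + 1: s for i, s in enumerate(STAGES)})]
    # B+k is fetched in cycle k+1 while the branch is still unresolved.
    for k in range(1, resolve_cycle):
        row = {}
        for i, stage in enumerate(STAGES):
            cycle = k + 1 + i
            if cycle >= flush_cycle:
                row[cycle] = "FLS"    # squashed once the branch resolves
                break
            row[cycle] = stage
        rows.append((f"B+{k}", row))
    rows.append(("BTA", {flush_cycle: "F"}))
    return rows
```

With the default `resolve_stage="M"`, three instructions (B+1 to B+3) are fetched and then flushed in cycle 5, the same cycle in which BTA is fetched.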


d. Delayed branch is used, but there is no early branching. Suppose the branch is taken; show the rest of the timing chart. As in part (c), you'll need to fill in the rest of the instructions using the same notation. You only need to show up to the BTA instruction.
Cycle               1    2    3    4    5    6    7    8
beq $1, $2, there   F    D    E    M    W
B+1                      F    D    E    M    W
B+2                           F    D    E    M    W
B+3                                F    D    E    M    W
BTA                                     F    D    E    M

2. [5 marks] (Adapted from AY11/12 Exam) In computing, colour can be represented using the CMYK model. In this model, a colour is represented by 4 values giving the saturation of the four principal colours: Cyan, Magenta, Yellow and Black. Suppose we store the CMYK values of 16 colours as separate 32-bit integers in an array of size 64. The first four array elements (A[0..3]) represent the CMYK values for the first colour, the second set of four array elements (A[4..7]) represents the second colour, etc., as illustrated below:
A[0]   Cyan value       Colour 1
A[1]   Magenta value    Colour 1
A[2]   Yellow value     Colour 1
A[3]   Black value      Colour 1
A[4]   Cyan value
A[5]   Magenta value
...

Consider the following 2 code fragments X and Y in some C-like high level programming language:

Code X:
    //Each "int" is 32-bit
    int A[64] = { ......... };

    //Cyan values, A[0], A[4], ...
    for (i = 0; i < 64; i = i + 4)
        Change A[i]

    //Magenta values, A[1], A[5], ...
    for (i = 1; i < 64; i = i + 4)
        Change A[i]

    //Yellow values, A[2], A[6], ...
    for (i = 2; i < 64; i = i + 4)
        Change A[i]

    //Black values, A[3], A[7], ...
    for (i = 3; i < 64; i = i + 4)
        Change A[i]

Code Y:
    //Each "int" is 32-bit
    int A[64] = { ......... };

    //Go through 16 colours
    for (i = 0; i < 16; i = i + 1) {
        for (j = 0; j < 4; j = j + 1) { //Go through the 4 values CMYK
            Change A[i*4+j]
        }
    }


For simplicity, the base of array A is assumed to be in memory location 0x0. You may also ignore the impact of variable i on cache access in the following questions.

a. (Code X) Given a tiny direct mapped cache with 2 blocks of 8 bytes each, give a tally of the number of cold/compulsory cache misses and the number of conflict misses.

Cold Misses = 32
Conflict Misses = 32

b. (Code Y) Given a tiny direct mapped cache with 2 blocks of 8 bytes each, give a tally of the number of cold/compulsory cache misses and the number of conflict misses.

Cold Misses = 32
Conflict Misses = 0

c. Does a 2-way set associative cache with 4 blocks of 8 bytes each improve the performance of (a) or (b)? Why?

No. An increase in associativity reduces conflict misses only if a block is reused soon enough. For code X, a block is reused only after a long cycle (e.g. block 0 is reused only after all even blocks from 2 to 30 are used). By then, block 0 would have been evicted long ago. For code Y, the blocks are not reused at all after they leave the cache. So, the additional associativity is not useful.
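The tallies in (a) to (c) can be checked with a small simulation. The sketch below assumes one memory access per "Change A[i]", 4-byte ints starting at address 0x0, and LRU replacement within a set (the question does not name a replacement policy, so LRU is an assumption). The function names are hypothetical. A miss on a block that has been loaded before is counted as a repeat miss, which is what the answers above call a conflict miss.

```python
def code_x_indices():
    """Array indices touched by Code X: four passes, each with stride 4."""
    return [i for start in range(4) for i in range(start, 64, 4)]

def code_y_indices():
    """Array indices touched by Code Y: one sequential pass over A."""
    return [i * 4 + j for i in range(16) for j in range(4)]

def tally_misses(indices, num_blocks=2, ways=1, block_bytes=8, int_bytes=4):
    """Return (cold_misses, repeat_misses) for a sequence of array indices.

    ways=1 models a direct mapped cache. LRU replacement is an assumption.
    """
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]   # each list is ordered LRU -> MRU
    seen = set()                           # memory blocks loaded at least once
    cold = repeat = 0
    for idx in indices:
        block = (idx * int_bytes) // block_bytes   # memory block number
        lines = sets[block % num_sets]             # set this block maps to
        if block in lines:
            lines.remove(block)                    # hit: refresh LRU position
        else:
            if block in seen:
                repeat += 1                        # loaded before, then evicted
            else:
                cold += 1                          # first-ever reference
                seen.add(block)
            if len(lines) == ways:
                lines.pop(0)                       # evict least recently used
        lines.append(block)
    return cold, repeat

print(tally_misses(code_x_indices()))                       # (32, 32)
print(tally_misses(code_y_indices()))                       # (32, 0)
print(tally_misses(code_x_indices(), num_blocks=4, ways=2)) # (32, 32)
print(tally_misses(code_y_indices(), num_blocks=4, ways=2)) # (32, 0)
```

Under these assumptions the 2-way configuration gives the same counts as the direct mapped one for both fragments, in line with the answer to (c).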
