
2010 R&E Computer System Education & Research

Lecture 9. MIPS Processor Design: Single-Cycle Processor Design

Prof. Taeweon Suh
Computer Science Education
Korea University

Single-Cycle MIPS Processor


Again, the microarchitecture (CPU implementation) is divided into two interacting parts:
Datapath
Control

Korea Univ

Single-Cycle Processor Design


Let's start with a memory access instruction: lw
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)

STEP 1: Instruction Fetch
[Figure: the PC register drives the address input A of the Instruction Memory, whose output RD is the fetched instruction Instr; the Register File and Data Memory are not yet connected]
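
The fetch step can be sketched in Verilog as follows. This is a minimal sketch and not the lecture's code: the module names flopr, imem, and fetch, the 64-word memory depth, and the asynchronous reset are assumptions made for illustration. The PC register drives the instruction memory address, and the memory read output is the fetched instruction.

```verilog
// Resettable register that holds the PC (hypothetical helper).
module flopr #(parameter WIDTH = 32)
              (input                  clk, reset,
               input      [WIDTH-1:0] d,
               output reg [WIDTH-1:0] q);
  always @(posedge clk or posedge reset)
    if (reset) q <= {WIDTH{1'b0}};
    else       q <= d;
endmodule

// Instruction memory with a combinational read port.
// Assumption: 64 words, indexed by the word address.
module imem(input  [5:0]  a,
            output [31:0] rd);
  reg [31:0] RAM [0:63];
  assign rd = RAM[a];
endmodule

// Fetch: PC' is latched into PC on the clock edge, and PC addresses
// the instruction memory. Instructions are word aligned, so the two
// low address bits are dropped.
module fetch(input         clk, reset,
             input  [31:0] pcnext,   // PC' in the diagram
             output [31:0] pc,
             output [31:0] instr);
  flopr #(32) pcreg(clk, reset, pcnext, pc);
  imem        im(pc[7:2], instr);
endmodule
```

With PC = 0, the memory returns word 0; for the running example that word would be the encoding of lw $2, 80($0), which is 32'h8C020050.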

Single-Cycle Processor Design


STEP 2: Decoding
Read source operands from the register file
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: Instr[25:21] (rs) drives the register file address input A1; the base register value appears on RD1]
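
A register file with two combinational read ports and one clocked write port could look like the sketch below. It is an illustrative assumption rather than the lecture's code: the port names ra1, ra2, and wa3 stand in for A1, A2, and A3 in the diagram, and reading register 0 returns zero, following the MIPS convention for $0.

```verilog
// Three-ported register file (hypothetical sketch).
// Reads are combinational; the write happens on the rising clock edge.
module regfile(input         clk,
               input         we3,            // WE3: write enable
               input  [4:0]  ra1, ra2, wa3,  // A1, A2, A3
               input  [31:0] wd3,            // WD3: write data
               output [31:0] rd1, rd2);      // RD1, RD2
  reg [31:0] rf [0:31];

  always @(posedge clk)
    if (we3) rf[wa3] <= wd3;

  // Register 0 always reads as zero.
  assign rd1 = (ra1 != 5'd0) ? rf[ra1] : 32'b0;
  assign rd2 = (ra2 != 5'd0) ? rf[ra2] : 32'b0;
endmodule
```

For lw $2, 80($0), Instr[25:21] = 0 is applied to ra1, so rd1 is 0 and the address becomes 0 + 80.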

Single-Cycle Processor Design


STEP 2: Decoding
Sign-extend the immediate
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: Instr[15:0] passes through the Sign Extend unit to produce the 32-bit SignImm]

    module signext(input  [15:0] a,
                   output [31:0] y);
      // Replicate the sign bit a[15] into the upper 16 bits
      assign y = {{16{a[15]}}, a};
    endmodule

Single-Cycle Processor Design


STEP 3: Execution
Compute the memory address
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: the ALU adds SrcA (RD1) and SrcB (SignImm) with ALUControl2:0 = 010; ALUResult is the data memory address]

Single-Cycle Processor Design


STEP 4: Execution
Read data from memory and write it back to the register file
Example: lw $2, 80($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: ALUResult drives the data memory address input A; ReadData goes to register file WD3, Instr[20:16] (rt) drives A3, and RegWrite = 1 drives WE3; ALUControl2:0 = 010]

Single-Cycle Processor Design


We are done with lw
The CPU starts fetching the next instruction from PC + 4

    module adder(input  [31:0] a, b,
                 output [31:0] y);
      assign y = a + b;
    endmodule

    // Instantiation in the datapath: pcplus4 = pc + 4
    adder pcadd1(pc, 32'b100, pcplus4);

[Figure: lw datapath with RegWrite = 1 and ALUControl2:0 = 010; an adder computes PCPlus4 from PC and feeds PC']

Single-Cycle Processor Design


Let's consider another memory access instruction: sw
The sw instruction needs to write data to the data memory
Example: sw $2, 84($0)
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: RegWrite = 0, MemWrite = 1, ALUControl2:0 = 010; Instr[20:16] (rt) drives A2, and RD2 goes to the data memory WD input as WriteData]
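
The write path that sw needs can be illustrated with a small data memory model. This is a sketch under stated assumptions, not the lecture's code: the module name dmem, the 64-word depth, and the word-aligned indexing with a[7:2] are chosen for illustration. The we input corresponds to MemWrite, a to ALUResult, and wd to WriteData.

```verilog
// Data memory with a combinational read and a clocked write
// (hypothetical sketch; 64 words, word-aligned accesses only).
module dmem(input         clk,
            input         we,      // MemWrite
            input  [31:0] a,       // byte address from ALUResult
            input  [31:0] wd,      // WriteData (register file RD2)
            output [31:0] rd);     // ReadData
  reg [31:0] RAM [0:63];

  // Drop the two low address bits to get the word index.
  assign rd = RAM[a[7:2]];

  always @(posedge clk)
    if (we) RAM[a[7:2]] <= wd;
endmodule
```

For sw $2, 84($0), the ALU produces the address 84, so the value of $2 is stored at word index 84 / 4 = 21 on the next rising clock edge.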

Single-Cycle Processor Design


Let's consider arithmetic and logical instructions: add, sub, and, or
Write ALUResult to the register file
Note that R-type instructions write to the rd field of the instruction (instead of rt)
R-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = varies, MemWrite = 0, MemtoReg = 0. Three muxes are added: RegDst selects WriteReg4:0 from Instr[20:16] (0) or Instr[15:11] (1); ALUSrc selects SrcB from WriteData (0) or SignImm (1); MemtoReg selects Result from ALUResult (0) or ReadData (1)]
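
The three multiplexers introduced for R-type instructions can be sketched as below. The module names mux2 and rtype_muxes and the signal names are assumptions for illustration; the select polarities follow the 0/1 labels on the muxes in the datapath figure.

```verilog
// Generic 2:1 multiplexer (hypothetical helper).
module mux2 #(parameter WIDTH = 8)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// The three muxes controlled by RegDst, ALUSrc, and MemtoReg.
module rtype_muxes(input  [31:0] instr,
                   input  [31:0] writedata, signimm,
                   input  [31:0] aluresult, readdata,
                   input         regdst, alusrc, memtoreg,
                   output [4:0]  writereg,
                   output [31:0] srcb, result);
  // RegDst: 0 selects rt (I-type), 1 selects rd (R-type).
  mux2 #(5)  wrmux  (instr[20:16], instr[15:11], regdst, writereg);
  // ALUSrc: 0 selects the register operand, 1 selects the immediate.
  mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
  // MemtoReg: 0 selects the ALU result, 1 selects the loaded data.
  mux2 #(32) resmux (aluresult, readdata, memtoreg, result);
endmodule
```

With RegDst = 1, ALUSrc = 0, and MemtoReg = 0, an R-type instruction writes ALUResult into the register named by Instr[15:11].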

Single-Cycle Processor Design


Let's consider a branch instruction: beq
Determine whether the register values are equal
Calculate the branch target address (BTA) from the sign-extended immediate and PC + 4
Example: beq $4, $0, around
I-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | imm (16 bits)
[Figure: RegWrite = 0, RegDst = x, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = x. PCBranch = PCPlus4 + (SignImm << 2); PCSrc, derived from Branch and the ALU Zero flag, selects PC' from PCPlus4 (0) or PCBranch (1)]
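
The branch logic described above can be sketched as follows. The module names sl2 and branchlogic are assumptions for illustration. The sketch shows the two pieces beq adds to the datapath: the branch target adder and the PCSrc signal, which is asserted only when the instruction is a branch and the ALU reports that the subtraction result is zero.

```verilog
// Shift left by 2: converts a word offset into a byte offset.
module sl2(input  [31:0] a,
           output [31:0] y);
  assign y = {a[29:0], 2'b00};
endmodule

// Branch target address and PC source select (hypothetical sketch).
module branchlogic(input  [31:0] pcplus4, signimm,
                   input         branch, zero,
                   output [31:0] pcbranch,
                   output        pcsrc);
  wire [31:0] signimmsh;

  sl2 immsh(signimm, signimmsh);

  // BTA = (PC + 4) + (SignImm << 2)
  assign pcbranch = pcplus4 + signimmsh;

  // Take the branch only for beq with equal operands.
  assign pcsrc = branch & zero;
endmodule
```

For example, with PCPlus4 = 4 and an immediate of 3, the target is 4 + 12 = 16. A negative immediate branches backward because SignImm is sign-extended before the shift.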

Single-Cycle Datapath Example


We are done with the implementation of the basic instructions
Let's see how the or instruction works in this implementation
R-Type: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
[Figure: single-cycle datapath with the control unit, which takes Op = Instr[31:26] and Funct = Instr[5:0]. Signal values for or: RegWrite = 1, RegDst = 1, ALUSrc = 0, Branch = 0, MemWrite = 0, MemtoReg = 0, ALUControl2:0 = 001, PCSrc = 0]

Single-Cycle Processor - Control


As mentioned, the CPU is designed with a datapath and control
Now, let's delve into the design of the control part
[Figure: single-cycle datapath with the control unit. Inputs: Op (Instr[31:26]) and Funct (Instr[5:0]). Outputs: MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, RegWrite]

Control Unit

Opcode and funct fields come from the fetched instruction

Main Decoder
Input: Opcode5:0
Outputs: MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0

ALU Decoder
Inputs: ALUOp1:0 (from the Main Decoder), Funct5:0
Output: ALUControl2:0

ALU Implementation and Control


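[Figure: N-bit ALU with inputs A and B, a 3-bit function select F2:0, and output Y. F2 selects B or ~B and also serves as the adder carry-in; F1:0 selects among the AND, the OR, the adder sum S, and the zero-extended sign bit S[N-1] (used for SLT)]

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT

N = 32 in a 32-bit processor

slt: set less than
Example: slt $t0, $t1, $t2   // $t0 = 1 if $t1 < $t2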

Control Unit: ALU Control


The implementation is entirely up to the hardware designers, but the designers should make sure that it is reasonable
Memory access instructions (lw, sw) need to use the ALU to calculate the memory target address (addition)
Branch instructions (beq, bne) need to use the ALU for the equality check (subtraction)

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

ALUOp1:0   Funct          ALUControl2:0
00         X              010 (Add)
X1         X              110 (Subtract)
1X         100000 (add)   010 (Add)
1X         100010 (sub)   110 (Subtract)
1X         100100 (and)   000 (And)
1X         100101 (or)    001 (Or)
1X         101010 (slt)   111 (SLT)

[Figure: control unit. The Main Decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0; the ALU Decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0]

Control Unit: Main Decoder


Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01

ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct field
11         Not Used

[Figure: control unit with Main Decoder and ALU Decoder, as on the "Control Unit" slide]

How about Other Instructions?


Hmm... Now we are done with the control part design
Let's examine whether the design is able to execute other instructions
addi
Example: addi $t0, $t1, -14
[Figure: single-cycle datapath with the control unit]
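
To see why the existing datapath can handle a negative immediate such as -14, the sketch below pushes the immediate through the sign extension unit and an adder. The demo module addi_demo and the assumed value of $t1 (20) are hypothetical; the signext module repeats the one shown earlier in the lecture so that the example is self-contained.

```verilog
// Sign extension unit, as defined earlier in the lecture.
module signext(input  [15:0] a,
               output [31:0] y);
  assign y = {{16{a[15]}}, a};
endmodule

// Hypothetical demo of the addi path: SignImm -> SrcB -> ALU add.
module addi_demo;
  reg  [15:0] imm;
  reg  [31:0] t1;
  wire [31:0] signimm;
  wire [31:0] t0;

  signext se(imm, signimm);

  // The ALU performs an add (ALUControl = 010) for addi.
  assign t0 = t1 + signimm;

  initial begin
    imm = 16'hFFF2;   // -14 in 16-bit two's complement
    t1  = 32'd20;     // assumed value of $t1
    #1;
    $display("SignImm = %h, t0 = %0d", signimm, t0);
    // prints: SignImm = fffffff2, t0 = 6
  end
endmodule
```

Because the immediate is sign-extended, adding 32'hFFFFFFF2 is the same as subtracting 14 in 32-bit two's complement arithmetic.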

Control Unit: Main Decoder


Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         10
lw           100011  1         0       1       0       0         1         00
sw           101011  0         X       1       0       1         X         00
beq          000100  0         X       0       1       0         X         01
addi         001000  1         0       1       0       0         0         00

How about Other Instructions?


OK. So far, so good
How about jump instructions?
j
J-Type: op (6 bits) | addr (26 bits)
[Figure: single-cycle datapath with the control unit, before jump support is added]

How about Other Instructions?


We need to add some hardware to support the j instruction
Logic to compute the target address
Mux and control signal (Jump)
J-Type: op (6 bits) | addr (26 bits)
[Figure: datapath extended for j. PCJump is formed from PCPlus4[31:28] and Instr[25:0] shifted left by 2 (bits 27:0); a new mux controlled by Jump selects PC' from the branch mux output (0) or PCJump (1)]
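
The added jump hardware can be sketched as below. The module name nextpc and its port names are assumptions for illustration. The jump target keeps the upper four bits of PC + 4 and appends the 26-bit address field shifted left by two; a second mux, controlled by Jump, sits after the branch mux.

```verilog
// Next-PC selection with jump support (hypothetical sketch).
module nextpc(input  [31:0] pcplus4, pcbranch, instr,
              input         pcsrc, jump,
              output [31:0] pcjump, pcnext);
  wire [31:0] pcnextbr;

  // PCJump = {PCPlus4[31:28], addr, 00}
  assign pcjump = {pcplus4[31:28], instr[25:0], 2'b00};

  // First mux: sequential PC or branch target.
  assign pcnextbr = pcsrc ? pcbranch : pcplus4;

  // Second mux: Jump overrides the branch mux output.
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```

For example, with PCPlus4 = 32'hB0000004 and an address field of 26'h0000011, PCJump is 32'hB0000044.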

Control Unit: Main Decoder


There is one more output in the main decoder to support the jump instruction: Jump

Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
addi         001000  1         0       1       0       0         0         00        0
j            000010  0         X       X       X       0         X         XX        1

Verilog Code - Main Decoder and ALU Control


[Figure: control unit with Main Decoder (Opcode5:0 in; MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, ALUOp1:0 out) and ALU Decoder (ALUOp1:0 and Funct5:0 in; ALUControl2:0 out)]

    module maindec(input  [5:0] op,
                   output       memtoreg, memwrite,
                   output       branch, alusrc,
                   output       regdst, regwrite,
                   output       jump,
                   output [1:0] aluop);

      reg [8:0] controls;

      assign {regwrite, regdst, alusrc, branch, memwrite,
              memtoreg, jump, aluop} = controls;

      always @(*)
        case(op)
          6'b000000: controls <= 9'b110000010; // R-type
          6'b100011: controls <= 9'b101001000; // lw
          6'b101011: controls <= 9'b001010000; // sw
          6'b000100: controls <= 9'b000100001; // beq
          6'b001000: controls <= 9'b101000000; // addi
          6'b000010: controls <= 9'b000000100; // j
          default:   controls <= 9'bxxxxxxxxx; // ???
        endcase
    endmodule

    module aludec(input      [5:0] funct,
                  input      [1:0] aluop,
                  output reg [2:0] alucontrol);

      always @(*)
        case(aluop)
          2'b00: alucontrol <= 3'b010;  // add
          2'b01: alucontrol <= 3'b110;  // sub
          default: case(funct)          // RTYPE
              6'b100000: alucontrol <= 3'b010; // ADD
              6'b100010: alucontrol <= 3'b110; // SUB
              6'b100100: alucontrol <= 3'b000; // AND
              6'b100101: alucontrol <= 3'b001; // OR
              6'b101010: alucontrol <= 3'b111; // SLT
              default:   alucontrol <= 3'bxxx; // ???
            endcase
        endcase
    endmodule

Verilog Code - ALU

[Figure: ALU block diagram and F2:0 function table, repeated from the "ALU Implementation and Control" slide]

    module alu(input      [31:0] a, b,
               input      [2:0]  alucont,
               output reg [31:0] result,
               output            zero);

      wire [31:0] b2, sum, slt;

      assign b2  = alucont[2] ? ~b : b;      // F2 selects B or ~B
      assign sum = a + b2 + alucont[2];      // F2 is also the carry-in
      assign slt = sum[31];                  // sign bit of A - B, zero-extended

      always @(*)
        case(alucont[1:0])
          2'b00: result <= a & b2;
          2'b01: result <= a | b2;
          2'b10: result <= sum;
          2'b11: result <= slt;
        endcase

      assign zero = (result == 32'b0);
    endmodule

Single-Cycle Processor Performance


How fast is the single-cycle processor?
Clock cycle time (frequency) is limited by the critical path
The critical path is the path that takes the longest time
What do you think the critical path is?
The path that the lw instruction goes through
[Figure: single-cycle datapath with the control signals set for lw: RegWrite = 1, RegDst = 0, ALUSrc = 1, ALUControl2:0 = 010, Branch = 0, MemWrite = 0, MemtoReg = 1]

Single-Cycle Processor Performance


Single-cycle critical path:
Tc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup

In most implementations, the limiting paths are memory (instruction and data), ALU, and register file. Thus,
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup

Element               Parameter
Register clock-to-Q   tpcq_PC
Multiplexer           tmux
ALU                   tALU
Memory read           tmem
Register file read    tRFread
Register file setup   tRFsetup

[Figure: single-cycle datapath with the lw critical path highlighted, from the PC through the instruction memory, register file, ALU, data memory, and result mux back to the register file]

Single-Cycle Processor Performance Example


Element               Parameter   Delay (ps)
Register clock-to-Q   tpcq_PC     30
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file setup   tRFsetup    20

Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps
   = 950 ps

fc = 1/Tc = 1/(950 ps) ≈ 1.05 GHz

Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?
Execution Time = (#instructions)(cycles/instruction)(seconds/cycle)
               = (100 × 10^9)(1)(950 × 10^-12 s)
               = 95 seconds