ECE 361: Final Project (Fall 2002) Northwestern University, Evanston, IL (9th December)
Contents
Schematic and design of the 32-bit ALU
Schematic and design of the Register File
Schematic and design of the primary and the ALU Control Unit
Final schematic and working of the CPU
Sample programs and results of simulation
Another module associated with the 32-bit ALU is the SLL unit, a logical left shifter. It takes a 32-bit operand and shifts it left by a 5-bit amount, which is the second input to the module; the output is the 32-bit shifted value. Depending on the control signal, the final output of the ALU comes either from the cascaded 32-bit basic ALU or from the SLL unit. 32-bit muxes (multiplexers) were designed and incorporated for this purpose: each takes two 32-bit values and passes one of them to the output according to its control signal. A Zero Detect unit checks whether the final output value is 0 and asserts the Zero output bit accordingly. Fig-2, Fig-3 and Fig-4 show the schematics of the final 32-bit ALU, the cascaded ALUs, and the SLL unit respectively.
The cascaded 32-bit ALU module
SLL Unit
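The output stage of the ALU described above (SLL unit, 2-to-1 mux, Zero Detect) can be sketched behaviourally in Python. This is only an illustrative model, not the schematic itself: the function names and the select polarity (1 picks the SLL unit) are assumptions.

```python
MASK32 = 0xFFFFFFFF  # keep every value within 32 bits


def sll32(operand: int, shamt: int) -> int:
    """Logical left shift of a 32-bit operand by a 5-bit amount."""
    return (operand << (shamt & 0x1F)) & MASK32


def mux32(a: int, b: int, select: int) -> int:
    """2-to-1 mux on 32-bit values. Assumed polarity: 0 passes a, 1 passes b."""
    return (b if select else a) & MASK32


def zero_detect(value: int) -> int:
    """Return 1 when all 32 bits of the value are 0, else 0."""
    return 1 if (value & MASK32) == 0 else 0


def alu_output(basic_result: int, operand: int, shamt: int, use_sll: int):
    """Choose between the basic ALU result and the SLL result, then flag zero."""
    result = mux32(basic_result, sll32(operand, shamt), use_sll)
    return result, zero_detect(result)
```

For example, `alu_output(5, 1, 4, 1)` selects the shifter and yields the value 16 with Zero deasserted, while bits shifted past bit 31 are simply dropped.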
The write port is implemented with a decoder that selects the register to be written. Note that all writes are done on the rising clock edge, so the global clock must be fed in at the final CPU design stage. Thus, when the same register is read and written in the same cycle, there is no conflict, since all writes happen after the first half of the clock cycle. Fig 6 (below) shows the schematic of the register file as implemented.
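The decoder-based write port can be pictured with a small behavioural model. This is a sketch under assumptions: the register count (32), the class and method names, and the modelling of the clock as an explicit `clock_edge` call are all hypothetical and not taken from the schematic.

```python
class RegisterFile:
    """Behavioural model: combinational reads, clocked writes."""

    def __init__(self, num_regs: int = 32):  # 32 registers is an assumption
        self.regs = [0] * num_regs

    @staticmethod
    def decode(index: int, width: int) -> list:
        """One-hot decoder: only the selected write-enable line is 1."""
        return [1 if i == index else 0 for i in range(width)]

    def read(self, index: int) -> int:
        """Reads return whatever the register currently holds."""
        return self.regs[index]

    def clock_edge(self, write_enable: int, write_index: int, write_data: int) -> None:
        """Commit a write at the clock event; only the decoded register changes."""
        if not write_enable:
            return
        enables = self.decode(write_index, len(self.regs))
        for i, enable in enumerate(enables):
            if enable:
                self.regs[i] = write_data & 0xFFFFFFFF
```

In this model a read issued before `clock_edge` in the same cycle sees the old contents, and the new value is visible only afterwards, which mirrors the no-conflict argument above.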
1 0 0 0 0 0
0 1 1 0 0 1
0 1 0 0 0 0
1 1 0 0 0 1
0 1 0 0 0 0
0 0 1 0 0 0
0 0 0 1 1 0
0 0 0 0 1 0
1 0 0 0 0 0
0 0 0 1 1 0
Fig 7
(B) ALU CONTROL UNIT DESIGN
The central control unit sends a 2-bit input to the ALU controller, which in turn sends the appropriate control signals to the ALU. Besides the 2-bit input from the central controller, the ALU control unit also needs the 6 bits of the funct field, which must be decoded to obtain the right ALU control signal for Rtype instructions. Given these inputs, the control unit generates the ALU signals listed in the table below. ALU_OP[1:0] is the input from the central control unit, and ALU[2:0] is the output control signal for the ALU. The schematic of the implementation is shown in Fig 8.
Instruction       Funct[5:0]  ALU_OP[1:0]  Desired ALU Operation  ALU[2:0]
Rtype (add)       100000      10           ADD                    010
Rtype (sub)       100010      10           SUB                    011
Rtype (and)       100100      10           AND                    000
Rtype (or)        100101      10           OR                     100
Rtype (sll)       000000      10           SLL                    110
Rtype (slt)       101010      10           SLT                    111
lw / sw / addi    X           00           ADD                    010
beq / bne         X           01           SUB                    011
Fig 8
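The ALU control table can be expressed as a small lookup. The sketch below follows the encodings listed in the table; the function and dictionary names are hypothetical, and raising an error for unlisted inputs is an assumption, since the report does not say how the hardware treats them.

```python
# Funct[5:0] -> ALU[2:0] for Rtype instructions (ALU_OP = 10), per the table
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add -> ADD
    0b100010: 0b011,  # sub -> SUB
    0b100100: 0b000,  # and -> AND
    0b100101: 0b100,  # or  -> OR
    0b000000: 0b110,  # sll -> SLL
    0b101010: 0b111,  # slt -> SLT
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return ALU[2:0] for a given ALU_OP[1:0] and Funct[5:0]."""
    if alu_op == 0b00:  # lw / sw / addi: funct is a don't-care, always ADD
        return 0b010
    if alu_op == 0b01:  # beq / bne: funct is a don't-care, always SUB
        return 0b011
    if alu_op == 0b10:  # Rtype: decode the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALU_OP value not listed in the table")
```

For instance, `alu_control(0b10, 0b101010)` gives `0b111` (SLT), and `alu_control(0b01, 0b100000)` gives `0b011` (SUB) regardless of the funct bits.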
Summary: The CPU was designed and tested on different instructions and on 3 programs, namely:
Sort Program: sort_corrected_branch.dat
Summation Program: sum_branch.dat
Simple Transaction Simulator: bills_branch.dat
The program executions were checked for correctness, and the CPU was found to work without errors in all test cases run so far. Figures 10, 11, and 12 show simulation traces of the above programs. The cycle time for the tests was set to 100 ns (clock rate = 10 MHz). The simulations were found to work correctly for cycle times as low as 5 ns (clock rate = 200 MHz), but at still lower cycle times the outputs were garbled.
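The clock rates quoted above follow directly from the cycle times, since the rate is the reciprocal of the period. A minimal check, with a hypothetical helper name:

```python
def clock_rate_mhz(cycle_time_ns: float) -> float:
    """Clock rate in MHz for a cycle time given in ns (f = 1 / T)."""
    return 1000.0 / cycle_time_ns


print(clock_rate_mhz(100))  # cycle time used for the tests
print(clock_rate_mhz(5))    # shortest cycle time that still simulated correctly
```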
[Schematic block labels: Data Memory, 32-bit Adder (two instances), 32-bit ALU, Register File]
The program sums the data in data memory from address 0x10000000 to address 0x10000024 and places the result in memory location 0x10000028. A look at the code shows that the program requires 46 cycles in total. With a clock period of 100 ns, the program is indeed seen to terminate at t = 4600 ns. The last instruction stores the sum (= 37) at address 0x10000028, as can be seen in the traces. It can also be observed that the PC is automatically loaded with 0x400020 when Reset = 1.
The program compares the data in data memory from address 0x10000000 to address 0x10000024 with $6. For each value that is less than $6, $6 is decremented by that value and the memory location is overwritten with 0. The final value of $6 is placed in memory location 0x10000028. A look at the code shows that the program requires 72 cycles in total for the given data. With a clock period of 100 ns, the program is indeed seen to terminate at t = 7200 ns. The last instruction stores $6 (= 0x38, which is the correct value) at address 0x10000028, as can be seen in the traces. It can also be observed that the PC is automatically loaded with 0x400020 when Reset = 1. (Arrows show where 0x0 is written to memory locations.)
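The behaviour of the transaction program described above can be mirrored in a few lines of Python. This is a sketch of the described logic only, not a translation of bills_branch.dat: the function name is hypothetical, the comparison is assumed to be a plain integer less-than, and the sample values in the usage note are made up, not the data used in the simulation.

```python
def run_bills(memory: list, balance: int):
    """Walk the data words; pay each one that is smaller than the balance.

    `balance` plays the role of $6. A paid entry is subtracted from the
    balance and its memory word is overwritten with 0.
    """
    memory = list(memory)  # work on a copy, leave the caller's data intact
    for i, value in enumerate(memory):
        if value < balance:
            balance -= value
            memory[i] = 0
    return memory, balance
```

With the made-up inputs `run_bills([5, 50, 10], 20)`, the first and third entries are paid and zeroed, the second is skipped because it exceeds the remaining balance, and the final balance is 5.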
The program sorts the data elements in data memory in the address range 0x10000000 to 0x10000024. Simulation shows that the program requires 379 cycles in total for the given data. With a clock period of 100 ns, the program can be seen to terminate at t = 37900 ns. The trace below shows the start of the outer and inner loops (thick and dotted arrows), and the points where the two largest numbers (0x9 and 0xa) are stored in memory locations 0x10000020 and 0x10000024 respectively (shown by the symbol).
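The termination times reported for the three programs are simply cycle count times clock period. The sketch below recomputes them and also counts the data words in the address range, assuming 4-byte words; the helper names are hypothetical.

```python
def end_time_ns(cycles: int, period_ns: int = 100) -> int:
    """Time at which a program finishes, given its cycle count and clock period."""
    return cycles * period_ns


def word_count(first_addr: int, last_addr: int, word_bytes: int = 4) -> int:
    """Number of words in an inclusive address range (4-byte words assumed)."""
    return (last_addr - first_addr) // word_bytes + 1


for name, cycles in [("summation", 46), ("transactions", 72), ("sort", 379)]:
    print(name, end_time_ns(cycles), "ns")
print(word_count(0x10000000, 0x10000024), "data words")
```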