Final Project

DEE 1053 Computer Organization 2005 Fall Final Project
Processor Design (Part II)

Due Date: 2006.01.25

I. Introduction
In the midterm project, you learned how to design a single-cycle processor with your own instruction formats, and the main architecture, with its datapath and control signals, is now organized according to your ideas. In the final project, you will continue to improve your design by trading off performance against area, and realize it with behavior models written in a hardware description language. Project Part II is organized in the following steps. First, you are required to pipeline your processor in order to reduce the cycle period and increase throughput. Then, extra hazard detection circuits and modifications to the assembly code are necessary to solve the data dependency problems caused by pipelining. Next, you need to implement the processor design in Verilog behavior models and verify it with a testbench. Finally, detailed timing and area cost should be reported to determine your performance.
II. Architecture Optimization

Table 5 lists the detailed timing delay and area cost of each component listed in Table 4 of PART I. This information can help you estimate the total area and critical timing of your design and optimize it. The delay and area of most components depend on their bit-width N. For example, a 24-bit ripple adder has a delay of 7.2 ns and an area cost of 1200 units. Standard-cell information is listed in Table 6; these cells are used for the combinational logic in your layout and have at most four inputs.

Component  Description                              Timing Cost (ns)        Area Cost
RF16       24-bit dual read port 16-register RF     4.1                     51000
RF8        24-bit single read port 8-register RF    3.6                     24500
R_EN       N-bit register with write enable         0.1                     62*N
R          N-bit register without write enable      0.1                     60*N
ADD_R      N-bit ripple adder                       0.3*N                   50*N
ADD_C      N-bit carry look-ahead adder             0.2+0.6*floor(N/4)      50*N+80*floor(N/4)
MP0        N-bit in 2N-bit out multiplier           0.9*N                   60*N^2
MP1        N-bit in 2N-bit out 1 stage multiplier   0.45*N (1 cycle)        61*N^2+125*N
MP2        N-bit in 2N-bit out 2 stage multiplier   0.25*N (2 cycle)        62*N^2+280*N
ALU4       N-bit function ALU (and/or/xor/not)      0.8+0.05*N              120*N
ALU2       N-bit function ALU (and/or)              0.4+0.05*N              55*N
INV        N-bit inverse gate                       0.3                     2*N
CMP        N-bit comparison and equivalence         0.25*N                  45*N
EQV        N-bit equivalence                        0.4                     3*N
DEC2       2-bit input decoder                      1.0                     60
DEC3       3-bit input decoder                      1.8                     140
MUX2       N-bit 2 to 1 mux                         0.2+0.05*N              30*N
MUX4       N-bit 4 to 1 mux                         0.7+0.05*N              80*N
MUX8       N-bit 8 to 1 mux                         1.2+0.05*N              170*N
MUX16      N-bit 16 to 1 mux                        1.8+0.05*N              360*N
Table 5. Detailed timing delay and area cost of components
Cell      not       and       or        nand      nor       xor       xnor
1-input   0.1 / 2   -         -         -         -         -         -
2-input   -         0.2 / 2   0.2 / 2   0.1 / 2   0.1 / 2   0.3 / 5   0.3 / 5
3-input   -         0.2 / 3   0.3 / 4   0.1 / 3   0.2 / 4   0.5 / 7   0.5 / 7
4-input   -         0.3 / 4   0.4 / 5   0.2 / 4   0.3 / 5   0.7 / 10  0.7 / 10
Table 6. Detailed information for standard cells (each entry: timing delay / area cost)
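As a rough illustration of how the Table 5 formulas can be evaluated, the following Python sketch compares the two adders at a given bit-width. The function names add_r and add_c are hypothetical helpers introduced here, not part of the course material; only the formulas themselves come from Table 5.

```python
from math import floor

# Timing (ns) and area formulas copied from Table 5 for the two adders.
# The function names are illustrative assumptions, not course-supplied code.

def add_r(n):
    """N-bit ripple adder: returns (delay_ns, area)."""
    return 0.3 * n, 50 * n


def add_c(n):
    """N-bit carry look-ahead adder: returns (delay_ns, area)."""
    groups = floor(n / 4)
    return 0.2 + 0.6 * groups, 50 * n + 80 * groups


if __name__ == "__main__":
    for name, fn in (("ADD_R", add_r), ("ADD_C", add_c)):
        delay, area = fn(24)
        print(f"{name}: delay = {delay:.1f} ns, area = {area}")
```

For N = 24 the ripple adder reproduces the 7.2 ns / 1200 units quoted in the text, while the carry look-ahead adder is faster but larger, which is the kind of trade-off this section asks you to weigh.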
III. Pipelining Design

The single-cycle processor architecture is inefficient in modern design because it must perform many operations within a single cycle period. Pipelining increases data throughput by separating the original design into multiple stages. The period set by the critical path can thus be reduced dramatically, and the operating frequency increased. In this section, try to take advantage of the pipelining technique on your previous design. Note the following rules:
1. Only the four kinds of sequential components listed in Table 5 can be used for implementation. Use the two delay elements shown in Table 7 to complete your design. Do not use register file components as pipeline delay elements. Note the waveform diagram of the register file shown in Figure 4.
2. Pipelining may introduce hazards, which can be resolved either by software detection or by extra hardware circuits. If you want to solve them in hardware, add forwarding paths or hazard detection combinational logic.
3. The depth of the pipeline is your choice, but must not exceed six stages. Fewer stages may lead to a lower frequency, but more stages will increase the hardware area.
4. Three kinds of multipliers are available in PART I, since a multiplication unit always has a much larger area and a longer critical path than any other combinational component. Choosing one of them is a trade-off between operating frequency and area cost.

Table 7. Basic sequential circuit elements (N-bit register with write enable; N-bit register without write enable)
Figure 4. Waveform diagram of register file
Exercise: Draw a layout of your processor architecture with pipelining. Indicate each stage by labeling it at the top of the layout. Mark the bit-width of each delay element, as shown in Figure 5. Signals in different pipeline stages must not share the same name. The symbols of functional units should be the same as those in Table 4 of PART I, with comments added. Try to optimize your design according to Table 5 for better performance, minimizing the detailed area cost and critical timing of your design.
Figure 5. Partial example of pipelining design
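To compare the three multiplier options when optimizing against Table 5, a short script such as the following may help. It simply evaluates the Table 5 formulas; the input width N = 12 is an assumed value for illustration, and the "(1 cycle)" and "(2 cycle)" annotations on MP1 and MP2 are not modeled.

```python
# Timing (ns) and area formulas for the three multipliers, copied from Table 5.
# The dictionary layout and function name are illustrative assumptions.
MULTIPLIERS = {
    "MP0": (lambda n: 0.9 * n, lambda n: 60 * n**2),
    "MP1": (lambda n: 0.45 * n, lambda n: 61 * n**2 + 125 * n),
    "MP2": (lambda n: 0.25 * n, lambda n: 62 * n**2 + 280 * n),
}


def multiplier_costs(n):
    """Return {name: (timing_ns, area)} for an n-bit-input multiplier."""
    return {name: (t(n), a(n)) for name, (t, a) in MULTIPLIERS.items()}


if __name__ == "__main__":
    N = 12  # assumed input width, for illustration only
    for name, (timing, area) in multiplier_costs(N).items():
        print(f"{name}: timing {timing:.2f} ns, area {area}")
```

The staged multipliers shorten the timing entry at the price of a larger area, so the better choice depends on whether the multiplier actually lies on your critical path.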
IV. Behavior Modeling

In this project, you will implement your design with behavior models written in Verilog HDL. The modules are supplied by the T.A. All you have to do is connect these modules and realize the combinational logic with the standard cells available in Verilog HDL. Implement your design and preserve the module hierarchy of your architecture layout. Note that the I/O interface signals of the entire architecture must be the same as those given by the T.A., as shown in Table 8. For more information, please check the example files.

Signal    Bits     Description
CLK       1 bit    Clock signal
RST       1 bit    Reset signal
I_ADDR    10 bits  Instruction memory address given by program counter
I_INST    24 bits  Executable instructions from instruction memory
D_ADDR    8 bits   Data memory address decoded by processor
D_WEN     1 bit    0 for reading data from specified address; 1 for writing data into specified address
D_WDATA   24 bits  Data from processor to memory
D_RDATA   24 bits  Data from memory to processor
Table 8. I/O interface signals
Exercise: Implement your processor according to the layout. The main module must be named PROCESSOR and contained in the file PROCESSOR.v, which should include the necessary behavior model library file MODEL.v and any other necessary module files. Take advantage of module hierarchy to complete your design and save time. All combinational logic should be realized with standard cells, such as and, or, etc., and written in independent modules. List the exact area cost of each component used in the layout, as in Table 9, and calculate the total area cost of the entire processor.

Module Name  Bit-width  Amount  Area Cost
ADD_R        24-bit     1       1200
R            24-bit     2       2880
MUX2         16-bit     3       1920
MUX2         1-bit      2       80
Table 9. Example of area list
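The total area asked for in this exercise is a simple sum over the area list. As a minimal sketch, the rows of Table 9 can be tallied as follows; the tuple layout and the name total_area are assumptions made here. Judging from the R row (2 registers of 60*24 units each give 2880), the listed area cost appears to cover all instances in a row, so the amounts are not multiplied in again.

```python
# Rows copied from Table 9: (module, bit_width, amount, listed_area_cost).
# The listed area cost is treated as the total for all instances in that row.
AREA_LIST = [
    ("ADD_R", 24, 1, 1200),
    ("R", 24, 2, 2880),
    ("MUX2", 16, 3, 1920),
    ("MUX2", 1, 2, 80),
]


def total_area(rows):
    """Sum the listed area cost over all rows of the area list."""
    return sum(cost for _, _, _, cost in rows)


print(total_area(AREA_LIST))
```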
V. Design Verification
After completing your design with Verilog behavior models, you are required to verify it with a testbench. Figure 6 illustrates the verification flow of the testbench. The testbench reads the instructions from a machine code file, sends data and instructions to the processor, receives the calculated results in data memory, and verifies the answers after all operations. If there are bugs in the processor, the results will differ and errors will be detected. To run the simulation, use Icarus Verilog, available on the textbook CD-ROM or the course website. However, if you want to observe detailed internal signals in your design, a simulation tool such as Nanosim is necessary.
Figure 6. Verification flow of testbench
Exercise: Verify your behavior module design with the testbench given by the T.A. Your design must pass simulation with two machine code files. One is translated from the C program in PART I; the other is translated from assembly code that is provided in MIPS assembly language and must be modified according to your own instruction set to fit your assembler. Check the output file to see whether the answers are correct. Adjust the clock period in the testbench to find the maximum speed at which your processor can run, and compare the minimum clock period with your estimated critical timing. You can use both Icarus Verilog and Nanosim to verify your design and observe the signals. Try modifying the example module testbench to check smaller modules, which can save verification time. Report the minimum clock period and total execution cycles determined in the testbench.
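Once the testbench has given you a minimum clock period and a cycle count, the derived figures follow from simple arithmetic. The sketch below assumes the period is measured in nanoseconds; the function names and the sample values (8.0 ns, 1500 cycles) are hypothetical and do not come from any real simulation.

```python
def max_frequency_mhz(period_ns):
    """Maximum operating frequency in MHz for a clock period in ns."""
    return 1000.0 / period_ns


def execution_time_ns(period_ns, cycles):
    """Total execution time in ns for a given cycle count."""
    return period_ns * cycles


# Hypothetical measurements, for illustration only.
period_ns, cycles = 8.0, 1500
print(max_frequency_mhz(period_ns), "MHz")
print(execution_time_ns(period_ns, cycles), "ns")
```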
VI. Summary of Part II

Congratulations! You have completed the project of this course. In this project, you have learned how to pipeline your design and avoid data dependency problems. You have also weighed the trade-off between area and speed to improve design performance. Finally, your design has been realized in a behavior model and verified.

What should I hand in?
1. A layout of your processor with pipeline architecture on A4 pages. If you cannot fit your design on one A4 page, draw a rough block diagram on the first page and the detailed design on the following pages. For better performance, you are allowed to modify your design due to area, timing, and hazard problems. Hierarchical layout is welcome.
2. A list of the area cost of each module used in your design, like Table 9. Components with different bit-widths are viewed as different modules. Calculate the total area cost of your processor.
3. A report of one A4 page. The report records the minimum clock period, maximum operating frequency, total execution cycles for the two machine codes, and the value of (period x area). You also have to list the special features used in your design, such as data hazard detection logic, forwarding circuits, and techniques to reduce area and timing delay.
4. Upload the following files to the T.A.'s FTP server: the Verilog files, which contain all behavior models of the processor and should be correctly verified and bug-free.

What is the grading policy?
1. Complete functionality and a well-explained pipelining architecture layout are the key to a better grade. Comments, colored lines, hierarchical blocks, and detailed descriptions can help you get higher scores.
2. A bug-free and correctly verified design gets the highest scores. If you cannot pass the testbench verification, try to explain the errors. Even a non-working design ("dead body") still earns some points.
3. A smaller value of (period x area) leads to higher scores. As a result, reducing the critical timing and minimizing the total area are important.
4. Special features in your design are counted as a bonus on the final score. Try to list as many features as possible.
5. Hierarchical design and comments in your Verilog files are also helpful.
6. Most important of all, hand in your project on time.
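Because the (period x area) product is part of the grading, it can be useful to compare candidate designs on that metric before committing to one. The following sketch uses two made-up design points; the names, periods, and areas are purely hypothetical.

```python
def period_area_product(period_ns, total_area):
    """Figure of merit named in the grading policy: smaller is better."""
    return period_ns * total_area


# Hypothetical design points: name -> (minimum period in ns, total area).
designs = {
    "A": (10.0, 150000),
    "B": (8.0, 200000),
}

best = min(designs, key=lambda name: period_area_product(*designs[name]))
for name, (period, area) in designs.items():
    print(name, period_area_product(period, area))
print("best:", best)
```

In this made-up comparison the slower design A wins, since its area saving outweighs its longer period; a faster clock alone does not guarantee a better product.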
