
Designing a chip

Challenges, Trends, and Latin America Opportunity
Victor Grimblatt
R&D Group Director

Synopsys 2012

SASE 2012

Agenda
Introduction
The Evolution of Synthesis
SoC
IC Design Methodology
New Techniques and Challenges
IP Market, an opportunity for Latin America

Introduction


Interesting Facts from Cisco


Last year's mobile data traffic was eight times the size of the entire global Internet in 2000
Global mobile data traffic grew 2.3-fold in 2011, more than doubling for the 4th year in a row
Mobile video traffic exceeded 50% for the first time in 2011
Average smartphone usage nearly tripled in 2011
In 2011, a 4th generation (4G) connection generated 28x more traffic on average than a non-4G connection

Source: Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011-2016, Feb 14, 2012


Drives Exploding Need for Bandwidth and Storage

[Chart: Bandwidth Increase. A Decade of Digital Universe Growth: 130 exabytes in 2005, 1.2 zettabytes in 2010, 7.910 zettabytes in 2015; vertical axis 0 to 8000]

One zettabyte = stacks of books from Earth to Pluto 20 times (72 billion miles)
If an 11 oz. cup of coffee equals 1 gigabyte, then 1 zettabyte would have the same volume as the Great Wall of China

Source: IBS and Cisco
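
The growth figures on this slide can be turned into an annual rate. The following is a minimal sketch that assumes decimal prefixes (1 zettabyte = 1000 exabytes) and reads the three data points as 2005, 2010, and 2015; the function name `cagr` is an illustrative choice, not something from the slides.

```python
def cagr(start, end, years):
    """Compound annual growth rate, returned as a fraction (0.5 means 50%)."""
    return (end / start) ** (1.0 / years) - 1.0


EXABYTES_PER_ZETTABYTE = 1000  # assumption: decimal SI prefixes

size_2005_eb = 130                               # 130 exabytes
size_2010_eb = 1.2 * EXABYTES_PER_ZETTABYTE      # 1.2 zettabytes
size_2015_eb = 7.910 * EXABYTES_PER_ZETTABYTE    # 7.910 zettabytes (forecast)

growth_2005_2010 = cagr(size_2005_eb, size_2010_eb, 5)
growth_2010_2015 = cagr(size_2010_eb, size_2015_eb, 5)

print(f"2005-2010: {growth_2005_2010:.1%} per year")
print(f"2010-2015: {growth_2010_2015:.1%} per year")
```

Under these assumptions both periods come out at roughly half again as much data every year, which is the point the chart makes visually.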


Tomorrow's World
Reality → Augmented Reality → Blended Reality
Search → Agents → Info That Finds You (and networks that know you)
2D → 3D → Immersive Video → Holographics
Medical → Mobile Medical → Personal Medical
Person to Person → Machine to Machine → Human Machines

What the Future Has in Store


How Does This Affect Design?


Megatrends Change Design Requirements

Used to Be          Today It's
Computing           Connectivity
Creating Info       Consuming Info
Compute Power       Battery Power
Business            Consumer
At your desk        Anywhere, anytime
Work                Entertainment

Trends Drive Process Migration

[Chart: share of designs by process node (250nm, 180nm, 130nm, 90nm, 65/55nm, 45/40nm, 32/28nm, 22/20nm, <20nm) for the last, current, and next design; vertical axis 0% to 35%]

Synopsys Global User Survey, Feb 2012
N = 1290

and Increasing Gate Count

[Chart: share of designs by gate count, 2010 vs. 2011, in bands 2-5M, 5-10M, 10-20M, 20-50M, 50-100M, and >100M; vertical axis 0% to 50%]

Synopsys Global User Survey, Feb 2012

and Faster Designs

[Chart: distribution of design clock frequencies by year, 2004 to 2011, in bands from 50MHz up to >2GHz; vertical axis 0% to 100%; one band is labeled 42%]

Synopsys Global User Survey, Feb 2012
N = 962

while requiring aggressive Power Management

[Chart: power management techniques in use, 2010 vs. 2011, stacked on a vertical axis from 0% to 400%]
Techniques shown: Clock gating; Multi-Vt leakage optimization; Multi-voltage domains; Multi-Corner, Multi-Mode (MCMM) optimization; Dynamic Voltage/Frequency Scaling (DVFS); Lower Vdd operation; MTCMOS/Power gating; State retention; Low Vdd Standby; Library Variables (e.g., multi-channel length libraries); Back-biasing/Well-biasing; Other

Synopsys Global User Survey, Feb 2012
N = 282

Design Challenges are Multiplying
Example of 28-nm challenges

Unidirectional Poly (and other RDRs)
Requires separate layouts, verification & test effort. GF and TSMC have different preferred orientations (N/S v. E/W)
No poly for local routing

Device segmentation
Limited device sizes, large analog devices broken up into smaller pieces; increases analog area

Complexity
Approximately 1700 design rule checks at 28nm vs. 700 at 65nm
8x the # of corners at 65 v. 28nm
Lower Vddmin resulting in less design headroom
Metal resistance doubles from 40 nm to 28 nm
Global versus local Vth variations due to random doping effects

Device Aging
Must take into account device degradation over time due to threshold voltage instability (NBTI/PBTI) and mobility degradation (HCI)

Callouts: 28 nm is 2X harder than 40 nm; 28 nm IP area increases without circuit innovation

[Figure: 40 nm layout vs. 28 nm analog layout; the 28 nm layout is 9% larger than 40 nm due to limitations on poly area]

System on a chip
SoC = Software

[Chart: HW & SW Development Costs ($M, axis $0 to $2.50) over months 1 to 27. Series: Spec Development, IP Qualification, RTL Development, RTL Verification, Physical Design, Masks, Post-silicon Validation, Design Management, OS Support, Low-Level SW, App-Specific SW]
Source: IBS, Synopsys

Software is Half the Time to Market For a Typical SoC!

And Half the Cost

[Chart: hardware and software cost ($M, axis $0 to $175) by feature dimension (transistor count): 90nm (60M), 65nm (90M), 45/40nm (130M), 32/28nm (180M), 22/20nm (240M)]
Source: IBS and Synopsys, 2011

Unlike Moore, Software Guys are Pessimists

Page's Law (2009): Software gets twice as slow every 18 months.
Wirth's Law (1995): Software is getting slower more rapidly than hardware becomes faster.

What Can We Do About It?


The Evolution of Synthesis


Placement & Routing


Ronald L. Rivest, Charles M. Fiduccia, Robert M. Mattheyses,
GE & MIT, 1982

Source: GE, 1986



Logic Synthesis
David Gregory, Karen Bartlett, Aart J. de Geus, Gary D.
Hachtel, GE & University of Colorado at Boulder, 1986


Until Late 80s
The Implementation Flow Was Quite Straightforward
There Was Already a Wall

Front-End: Schematic Capture, Timing Simulation
Back-End: Place & Route, DRC/LVS

Early 90s
The Relationship Needs Improvements Badly:
Walls Now Lead to Iterations, Often Out of Control

Front-End: RTL Simulation, Logic Synthesis
Back-End: Place & Route, DRC/LVS
Sign-Off: Delay Calculation, Timing Simulation

Early 00s, 130nm, 7+ Metals
PC and Astro+Blast+SilEnsemble: The Relationship Matures
Still, Too Many Walls, and # of Iterations Too High

Front-End: RTL Simulation; Logic, Power & Test Synthesis; Floorplan; Physical Synthesis
Back-End: Floorplan, P&R
Sign-Off: Extraction & STA, DRC/LVS

The Evolution Of The Relationship
Convergence!

2003: 90 Nanometers, Interoperability
2005: 65 Nanometers, Correlation
2007: 45/40 Nanometers, Look Ahead
2009: 32/28 Nanometers, In-Design

The Evolution Of The Relationship
Quick Summary

Late 80s to Early 90s. Attempt #1: Predict the future based on the past
Wire load models, broken by nanometer wires

Mid 90s. Attempt #2: Predict the future based on the present
Front-end floorplanning, broken by Frankenstein flows

Late 90s to Today. Attempt #3: Partner to create the future, rather than attempt to predict it
Convergence of synthesis and place & route
But underlying mathematics is different

Logic Synthesis And Place & Route
A Revolutionary Evolution: Convergence!

Logic Compiler, ca. 1986
Design Compiler, 2010.03

From Equations to Gates, to Placed and Routable Gates

SoC


What is High-Level Synthesis?

Designer Intent
User inputs:
High-level algorithm
Constraints

c a * b c;

Automation using High-Level Synthesis

HLS Results
HLS outputs:
Synthesizable RTL
C-model
RTL testbench
Scripts for synthesis, verification and downstream tools

Design technology and methodology

Develop and verify hardware at a higher level of abstraction
Much smaller code with fewer bugs introduced
Rapid architecture exploration

Automate implementation and verification
Automatic optimizations that equal hand-coded QoR
Eliminate manual RTL coding & verification

Example benefits
2-5X productivity for initial designs
5-10X productivity for design re-use
Increased exploration leading to better results
Multi-million gate designs in weeks vs. months

High-Level Synthesis Advantage

Traditional Block Design
Algorithm Design → Architecture Exploration (spreadsheets) → RTL Coding → RTL Verification → Implementation
Cycle by cycle functional debug
For single architecture only

HLS-based Block Design
Algorithm Design → High-Level Design → RTL Verification → Implementation
RTL automatically generated
Faster design at higher abstraction
Quickly evaluate multiple architectures
Faster, more automatic model-to-RTL validation, reduced RTL-level debug

Better Designs, Faster

Changing FPGA Design Methodology

Classic FPGA Methodology: Top Down Implementation
Best Quality of Results
May not be suitable for largest FPGA designs (long runtimes and large memory requirements)

Divide and Conquer: Top Down Incremental Implementation
Reduced Quality of Results
Shorter runtime: preserve unchanged parts
Design Preservation, block based flows, and Incremental P&R with SmartGuide

Emerging Mix and Match: Bottom Up and Top Down Flow
Distributed development
Better design preservation and isolation
Design style adjustments needed to achieve optimal timing Quality of Results (e.g. registering module boundaries)

Unified RTL Flow for FPGA and SOC

[Figure: Your IP plus DesignWare IP and DesignWare Building Blocks feed two paths. FPGA Synthesis: Synplify Premier/Certify with DW Implementation. ASIC Implementation: Galaxy with DW Implementation]

Common RTL from prototype to production: a combination of IP and tools
All DW Building Blocks, minPower and Macrocell Blocks are supported in Synplify Premier and Certify for FPGA-based prototyping

Today's SOC Designs

Designs are getting larger and larger.
Schedule stays the same or shorter despite the increases in design complexity.
Engineering resources are not increasing to handle this complexity.

How can EDA help manage this complexity?

Many Methods of Designing an SOC
Similar Approach But End Results Vary
Final Product Varies

Building Blocks
Instructions
1. Preheat the oven to 450.
2. Melt butter and chocolate together in the top of a double boiler or in the microwave. Add sea salt.
3. Meanwhile, beat together the egg, egg yolks, and sugar with a whisk or an electric beater until light and slightly foamy.
4. Add the egg mixture to the warm chocolate; whisk quickly to combine. Add flour and stir just to combine. The batter will be quite thick.
5. Butter small ramekins, or use Reynolds foil cupcake liners.
6. Divide the batter evenly among the ramekins. (You can make the cakes in advance to this point and chill them until you're ready to bake. Be sure to bring the batter back to room temperature before baking.)
7. Baking time will depend on your oven; start with 7 minutes for a thin outer shell with a completely molten interior.
8. Melt a little more chocolate to drizzle on top. Sprinkle a little more salt, and serve with berries or ice cream.

Ever Increasing Chip Size
Leads to Hierarchical Design

[Chart: Flat versus Hierarchical implementation as a function of instance count, with a typical threshold marked; axis marks at 3M, 5M, 15M, and 100M+ instances]

Ten Best Practices for Hierarchical Design
Understanding These Practices Can Help

#1 Floorplan: Affects design closure
#2 Top-Level Style: Requires different discipline
#3 Block Size: Tradeoff size versus TAT
#4 Modeling: Modeling for top-level closure
#5 Top-Level Closure: Meeting timing on the inter-block signals
#6 Block-Level I/O Paths: Affects block design closure
#7 Block-Level Drivers/Loads: Affects block boundary closure
#8 Inter-Block Critical Paths: Absence helps chip closure
#9 Constraints Management: Affects design closure & TAT
#10 Signoff STA: Correlates to close timing

#1 Floorplan
Affects Design Closure

Partitioning Guidelines
Logical connectivity
Clock
Voltage areas
Physical size
Multiple Instantiated Modules (MIM)

Macro Placement
Power Planning
IO Planning

[Figures: Example 1 vs. Example 2; Challenge vs. Better Approach]

#2 Top-Level Style
Requires Different Design Discipline

[Figure: Channel, Narrow Channel, and Abutted top-level styles, arranged by implementation complexity, with clock and data routing shown]

#3 Block Size
Tradeoff Size versus TAT (turnaround time)

[Figure: one partitioning into many blocks of 1.5M to 2M instances, another into a few blocks of 3M to 5M instances]

Smaller blocks: faster TAT per block, but more blocks to integrate
Larger blocks: longer TAT per block, but fewer blocks to integrate

What Is a Reasonable Size? It Depends a Lot on Design Team Preference
Note: Block size in instances

#4 Modeling
ETM vs. Abstract Model

Extracted Timing Model (ETM)
Blocks modeled by timing arcs only
Used for customized IP

Abstract Model
Interface cells of each block retained
Recommended for P&R blocks

#5 Top-Level Closure
Meeting Timing on Inter-Block Signals

Closing top-level inter-block signals can be challenging
Can be minimized with:
Proper estimation of interface constraints
Proper floorplanning for signal connectivity between blocks

Simultaneous optimization of top-level and inter-block signals needed

#6 Block Level I/O Paths
I/O Paths Are Typically Not Finalized Early

[Figure: Typical Hierarchical Structure. The Block Under Design sits between two Adjacent Blocks; logic lies between the registers and the block boundaries on both sides]

I/O paths are not finalized during early stage block design
Overconstraining these paths directs the tool to focus on I/O paths instead of the intra-block paths
Accuracy of proportional time budgets is affected if interfaces are still changing

#6 Block Level I/O Paths
Registering Block Outputs Makes Budgeting Easier

[Figure: A Better Approach. The Block Under Design and both Adjacent Blocks have registers at their outputs]

Registering block outputs makes budgeting easier and less dependent on completeness of the netlist
Re-partitioning logic hierarchy helps manage constraints complexity
Partitioning according to power domains / logic hierarchy makes flow easier

#7: Block Level Drivers and Loads
Modeling I/O with Realistic Values Drives Convergence

Block interface timing is one of the toughest issues in hierarchical flow
Realistic model of your input and output ports helps design convergence

[Figure: Block A drives Block B through output port A]

When designing Block A, need to consider load at output port A: set_load
When designing Block B, need to consider driving cell at input port B: set_driving_cell

#7: Block Level Drivers and Loads
Inter-block Paths Are One Of The Toughest SOC Challenges

If no load is specified, the cell cannot be sized correctly

Without good estimation of loads and driving cell
Integrating these blocks forces unnecessary iterations to meet timing

Budgeting can automatically generate driver and load information
Generate a quick netlist to run through budgeting for more accurate results

#8: Inter-Block Critical Paths
Absence Helps Chip Closure

[Figure: a block-to-block path crossing Top, and a top-to-block path]

Avoid critical paths crossing multiple blocks
Makes timing closure difficult

Contain them within the same block or, if you must cross multiple blocks, minimize the number of blocks
Budgeting, sizing, and load estimations are needed to solve inter-block critical path violations
If the tool cannot see the complete path, it may be a challenge to stitch them at top-level

#8: Inter-Block Critical Paths
Shielding Helps Chip Closure

[Figure: Without Shielding vs. With Shielding]

Use shielding to reduce crosstalk effects between the block level and top level to significantly improve timing closure in inter-block critical paths
Use new Transparent Interface Optimization (TIO) in IC Compiler

#9: Constraints Management
Pay Attention to Constraints

Infeasible paths are paths that are impossible to meet timing
Missing false path/multi-cycle path constraints
Unreasonable input/output delay constraints

[Figures: e.g., infeasible path, insufficient for 1 clock cycle; e.g., infeasible path, input delay too large]

Other things to watch out for
size_only attributes
dont_touch attributes
Multi-cycle paths
False paths
Etc.

#10 Signoff Correlation
Tighter Correlation Helps Close Timing

Use IC Compiler signoff correlation checker system
Performs both consistency and correlation check with user controllable accuracy level
Supports both pre-route and post-route checks

#10 Signoff Correlation Flows
Flows for Pre-route and Post-route Correlation Checks

Pre-Route Flow
Focus on environment and library setup for pre-route correlation
Certain variables for correlation may have runtime and/or QoR impact on optimization
Correlation setup may change and re-check may be needed for post-route

Today's Designs Are Big & Hierarchical
Timing Signoff Challenges

More effects, more variation
Impacts accuracy vs. runtime

Hierarchical P&R vs. flat signoff
Large machines and runtime
Interactions between top & block

30-40% blocks are tough to close
10 to 20 ECO iterations

Lots of scenarios to analyze
More machines, more reports

Source: L. Besson, STMicroelectronics

The Nanometer Challenges


Top Issues to Look at

(1) SION Dielectric/Polysilicon Gate; (2) High-k Dielectric/Metal Gate

Source: ITRS 2009; C.A. Malachowsky, NVIDIA, EDPS 2009; P. Saxena, Intel, ISPD 2003

IC Design Methodology


But, Synthesis has Evolved

Synthesis has evolved beyond logic mapping
It's now predicting and resolving congestion for physical design
Evolution of synthesis prediction of physical effects is key to progress

And, Physical Design Under Heavy Load

Increasingly, Physical Design is the driver for implementation schedule
It's where the rubber meets the road: speed, die size, power, yield, ...
P&R evolution is key to progress

What's on Designers' Minds?
Design & Project Management!

How close are we to our design goals?
What's the status of the blocks right now?
How can I use the experience from this project to plan the next one better?
Is everyone using the same tool version and the standard scripts?
How much compute and license resources are we using?
What's taking up the most time? Which step? Which block?

Many Flavors Of Methodology


Imagination Is the Only Limit

Source: www.bk.com 2010



Past Guidance Doesn't Always Apply to the Present

create_clock -period [0.7 * target] for high performance
set_max_area to 0 for small area
Use small blocks for fast turnaround time
Things have changed, but users are still using the above techniques!

[Figure: evolution of the implementation flow (Synthesis, Place & Route, DRC/LVS, Signoff, later adding Design Planning and Synthesis Exploration) across four eras: 2000-2005 Correlation, 2005-2008 Look-ahead, 2009-2010 In-Design, 2011 Exploration]

The Past vs. The Present

Wireload Model (WLM) results in higher frequency during Synthesis than using Design Compiler Topographical (DCT) technology

[Figures: Figure 1 and Figure 2]

With WLM, these two circuits have the same delay
With DCT, the delay is a reflection of the x-y location of the cells

Which is more realistic?
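
The difference between a fanout-based wire-load model and a placement-aware estimate can be sketched in a few lines. This is a toy illustration, not how Design Compiler computes delay: the wire-load table, the capacitance per micron, the driver resistance, the pin coordinates, and all function names are assumptions chosen for the example, and the placement-aware estimate uses half-perimeter wirelength as a stand-in for a real route.

```python
# All numbers below are illustrative assumptions, not library data.
WIRE_LOAD_TABLE_PF = {1: 0.002, 2: 0.004, 3: 0.006}  # fanout -> wire cap (pF)
CAP_PER_MICRON_PF = 0.0002                            # wire cap per micron
DRIVER_RESISTANCE_KOHM = 2.0                          # lumped driver resistance


def wlm_wire_cap(fanout):
    """Wire-load model: capacitance depends only on fanout."""
    return WIRE_LOAD_TABLE_PF[fanout]


def placed_wire_cap(pin_locations):
    """Placement-aware estimate: half-perimeter wirelength times cap per micron."""
    xs = [x for x, _ in pin_locations]
    ys = [y for _, y in pin_locations]
    hpwl = (max(xs) - min(xs)) + (max(ys) - min(ys))
    return hpwl * CAP_PER_MICRON_PF


def wire_delay_ns(wire_cap_pf):
    """Lumped RC delay: kOhm * pF = ns."""
    return DRIVER_RESISTANCE_KOHM * wire_cap_pf


# Two nets with the same fanout (1) but different cell locations, in microns.
net_close = [(0, 0), (10, 5)]
net_far = [(0, 0), (400, 300)]

for name, pins in (("close", net_close), ("far", net_far)):
    fanout = len(pins) - 1
    print(name,
          "WLM:", wire_delay_ns(wlm_wire_cap(fanout)),
          "placed:", wire_delay_ns(placed_wire_cap(pins)))
```

In this sketch the fanout-based estimate is identical for both nets, while the location-based estimate separates them, which is the contrast the two figures draw.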



Ten Best Practices for Design Methodology

#1 Libraries: Know Your Attributes
#2 Setup: Correlation and Runtime
#3 Scripts: Impacts Your Design
#4 Constraints: Watch Your Constraints
#5 Analyze: Analyze-Fix-Proceed
#6 Methodology: One or Two Flows
#7 Optimization: Adjust Accordingly
#8 Signoff: Review Your Environment
#9 Performance: Leverage Your EDA Partner
#10 Low Power: Architecture Drives Power

#1 Libraries: Know Your Attributes

Why is my design larger in area?
Why is it taking so long to run?

[Figure: Original Area vs. New Area after optimization]

Watch for dont_use, dont_touch, and size_only usage in your libraries and scripts
Attributes are user-controlled to guide optimization
Restricting optimization may lead to problems

Technology and IP
Make Sure to Have a Good Quality Library

A properly designed set of library cells gives optimization engines more choice
Avoid cells sensitive to minor changes in load; this impedes convergence
Footprint-equivalent cells are useful for final-stage optimization with minimal perturbation to other design metrics
Std. cell pins should be on grid (especially complex cells with small drive strength: higher pin density)
Multiple variants for each flop (drive strengths, delays, setup times, ...)

Library quality is an enabler for targeted performance

[Figure: Example: Cell Sensitivity To Load Uncertainty. Delay versus Cload for Cell A and Cell B]

#2 Setup: Correlation and Runtime
What do designers do when they run into these?

Netlist v1.0, SDC v1.0: compile gives 3.2M instances
Netlist v1.1, SDC v1.1: compile gives 6.8M instances?? What happened???

Found issues after days of engineering work
size_only on 3.7M cells
SDC with all cells set with set_disable_clock_gating on

Review Your Settings and Input
Understand the Different Objectives

DC Utility Checker
Detect design issues and dirty constraint styles that can lead to bad runtime/memory and QoR

ICC Utility Checker
Detect readiness of physical design before going into various implementation stages

PT Utility Checker
Detects application variables, settings and design issues causing runtime or memory increase

#3 Scripts: Impacts Your Design

When someone tells you Tool A is X times faster than Tool B
Need to put things in perspective

[Figure: Incomplete vs. Complete]

First step: review your script
How was the script migrated to Tool A?
Did you also update the script to leverage the latest technologies?
Early stage of your design, think fast mode
Final stage of your design, think QoR

Tool Input Can Impact Results
Understand How the Tool Can Help Meet Design Goals

Today's design requires completeness
Synopsys tools are tailored for performance, but they also have a mode to run fast

Recommendations
The typical complaint is long runtime; choose your goal setting accordingly
Make sure your script is up to date for your end goal and to take advantage of the latest features

#4 Constraints: Watch Your Constraints

Symptoms of over-constraining: long runtime, excessive buffering and huge violations
Over-constraining could guide the tool to focus on artificial critical paths

[Figure: the original clock period split into input delay, time available for logic, and output delay]

Over-constraining happens with
Unrealistic input and/or output delays
Tightening the clock period
Specifying large clock uncertainty

Synopsys tools are designed to work towards meeting design goals, but don't expect miracles!
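
The budget in the clock-period figure can be written as simple arithmetic. The sketch below assumes a single-cycle path from an input port to an output port, so the time left for logic is the clock period minus input delay, output delay, and clock uncertainty; the function name and every number are hypothetical values chosen only to show how each over-constraining habit eats into the budget.

```python
def time_available_for_logic(clock_period, input_delay=0.0,
                             output_delay=0.0, clock_uncertainty=0.0):
    """Time (ns) left for combinational logic on an input-to-output path."""
    return clock_period - input_delay - output_delay - clock_uncertainty


TARGET_PERIOD_NS = 2.0  # hypothetical design target

# Realistic constraints: modest I/O delays, small uncertainty.
realistic = time_available_for_logic(TARGET_PERIOD_NS,
                                     input_delay=0.4,
                                     output_delay=0.4,
                                     clock_uncertainty=0.1)

# Over-constrained: period tightened to 70% of target, pessimistic I/O
# delays, and a large clock uncertainty.
over_constrained = time_available_for_logic(0.7 * TARGET_PERIOD_NS,
                                            input_delay=0.6,
                                            output_delay=0.6,
                                            clock_uncertainty=0.3)

print("realistic budget (ns):", round(realistic, 3))
print("over-constrained budget (ns):", round(over_constrained, 3))
if over_constrained <= 0:
    print("infeasible: no time left for logic, no optimization can fix this")
```

A zero or negative budget is an infeasible path: the tool will spend runtime and buffers on it without any chance of closing it.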


Understanding EDA Tool Will Help
Simple Illustration

Will DC do this transformation?
Circuit A (before): CLKA wns = -0.300, CLKB wns = -0.100
Circuit B (after):  CLKA wns = -0.280, CLKB wns = -0.150

$\mathrm{Cost} = \sum_i p_i \, w_i$

Default Weights      Delay Cost Before    Delay Cost After
CLKA weight = 1      0.30                 0.28
CLKB weight = 1      0.10                 0.15
Total WNS Cost       0.40            <    0.43
Total cost increased: transformation rejected. Worst WNS = -0.300

Adjusted Weights     Delay Cost Before    Delay Cost After
CLKA weight = 10     3.00                 2.80
CLKB weight = 1      0.10                 0.15
Total WNS Cost       3.10            >    2.95
Total cost reduced: transformation accepted. Worst WNS = -0.280
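
The accept/reject decision in this illustration can be reproduced with a short script. It is a sketch of the weighted-cost idea using the slide's numbers, not the tool's actual optimizer; the function names are illustrative, and the per-group penalty is assumed to be the magnitude of the worst negative slack.

```python
def delay_cost(wns_by_group, weight_by_group):
    """Sum over path groups of (worst negative slack magnitude * weight)."""
    return sum(weight_by_group[group] * max(0.0, -wns)
               for group, wns in wns_by_group.items())


def transformation_accepted(before, after, weights):
    """Accept the transformation only if it lowers the total weighted cost."""
    return delay_cost(after, weights) < delay_cost(before, weights)


circuit_a = {"CLKA": -0.300, "CLKB": -0.100}   # before
circuit_b = {"CLKA": -0.280, "CLKB": -0.150}   # after

default_weights = {"CLKA": 1, "CLKB": 1}
adjusted_weights = {"CLKA": 10, "CLKB": 1}

for label, weights in (("default", default_weights),
                       ("adjusted", adjusted_weights)):
    print(label,
          "before:", round(delay_cost(circuit_a, weights), 2),
          "after:", round(delay_cost(circuit_b, weights), 2),
          "accepted:", transformation_accepted(circuit_a, circuit_b, weights))
```

Raising the weight on CLKA tells the cost function that improving the worst clock group matters more than the small degradation on CLKB.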

#5 Analyze: Analyze-Fix-Proceed

A push-button flow does not exist
Know your circuit to guide the tool
Synopsys Galaxy Implementation Flow

DC Graphical: compile_ultra -spg, insert_dft, compile_ultra -spg -incr
IC Compiler: place_opt -spg, clock_opt, route_opt, signoff_opt
StarRC: Signoff extraction
PrimeTime SI: Signoff STA

Analyze results between design stages

#6 Methodology: One or Two Flows
Design specifications and constraints change constantly during the design cycle

One flow for both exploration & implementation
180 nanometers (2000): 225K gates, 11 RAMs, 150 MHz

Two flows: an exploration flow targeted at early specs & constraints, and an implementation flow for final design realization
45 nanometers (2010): 96 mm2, ~300M transistors, 7-9W

Exploration Throughout Galaxy

DC Explorer: Early RTL Exploration. Accelerates design schedules
Design Compiler: Look-ahead & Physical Guidance. Creates a better starting point
IC Compiler: Design Exploration. Creates initial floorplan
Block Feasibility: Determines physical feasibility
Galaxy Constraint Analyzer: Continuous improvement

[Figure: from RTL to Physical, exploration steps (RTL Exploration, Design Exploration, Block Feasibility) run alongside implementation steps (RTL Synthesis, Design Planning, Block Implementation)]

#7 Optimization: Adjust Accordingly
Adjust your constraints to model effects of downstream design steps

An illustration: Design Compiler
Account for clock trees
No hold-timing fixing
Be careful with critical range
Do not over-constrain

Manage Design Constraints Throughout
Guidelines For Convergent Timing Closure

Synthesis and placement
Do not over-constrain during synthesis
Use DC SPG flow
Account for max_transition and clock uncertainty
Specify pre-CTS estimated constraints

CTS
Remove pre-CTS estimated constraints

Route
Remove/adjust pre-route constraints
Adjust crosstalk thresholds

Do not over-complicate your flow

[Chart: Timing Closure Profile, MHz (axis 800 to 1,100) at the Synthesis, Place, Clock, and Route stages, for three flows: RM (Baseline), Tuned For Hi-Performance/Low Power, and Addnl. Customization For High-Performance; data labels 913, 971, and 1029]

#8 Signoff: Review your Environment
Unlike wine, scripts grow stale with age

[Charts: Runtime (CPU Hrs, axis 0 to 60) and Memory Usage (GB, axis 0 to 128, with one bar labeled 172 GB) for designs of 1.1, 1.2, 5.5, 37.0, and 50+ million instances]

Designs run at customer site using revised PrimeTime scripts and latest release version

PrimeTime Scripts: Key Areas to Review

Environment and setup
Use latest release and ensure adequate hardware resources

Reading parasitics
Use binary parasitics when possible

Multiple timing updates
Eliminate redundant/legacy update_timing steps

Inefficient Tcl scripting and reporting

PrimeTime Design Utility Checker can help with some of these tasks

#9 Performance: Leverage Your EDA Partner

Starting Point
Built on Synopsys RM
Understand the new technologies and features
Easy to use

Reduce time-to-results
Automated methodology to achieve 90% of target quickly
Additional advanced techniques to reach final goal
Minimize number of iterations or trial and error
Reduce ECO efforts

[Figure: Design Schedule for Typical Flow vs. HSLP Flow across Synthesis, P&R, and Signoff + ECO, showing iterations]

HSLP Implementation Best Practices
Reduces Time-to-Results

High Performance, Low Power (HSLP) Flow Requires Customization

[Chart: percentage of targets reached (75%, 90%, 100%) over time for the Typical Flow on regular designs, the Typical Flow on high performance designs, and the HSLP Flow with HSLP Implementation Best Practices followed by design-specific customization; the HSLP Flow reduces time-to-results]

#10 Low Power: Architecture Drives Power

DESIGN TECHNIQUES
Multiple Voltage (MV) Domains
Level Shifters
Power Switches (MTCMOS)
Isolation Cells
Retention Registers
Always-on Logic

[Figure: cell schematics for a level shifter (LS), isolation cell (ISO), retention register (RR), always-on cell (AO), and power switch, with VDD, VDDB, VDDI, VDDO, and VSS rails]
[Figure: example configurations with domains at 0.9V, 0.7V, or OFF: Multi-Supply with shutdown, No State Retention; Multi-Voltage with shutdown; Multi-voltage with shutdown & State Retention (SR)]

New Techniques and Challenges


The Race to 20nm Is On!


Leading The Way In 20nm Design


The 20 nm Challenge: Single Exposure


Last Pitch With Single Exposure ~ 80 Nanometers
We Can Print This,

But We Cannot Print This

Source M. van den Brink, ASML, ITF 2009; P. Magarshack, STMicroelectronics, 2010

The Solution: Double Patterning


A Significant Change

We Can Print This, and This, And Then This!

Synopsys Solution
DPT Ready IC Compiler P&R, and IC Validator DRC
Wide Spacing Enforced

Two-Color Decomposed Design

Source: Synopsys Research 2011
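
Two-color decomposition can be viewed as graph 2-coloring: shapes that sit closer than the single-exposure spacing must go on different masks. The sketch below is a generic breadth-first 2-coloring under that assumption, not the algorithm used by IC Compiler or IC Validator; the conflict lists are hypothetical, and `two_color` is an illustrative name.

```python
from collections import deque


def two_color(num_shapes, conflicts):
    """Assign each shape to mask 0 or 1 so that conflicting shapes differ.

    conflicts is a list of (a, b) index pairs for shapes that are too close
    to print in one exposure. Returns a list of mask ids, or None when an
    odd cycle of conflicts makes a two-mask decomposition impossible.
    """
    neighbors = [[] for _ in range(num_shapes)]
    for a, b in conflicts:
        neighbors[a].append(b)
        neighbors[b].append(a)

    mask = [None] * num_shapes
    for start in range(num_shapes):
        if mask[start] is not None:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            shape = queue.popleft()
            for other in neighbors[shape]:
                if mask[other] is None:
                    mask[other] = 1 - mask[shape]
                    queue.append(other)
                elif mask[other] == mask[shape]:
                    return None  # odd cycle: needs wider spacing to resolve
    return mask


# Four parallel wires at tight pitch: neighbors alternate between masks.
print(two_color(4, [(0, 1), (1, 2), (2, 3)]))

# Three mutually conflicting shapes cannot be split across two masks.
print(two_color(3, [(0, 1), (1, 2), (2, 0)]))
```

The second case is why a double-patterning-aware router has to enforce wider spacing in some places: moving one shape apart removes a conflict edge and breaks the odd cycle.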



The Challenge: Planar CMOS


Insufficient Performance, Excessive Power

32 Nanometer Planar

Performance Power

Source: K. Kuhn, Intel, IDF 2011



The Solution: Non-Planar CMOS


FinFET or Tri-Gate CMOS

22 Nanometer Tri-Gate

Performance Power

Source: K. Kuhn, Intel, IDF 2011



The Solution: Non-Planar CMOS


The First Revolution

Source: M. Bohr, Intel, YouTube 2011



There Are Many Flavors, But


Reality and Fantasy Are not the Same Thing !


FinFET Advantages
FinFET vs Planar Transistor

Superior drive current
Active region spans the fin height and thickness (3 sides)
Ids ∝ (2*Hfin + Tfin), as opposed to just thickness for planar

Reduced leakage
Depleted substrate
High-K gate oxide
Metal gates in place of polysilicon

Enhanced electron mobility
Strained silicon

Multiple fins possible to increase total drive strength for higher performance

[Figure: planar transistor vs. FinFET, showing the inversion layer and the fin]
Source: Intel
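
The drive-current relation above means a FinFET's effective channel width comes in whole-fin steps. A minimal sketch, assuming drive current scales linearly with effective width; the fin dimensions are hypothetical round numbers, and the function name is illustrative.

```python
def finfet_effective_width(fin_height, fin_thickness, num_fins=1):
    """Effective channel width: the gate wraps two sidewalls and the top of each fin."""
    return num_fins * (2 * fin_height + fin_thickness)


FIN_HEIGHT_NM = 30      # hypothetical
FIN_THICKNESS_NM = 10   # hypothetical

single_fin = finfet_effective_width(FIN_HEIGHT_NM, FIN_THICKNESS_NM)
three_fins = finfet_effective_width(FIN_HEIGHT_NM, FIN_THICKNESS_NM, num_fins=3)

print("one fin (nm):", single_fin)
print("three fins (nm):", three_fins)
print("width gained vs. a planar channel of the fin's footprint:",
      single_fin / FIN_THICKNESS_NM)
```

Because width is quantized by fin count, cell libraries offer drive strengths in discrete steps rather than the continuous sizing available with planar devices.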

This Is Not The End of Moore's Law!
But the Gap Between Intel and the Crowd Is Widening

Source: M. Bohr, Intel, IDF 2011

3D ICs: Technology Trends
Four Main Categories of > 2D-IC Ahead

Silicon Interposer
3D Stack
Memory Cube
(Wide I/O) Memory Cube on Logic

[Figure: cross-sections of the four categories, with C4, bump, and TSV connections labeled]

3D-IC: Two Basic Configurations Emerging
Addressing Gigascale Design Challenges

Silicon Interposer (2.5D)
Horizontally connected dies
Drivers: Consumer, Storage, Networking
Benefits: Yield, Cost, TTM & Flexibility

3D-IC
Vertically stacked dies with TSVs
Drivers: Wireless handset, Processors
Benefits: Performance, form factor

The Memory Cube Now

[Figure: 8 die stack; dimensions labeled 560 microns and 50 microns]
Source: C.-G. Hwang, Samsung, IEDM 2006

IP Market, an opportunity for Latin America

IP
An intellectual property core (IP core or IP block) is a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party
IP cores may be licensed to another party or can be owned and used by a single party alone
IP cores can be used as building blocks within ASIC chip designs or FPGA logic designs

IP

IP cores in the electronic design industry have had a profound impact on the design of systems on a chip
IP core licensors spread the cost of development among multiple chip makers
IP cores for standard processors, interfaces, and internal functions have enabled chip makers to put more of their resources into developing the differentiating features of their chips, leading to new innovations faster
Licensing and use of IP cores in chip design came into common practice in the 1990s

Semiconductor IP Market Segments
2011 Design IP Revenue: $1.9B

Segment                              Share
Microprocessors                      39%
Wired Interfaces                     19%
Fixed Function (GPUs, Security)      15%
Memory Cells/Blocks                  10%
DSP                                  5%
GP Analog/MS                         4%
Other IP                             4%
Physical libraries                   3%
Block Libraries                      1%

Microprocessors, Fixed Function, and DSP are grouped in the chart as Processors (CPUs, GPUs, DSPs)

Source: Gartner, March 2012
Semiconductor IP Market Size and Synopsys Share

Year    Semiconductor IP Market Size ($M)    Synopsys Share
CY04    964.0                                7.9%
CY05    1,068.3                              7.6%
CY06    1,267.3                              7.3%
CY07    1,378.2                              7.2%
CY08    1,464.1                              7.2%
CY09    1,351.0                              9.1%
CY10    1,695.0                              11.3%
CY11    1,910.9                              12.4%

Source: Gartner, March 2012
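
The market-size and share series can be combined to see which years shrank and what revenue the share implies. A minimal sketch using only the chart's numbers; the implied revenue is market size times the rounded share, so it is an approximation, and the variable names are illustrative.

```python
# Calendar year -> (semiconductor IP market size in $M, Synopsys share in %)
market = {
    2004: (964.0, 7.9),
    2005: (1068.3, 7.6),
    2006: (1267.3, 7.3),
    2007: (1378.2, 7.2),
    2008: (1464.1, 7.2),
    2009: (1351.0, 9.1),
    2010: (1695.0, 11.3),
    2011: (1910.9, 12.4),
}

years = sorted(market)

# Year-over-year market growth as a fraction.
yoy_growth = {
    year: market[year][0] / market[prev][0] - 1.0
    for prev, year in zip(years, years[1:])
}

# Revenue implied by the rounded share figures ($M).
implied_revenue = {year: size * share / 100.0
                   for year, (size, share) in market.items()}

shrinking_years = [year for year, growth in yoy_growth.items() if growth < 0]

for year in years[1:]:
    print(year, f"market {yoy_growth[year]:+.1%}",
          f"implied revenue {implied_revenue[year]:.1f}")
print("years with a shrinking market:", shrinking_years)
```

The only contraction in the series is 2009, and the share gain in that same year shows the implied revenue holding up while the overall market fell.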

Top Semiconductor IP Vendors

Rank  Company                   2010    2011    Growth   2011 Share
1     ARM Holdings              575.8   732.5   27.2%    38.3%
2     Synopsys                  191.8   236.2   23.2%    12.4%
3     Imagination Technologies  91.5    126.4   38.1%    6.6%
4     MIPS Technologies         85.3    72.1    -15.5%   3.8%
5     Ceva                      44.9    60.2    34.1%    3.2%
6     Silicon Image             38.5    42.8    11.2%    2.2%
7     Rambus                    41.4    38.9    -6.0%    2.0%
8     Tensilica                 31.5    36.3    15.2%    1.9%
9     Mentor Graphics           27.3    23.6    -13.8%   1.2%
10    AuthenTec                 19.6    22.8    16.3%    1.2%

Source: Gartner, March 2012
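
The Growth and 2011 Share columns can be recomputed from the revenue columns as a consistency check. The sketch below assumes the revenue figures are in $M and uses the 2011 market total of 1,910.9 from the market-size slide; small differences from the printed percentages are expected because the published revenues are rounded.

```python
MARKET_2011_MUSD = 1910.9  # total 2011 market size from the market-size slide

# Company -> (2010 revenue, 2011 revenue), assumed to be in $M
vendors = {
    "ARM Holdings": (575.8, 732.5),
    "Synopsys": (191.8, 236.2),
    "Imagination Technologies": (91.5, 126.4),
    "MIPS Technologies": (85.3, 72.1),
    "Ceva": (44.9, 60.2),
    "Silicon Image": (38.5, 42.8),
    "Rambus": (41.4, 38.9),
    "Tensilica": (31.5, 36.3),
    "Mentor Graphics": (27.3, 23.6),
    "AuthenTec": (19.6, 22.8),
}


def growth_pct(rev_2010, rev_2011):
    """Year-over-year revenue growth in percent."""
    return (rev_2011 / rev_2010 - 1.0) * 100.0


def share_pct(rev_2011, market_total=MARKET_2011_MUSD):
    """Share of the 2011 market in percent."""
    return rev_2011 / market_total * 100.0


for company, (rev_2010, rev_2011) in vendors.items():
    print(f"{company:26s} growth {growth_pct(rev_2010, rev_2011):+6.1f}%"
          f"  share {share_pct(rev_2011):5.1f}%")

top10_share = sum(share_pct(rev_2011) for _, rev_2011 in vendors.values())
print(f"top 10 combined share: {top10_share:.1f}%")
```

The combined share line shows how concentrated the market is: the ten listed vendors account for most of the 2011 total, with the top two well ahead of the rest.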

IP Vendors Also Need to Provide More Functions and Functionality

[Chart: Total Number of IP Blocks per SoC, 2005 to 2014. Left axis: Avg. # IP Blocks per SoC (0 to 120), split into IP Blocks and IP Subsystems; right axis: % Design Reuse (10 to 70)]

Source: Semico, October 2010

Subsystems: The Next Evolution in The IP Market
What is a Subsystem?

Complete Solution: HW, SW, Prototype
Pre-integrated and Verified
SoC Ready: Seamlessly Drop-in and Go
Thank You
