
Appendix A Scan Test Basics

Scan chain operation

The purpose of scan design is to make a difficult-to-test sequential circuit easier to test, both conceptually and through automation. Consider the circuit in Figure 1. Let's say that to test for a certain fault condition, the output of GATE4 needs to be set to 1. To do this, a test pattern must be generated that drives both inputs of GATE4 to 1, which in turn requires FF2 to be set to 1. This implies that a sequential pattern must be generated: a pattern that applies stimuli to the inputs, pulses the clocks, applies new stimuli to the inputs, pulses the clocks again, and so on, to establish an internal state. Similarly, to observe the results of the test, the results must propagate from internal nodes through sequential elements to the output pins (a simulator may be able to observe internal nodes, but testers can only see package pins). The complexity of the test generation task increases with the sequential depth of the circuit (the number of sequential elements between the input pins and the output pins) and with the sequential-combinational ratio (the number of gates between banks of sequential registers).
Figure 1 -- A non-scan sequential circuit (inputs A and B, gates GATE1 to GATE5, flip-flops FF1 to FF3 clocked by CLK, output Y)
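As a rough illustration of the sequential-depth measure defined above, the sketch below counts the largest number of flip-flops on any path from an input pin to an output pin. It assumes an acyclic netlist given as a hypothetical fanout dictionary; the node names are illustrative only and do not reproduce the connectivity of Figure 1. Circuits with sequential feedback loops would need a different treatment.

```python
from functools import lru_cache
from typing import Dict, Iterable, List, Set


def sequential_depth(fanout: Dict[str, List[str]], flops: Set[str],
                     inputs: Iterable[str], outputs: Set[str]) -> int:
    """Largest number of flip-flops on any input-to-output path.

    Assumes the netlist graph is acyclic (no sequential feedback).
    Returns -1 if no output is reachable from any input.
    """

    @lru_cache(maxsize=None)
    def depth_from(node: str) -> int:
        # Best flip-flop count on any path from `node` to an output; -1 if none.
        best = 0 if node in outputs else -1
        for succ in fanout.get(node, []):
            best = max(best, depth_from(succ))
        if best < 0:
            return -1
        return best + (1 if node in flops else 0)

    return max(depth_from(pin) for pin in inputs)


# Hypothetical netlist: A reaches Y through two flip-flops,
# B reaches Y through purely combinational logic.
netlist = {
    "A": ["g1"], "g1": ["ff1"], "ff1": ["g2"], "g2": ["ff2"],
    "ff2": ["g3"], "B": ["g3"], "g3": ["Y"],
}
print(sequential_depth(netlist, {"ff1", "ff2"}, ["A", "B"], {"Y"}))  # 2
```

A deeper circuit means the ATPG tool must plan more clock cycles ahead to set or observe an internal state, which is the cost that scan removes.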

Now, consider a slightly modified circuit. Assume that all of the flip-flop Q outputs were input pins to the circuit and all flip-flop D inputs were output pins of the circuit. In that case, generating the test pattern would be much simpler, since it is purely a combinational problem. This is exactly what scan does to a circuit: it makes a sequential circuit appear combinational for test, and combinational circuits are easy for EDA vector-generation (ATPG) tools to handle algorithmically. Figure 2 shows a modified version of the circuit from Figure 1. In Figure 2, all of the flip-flops are connected together into a shift register. When the SE (scan enable) signal is de-asserted (set to 0), the circuit behaves exactly like the non-scan circuit in Figure 1. When SE is asserted (set to 1), control stimuli can be shifted from the scan input (SI) pin into any sequential element in the chain. Similarly, observable results can be shifted out of the scan chain through the scan output (SO) pin. This is why scan enhances both controllability and observability.
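The SE-controlled behavior described above can be sketched as a small behavioral model. This is a minimal sketch, assuming each cell is a DFF with a 2:1 mux in front of D, that cell 0 is nearest SI, and that the functional D values are simply passed in during capture (in a real circuit they come from the combinational logic). The class and function names are hypothetical.

```python
from typing import List, Optional


class ScanChain:
    """Minimal Mux-DFF scan chain model; cells[0] is nearest SI, cells[-1] drives SO.

    Assumption: SE=1 selects the upstream cell (or SI) at each mux,
    SE=0 selects the functional D input.
    """

    def __init__(self, length: int) -> None:
        self.cells: List[int] = [0] * length

    def clock(self, se: int, si: int = 0, d: Optional[List[int]] = None) -> int:
        """Apply one clock pulse and return the SO value present before the edge."""
        so = self.cells[-1]
        if se:
            # Shift mode: each cell takes its upstream neighbour's value.
            self.cells = [si] + self.cells[:-1]
        else:
            # Functional mode: each cell captures its own D input.
            if d is None or len(d) != len(self.cells):
                raise ValueError("capture needs one D value per cell")
            self.cells = list(d)
        return so


def load_unload(chain: ScanChain, pattern: List[int]) -> List[int]:
    """Shift `pattern` in (pattern[i] ends in cell i) while unloading old contents.

    The returned list is indexed by cell position, like `pattern`.
    """
    # The bit destined for the cell nearest SO has to enter first.
    shifted_out = [chain.clock(se=1, si=bit) for bit in reversed(pattern)]
    # The first bit out came from the last cell, so reverse into cell order.
    return shifted_out[::-1]


chain = ScanChain(3)
load_unload(chain, [1, 0, 1])    # loads the stimulus, unloads [0, 0, 0]
chain.clock(se=0, d=[0, 1, 1])   # one capture pulse with SE de-asserted
print(load_unload(chain, [0, 0, 0]))  # captured values come back out
```

Note that unloading the captured values and loading the next stimulus happen in the same shift cycles, which is why scan test time is dominated by chain length.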

DFT Tutorial

Crouch / Eide / Posse

Appendix A / Page 1

Figure 2 -- Scan version of the circuit shown in Figure 1 (flip-flops stitched into a chain from SI to SO, with SE and CLK)

For a basic scan pattern, the scan circuitry is operated as follows:
1. Enable the scan operation to allow shifting (to initialize scan cells). In the circuit in Figure 2, set the SE pin to 1.
2. Set the scan input (SI) pin to the desired value and pulse the clock (CLK). Repeat this until the entire scan chain is loaded with known values.
3. After loading the scan cells, hold the scan clocks off, de-assert SE, and then apply stimulus to the non-scan input pins (apply primary inputs, PIs).
4. Measure the non-scan output pins (observe primary outputs, POs).
5. Pulse the clock to capture new values into the scan cells.
6. Enable the scan operation to unload and measure the captured values while simultaneously loading new values via the shifting procedure (as in steps 1 and 2).

Faults and Vector Generation

The process is automated when ATPG tools create the vectors by targeting faults directly in an algorithmic manner. For example, testing the A-input of GATE4 in Figures 1 and 2 requires a logic 1 on the B-input so that the A-input controls the output. If the A-input is then driven to a logic 0, a 0 will pass through if the gate is good, and a 1 will pass through if the A-input is stuck-at-1. Similarly, the gate can be preset with a logic 0 on the A-input and a logic 1 on the B-input, and at some point in time the A-input can be transitioned to a logic 1. If the logic 1 is captured at the beginning of the next cycle, the circuit does not have a delay fault that impacts operation; if the logic 0 is captured, the transition propagates too slowly. The stuck-at, transition-delay, and even path-delay fault models are commonly supported in modern ATPG tools.

Scan cell structures

The example circuit in Figure 2 uses a scan cell structure called Mux-DFF. This is the most common scan cell structure and is used in DFF-based designs.
Instead of using regular DFFs, Mux-DFF based designs use scan cells that have a mux in front of the D input. This scan cell structure is shown in Figure 3. Note that to keep scan shifting safe from arbitrary circuit states, the scan operation must have the highest priority of all operations (set, reset, hold); otherwise, scan shifting may be disrupted. If the cell has operations with higher priority, they must be disabled by using an external gate and a control signal (such as SE).
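The priority rule above can be sketched as a next-state function for a Mux-DFF cell that also has a clear input. This is a minimal sketch under stated assumptions: the clear is modelled as if it were sampled at the clock edge (a real asynchronous clear acts without a clock), and the `clr_gated_by_se` flag is a hypothetical switch representing the external gate driven by SE.

```python
def mux_dff_next(d: int, si: int, se: int, clr: int = 0,
                 clr_gated_by_se: bool = True) -> int:
    """Next-state value of a Mux-DFF scan cell that also has a clear input.

    With clr_gated_by_se=True, an external gate blocks the clear whenever
    SE=1, so scan shifting always has the highest priority.
    """
    effective_clr = bool(clr) and not (clr_gated_by_se and se)
    if effective_clr:
        return 0  # clear overrides both the functional and the scan path
    # Scan mux in front of D: SI when SE=1, functional D when SE=0.
    return si if se else d


# Shifting a 1 in while a stray clear is active:
print(mux_dff_next(d=0, si=1, se=1, clr=1))                         # 1, shift wins
print(mux_dff_next(d=0, si=1, se=1, clr=1, clr_gated_by_se=False))  # 0, shift disrupted
```

Without the gate, any circuit state that happens to assert the clear during shifting corrupts the scan data, which is exactly the hazard the text warns about.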


Another scan cell structure, also shown in Figure 3, is Level-Sensitive Scan Design (LSSD). This scan cell is designed for latch-based designs and uses two non-overlapping clocks for shifting data through the chain.
Figure 3 -- Mux-DFF and LSSD scan cells (signals D, SI, SE, Q/SO; LSSD clocks SysCLK, ScanCLKA, ScanCLKB)
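The role of the two non-overlapping clocks can be sketched with a latch-level model. This assumes a typical LSSD arrangement in which each cell has a master latch (L1) clocked by ScanCLKA during shift and a slave latch (L2) clocked by ScanCLKB, with each L2 feeding the next cell's L1; the function names are hypothetical, and the overlap case is a deliberately simplified worst case.

```python
from typing import List, Tuple


def lssd_shift(l2: List[int], si: int) -> Tuple[List[int], List[int]]:
    """One ScanCLKA pulse followed by one ScanCLKB pulse on an LSSD chain.

    l2[i] is the slave (L2) latch of cell i; cell 0 is nearest SI and
    l2[-1] drives SO. Returns the new (L1, L2) latch contents.
    """
    # ScanCLKA high, ScanCLKB low: every L1 is transparent and copies the
    # upstream L2 value, which cannot change while ScanCLKB is low.
    l1 = [si] + l2[:-1]
    # ScanCLKA low again, then ScanCLKB high: every L2 copies its own L1.
    new_l2 = list(l1)
    return l1, new_l2


def overlapping_pulse(l2: List[int], si: int) -> List[int]:
    """What an (illegal) pulse with both clocks high could do.

    Simplifying assumption: the overlap lasts long enough for data to
    ripple through every transparent L1/L2 pair, so SI floods the chain.
    """
    return [si] * len(l2)


_, after = lssd_shift([1, 0, 1], si=0)
print(after)                           # [0, 1, 0]: each bit moved one cell
print(overlapping_pulse([1, 0, 1], 0)) # [0, 0, 0]: data smeared through
```

Because the two clocks are never high together, each bit moves exactly one cell per A/B pair regardless of how the clock edges are skewed within each phase.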

The main advantage of LSSD is that it eliminates clock skew during shift. The D-mimic model takes this idea one step further: it also eliminates clock skew problems during capture by using non-overlapping clocks during capture as well. This way, one clock can safely be used in test mode. During scan insertion, the D-mimic scan cell replaces non-scan DFFs. The examples in the remaining sections of this document focus on Mux-DFF type scan cells, since these are the most commonly used today.

Scan chain inversion

When analyzing scan chain data, whether in an ATPG tool, in a digital simulator, or on the tester, it is important to understand the concept of scan chain inversion. When scan chains are inserted into a design, the scan insertion tool will typically use the output of the sequential element with the lightest load. For instance, if the Q output of a given DFF drives 4 gates and its QB output drives 1, the tool will pick QB as the scan output for that cell. A typical scan chain configuration is shown in Figure 4.
Figure 4 -- A typical scan chain configuration (FF1 to FF3 stitched from SI to SO, with SE and CLK)
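The effect of inverting scan connections on unloaded data can be sketched as simple parity arithmetic. This assumes a per-link list of hypothetical inversion flags (1 where a cell's QB output, or an explicit inverter, feeds the next stage or SO) and that the SI-to-first-cell connection does not invert.

```python
from typing import List


def inversions_to_so(link_inverts: List[int]) -> List[int]:
    """Number of inversions each cell's data meets on its way to SO.

    link_inverts[i] is 1 if the connection leaving cell i (towards cell i+1,
    or towards SO for the last cell) inverts the data; cell 0 is nearest SI.
    """
    counts: List[int] = []
    total = 0
    for flag in reversed(link_inverts):  # suffix sums, walking back from SO
        total += flag
        counts.append(total)
    return counts[::-1]


def observed_at_so(captured: List[int], link_inverts: List[int]) -> List[int]:
    """Bits seen on SO for the captured cell values (indexed by cell)."""
    return [bit ^ (n % 2)
            for bit, n in zip(captured, inversions_to_so(link_inverts))]


def chain_inverts_pass_through(link_inverts: List[int]) -> bool:
    """True if the chain has odd parity, i.e. it inverts data passing through."""
    return sum(link_inverts) % 2 == 1


# Three cells; the middle cell uses QB as its scan output.
links = [0, 1, 0]
print(observed_at_so([0, 0, 0], links))   # [1, 1, 0]
print(chain_inverts_pass_through(links))  # True
```

A captured 0 in either of the first two cells therefore appears as a 1 on SO, and the chain as a whole has odd parity.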

The inversion in the scan chain is typically handled automatically by scan insertion and ATPG tools. However, in cases where a user has to look at scan chain data manually, it is important to take this inversion into consideration. For instance, if there is an odd number of inversions between a scan cell and the scan output pin, and this cell captures a 0 for a certain pattern, this bit will appear as a 1 on the scan output pin. This matters both for understanding the data driven into and observed from a scan chain, and for concatenating scan chains together in certain test modes (scan chains with odd parity invert all of the data passing through them).

Multiple clocks, clock skew, and lockup latches

When a design has multiple clocks in test mode, clock skew can occur between the different domains. The problem can be separated into two issues: clock skew during shift and clock skew during capture. To minimize skew during shift, all scan chains should be ordered such that all flip-flops clocked by one clock domain are grouped together (known as a scan domain). This minimizes the number of locations where clock skew can occur. Then, to avoid skew completely where the scan/clock domains cross, a lockup latch can be inserted. This is illustrated in Figure 5. Note that LL1 is a scan/test-only construct; as shown, it has no functional purpose.

Figure 5 -- Scan chain clocked by two clocks with lockup latch (scan cells SFF1, SFF2, SFF3 and lockup latch LL1; clocks CLK1 and CLK2)

Usually, clock trees are synthesized with functional operation in mind. For instance, clocks might be pulsed in a specific sequence, and certain clocks might not pulse at the same time. During functional operation, internal clock generators may manage the skew of the clocks going to the internal domains. Skew issues between internal domains may still exist during test, since the tester may bypass the skew-managed clock generator or operate the PLL differently during test. During a scan-based test, the clocks might be operated or sequenced in a different way: for ATPG, clocks need to be accessible from primary input pins. For a design with multiple internally generated clocks, this means that each clock must be accessible from a dedicated input pin, or that one test clock is used to clock all domains during test.


When data is shifted through the scan chains, all clocks are pulsed at the same frequency and at the same time. This means that if multiple clocks are used within the same scan chain, there is a danger of clock skew between the clock domains during shift.

For more information on handling multiple clocks in scan designs, see [Mentor, 2001].

Positive and negative edge triggered clocks

If both positive and negative edge-triggered flops exist in the same scan chain, one can end up with problems during shift similar to those of a design with multiple clocks. The rule of thumb in this case is to group the leading-edge triggered flops together and the trailing-edge triggered flops together, and then to place all the trailing-edge triggered flops at the beginning of the chain (closest to SI) and the leading-edge triggered flops at the end of the chain (closest to SO). The relationship between rising/falling and leading/trailing depends on the polarity of the defined clock pulse. The leading edge is the first edge in a cycle. For an active-high (0-1-0 shaped) clock pulse, the rising edge is the leading edge and the falling edge is the trailing edge. With the trailing-edge triggered flops first in the chain, the leading-edge triggered flops downstream are updated before the trailing-edge triggered flops that feed them, so there is no danger of shoot-through or data smearing, i.e., data passing through two cells instead of one on a single clock pulse. This is illustrated in Figure 6, where it is assumed that CLK is driven with an active-high clock pulse. Because the rising edge occurs before the falling edge in the cycle, the data in SFF2 will not shoot through SFF3.
Figure 6 -- Scan chain clocked by leading- and trailing-edge flops (SFF1, SFF2, SFF3; SI, SO, SE, CLK)
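The ordering rule can be checked with a one-pulse simulation. This is a minimal sketch, assuming an active-high pulse (leading edge evaluated before trailing edge), "L"/"T" as hypothetical labels for leading- and trailing-edge flops, and distinct integer tokens instead of bits so that a value moving two cells is easy to spot.

```python
from typing import List


def shift_one_pulse(cells: List[int], edges: List[str], si: int) -> List[int]:
    """Apply one active-high clock pulse to a scan chain with mixed edges.

    edges[i] is "L" (leading/rising-edge flop) or "T" (trailing/falling-edge
    flop); cells[0] is nearest SI. The leading edge is evaluated first.
    """
    state = list(cells)
    for edge in ("L", "T"):
        # Inputs as seen at this edge: SI for cell 0, upstream outputs otherwise.
        inputs = [si] + state[:-1]
        state = [inputs[i] if edges[i] == edge else state[i]
                 for i in range(len(state))]
    return state


def is_clean_shift(cells: List[int], edges: List[str], si: int) -> bool:
    """True if every value moved exactly one cell (no shoot-through)."""
    return shift_one_pulse(cells, edges, si) == [si] + cells[:-1]


print(shift_one_pulse([1, 2, 3, 4], ["T", "T", "L", "L"], 0))  # [0, 1, 2, 3]
print(shift_one_pulse([1, 2, 3, 4], ["L", "L", "T", "T"], 0))  # [0, 1, 1, 3]
```

With trailing-edge flops first, every token moves one position. In the reversed order, token 1 shoots through two cells in one pulse and token 2 is lost.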

Scan chain ordering

For Mux-DFF based scan designs, the scan chains must be correctly ordered to prevent skew problems during shift. This is necessary regardless of how many clocks are used in test mode. Correct scan chain ordering typically includes using data from a physical placement (layout) tool to optimize the ordering of scan cells. Several placement tools can perform layout-based scan chain stitching, or recommend a scan chain ordering (and then have a separate tool use this ordering information). Such reordering typically helps reduce clock skew during shift. If one scan chain contains flip-flops that are clocked by different clocks, or that trigger on different clock edges, most scan insertion tools will group the flip-flops of each domain together to minimize the risk of clock skew and hold-time violations. Some placement tools that are capable of scan chain reordering do not automatically take this grouping into consideration. Therefore, even though layout-based scan chain reordering is recommended, it can introduce the same problem it is supposed to solve if the tool is not set up correctly (for example, nearest-neighbor scan connection with no consideration for clock domains yields minimum wire length but a skew and hold-time nightmare). One possible flow is to have the scan insertion tool insert lockup latches between each group of scan cells (as illustrated in Figure 5). In the placement tool where scan chains can be reordered, it might be possible to define the lockup latch as an "endpoint". Then, one can first reorder the cells between the scan input and the lockup latch (domain 1), and then the cells between the lockup latch and the scan chain output (domain 2). That way, the clock grouping (and the correct location of the lockup latch) is preserved. Lockup latches are not an ideal solution and should not be used as a substitute for normally expected timing and skew management; they should only be used where there is a risk of a timing problem that is not easily handled otherwise.

Balancing scan chains

The number of cycles per scan pattern depends on the length of the scan chains. For each pattern, the scan chains are loaded, new data is captured into the scan cells, and the chains are unloaded.
Each load/unload operation takes as many cycles (clock pulses) as there are cells in the longest concurrent scan chain (a scan chain actively being used together with other scan chains to conduct a test). For instance, assume that a design has 1000 scan cells and 10 scan chains. If the cells are evenly distributed (i.e., balanced) among the 10 chains, each chain has 100 scan cells, and it takes 100 cycles to load or unload the chains. If, on the other hand, the longest scan chain has 200 cells and all the other chains have about 89 cells, it takes 200 cycles to load/unload the chains, and the vectors for the short chains must be padded with Xs to fill the 200-cycle tester image. This increases the overall test time (here, it doubles the shift cycles per pattern) and wastes tester memory. Therefore, to reduce test time, scan chains should be balanced. Furthermore, a design with many (and thus shorter) scan chains requires less test time than the same design configured with few (and thus longer) scan chains. The number of scan chains in a design is normally limited by the package (pins available to borrow or dedicate for scan) as well as by the tester (channels with enough memory depth to hold the scan vectors). Issues discussed earlier in this document, such as multiple clock domains and multiple clock edges, may cause scan chains to be unbalanced. In some cases, instead of using lockup latches, a designer might choose to allow at most one clock per scan chain. This provides more timing slack than using lockup latches. A one-clock/one-edge-per-chain methodology is safer than using lockup latches, but may result in increased test time.
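The balancing arithmetic above can be reproduced in a few lines. This sketch uses hypothetical helper names; the unbalanced case splits the remaining 800 cells over 9 chains (eight chains of 89 and one of 88) so that the total stays at 1000 cells, matching the "about 89" in the text.

```python
from typing import List


def balanced_lengths(total_cells: int, num_chains: int) -> List[int]:
    """Spread scan cells as evenly as possible over num_chains chains."""
    base, extra = divmod(total_cells, num_chains)
    return [base + 1] * extra + [base] * (num_chains - extra)


def shift_cycles(chain_lengths: List[int]) -> int:
    """Cycles for one load/unload: set by the longest concurrent chain."""
    return max(chain_lengths)


def padding_bits(chain_lengths: List[int]) -> int:
    """X bits needed to pad every shorter chain up to the longest one."""
    longest = max(chain_lengths)
    return sum(longest - n for n in chain_lengths)


balanced = balanced_lengths(1000, 10)          # ten chains of 100 cells
unbalanced = [200] + balanced_lengths(800, 9)  # one 200-cell chain, rest ~89
print(shift_cycles(balanced), padding_bits(balanced))      # 100 0
print(shift_cycles(unbalanced), padding_bits(unbalanced))  # 200 1000
```

In the unbalanced case, half of every 2000-bit tester load image is X padding. Doubling the number of balanced chains to 20 would cut the shift cycles per load to 50.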

Multiple clocks and pattern generation

The traditional way to avoid clock skew during capture is to pulse only one clock per pattern. During shift, all clocks are pulsed at the same time, but only one clock is selected for capture in each pattern, as illustrated in Figure 7. In this example, each scan chain has 2 scan cells, and there are four scan clocks in total.
Figure 7 -- Pulsing one clock per pattern (timing diagram: Pattern 1 and Pattern 2, each with start load_unload, two shift cycles, and one capture cycle; clocks TClk1 to TClk4)

The main disadvantage of this technique is a high pattern count, since only one clock domain is actively tested by a vector set at any given time. For a design with ten clocks, one can experience up to ten times more patterns than if the design had only one clock. Therefore, most ATPG tools today use one or both of the following two techniques to safely reduce pattern count and test cost. One methodology is to first route all clocks to separate inputs in test mode, and then analyze which clock domains are independent (i.e., have no functional paths between them). This independence can hold between certain domains or between all domains. After this analysis has been done, the user can tell the ATPG tool to treat multiple clocks as equivalent clocks, which has the same effect as driving these clocks from a single pin. In the example shown in Figure 8, analysis shows that there are no functional paths between clock domains 1 and 3. Therefore, TClk1 and TClk3 can be pulsed at the same time for Pattern 1. Since there is interaction between the other domains, TClk2 is pulsed alone for Pattern 2.


Figure 8 -- Clock domain analysis allows TClk1 and TClk3 to be pulsed simultaneously (timing diagram: Pattern 1 and Pattern 2, each with start load_unload, two shift cycles, and capture; signals Scan Enable, TClk1 to TClk4)

This technique is most effective if most of the clock domains are independent. It is practical when the ATPG tool is capable of checking for domain independence.
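Once domain interactions are known, deciding which clocks may share a capture pulse is a grouping problem. The sketch below uses greedy first-fit grouping (a simple graph-coloring heuristic, not necessarily what any ATPG tool does internally). The interaction list is hypothetical, chosen so that TClk1 and TClk3 end up together as in Figure 8; the text does not say how TClk4 interacts.

```python
from typing import Dict, Iterable, List, Set, Tuple


def group_independent_clocks(clocks: List[str],
                             interacting: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Greedily pack clocks into groups that may be pulsed together in capture.

    Two clocks may share a group only if their domains have no functional
    path between them (the pair is not listed in `interacting`).
    First-fit is simple but not guaranteed to find the fewest groups.
    """
    conflicts: Dict[str, Set[str]] = {clk: set() for clk in clocks}
    for a, b in interacting:
        conflicts[a].add(b)
        conflicts[b].add(a)

    groups: List[Set[str]] = []
    for clk in clocks:
        for group in groups:
            if not (conflicts[clk] & group):  # no interaction with any member
                group.add(clk)
                break
        else:
            groups.append({clk})  # conflicts with every existing group
    return [sorted(group) for group in groups]


# Hypothetical analysis result: only domains 1 and 3 are independent.
pairs = [("TClk1", "TClk2"), ("TClk2", "TClk3"), ("TClk2", "TClk4"),
         ("TClk1", "TClk4"), ("TClk3", "TClk4")]
print(group_independent_clocks(["TClk1", "TClk2", "TClk3", "TClk4"], pairs))
# [['TClk1', 'TClk3'], ['TClk2'], ['TClk4']]
```

Each resulting group can be treated as one equivalent clock, so fewer independent groups means fewer patterns.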

A second approach results in a compact pattern set without risking clock skew. Here, capture takes place over multiple cycles, with only one clock pulsed per cycle. This method is illustrated in Figure 9.
Figure 9 -- Two capture clocks pulsed sequentially in a dual-cycle capture (timing diagram: Pattern 1 and Pattern 2, each with start load_unload, two shift cycles, and two capture cycles; signals Scan Enable, TClk1 to TClk4)

This is a safe way to reduce the pattern count, and it typically achieves very good pattern compaction with a sequential depth of only 2 or 3, even for designs with many clocks. Compared to basic non-sequential patterns, the ATPG runtime will be higher because more capture cycles must be analyzed per pattern, but this cost is small compared to the extra shift cycles needed to test multiple clock domains independently.
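The cycle structure of such a multi-cycle capture pattern can be sketched as a schedule generator. This is a minimal sketch, assuming shift cycles pulse every scan clock with SE=1 and each capture cycle pulses exactly one clock with SE=0; the names and the choice of TClk1 and TClk2 as the capture clocks are hypothetical, since Figure 9 does not identify them.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Cycle(NamedTuple):
    phase: str               # "shift" or "capture"
    scan_enable: int         # 1 while shifting, 0 while capturing
    clocks: Tuple[str, ...]  # clocks pulsed in this cycle


def pattern_schedule(chain_length: int, all_clocks: Sequence[str],
                     capture_order: Sequence[str]) -> List[Cycle]:
    """Cycle list for one pattern with a multi-cycle, one-clock-per-cycle capture."""
    # Load/unload: SE=1 and every scan clock pulsed together, once per cell.
    cycles = [Cycle("shift", 1, tuple(all_clocks)) for _ in range(chain_length)]
    # Capture: SE=0 and exactly one clock per cycle, so domains never race.
    cycles += [Cycle("capture", 0, (clk,)) for clk in capture_order]
    return cycles


clocks = ["TClk1", "TClk2", "TClk3", "TClk4"]
for cycle in pattern_schedule(2, clocks, ["TClk1", "TClk2"]):
    print(cycle.phase, cycle.scan_enable, ",".join(cycle.clocks))
```

Each additional capture cycle adds only one tester cycle per pattern, whereas testing domains in separate patterns repeats the full load/unload shift sequence.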
