

1.1 Objectives Of The Project:
The main objective is to design reversible logic gates and to implement digital
decoders using those reversible logic gates.
Other objectives are given below:
To design both non-reversible and reversible versions of decoders, along with an analytical
evaluation of the design complexities in terms of both delay and resource requirements.
To optimize the garbage outputs and constant inputs of the final designs.
To present an efficient reversible implementation of decoders for digital applications.
The comparative results show that the proposed design is much better in terms of quantum cost,
delay, and hardware complexity, and has significantly better scalability than existing approaches.

1.2 Motivation:
The present work relates to reversible logic gates and the implementation of digital
decoders using reversible logic gates.
The encoded input information needs to be preserved at the output in computational tasks
pertaining to digital signal processing, communication, computer graphics, and cryptography
applications. Conventional computing circuits are irreversible, i.e. the input bits are lost when
the output is generated. This information loss during computation culminates in increased
power consumption. According to Landauer, each bit of information that is lost during
computation generates kT ln 2 joules of heat energy, where k and T respectively represent
Boltzmann's constant and the absolute temperature. Information loss becomes more frequent in
high-speed systems, thereby leading to increased heat energy. C. Bennett demonstrated that
power dissipation can be significantly reduced if the same operation is done reversibly.
Reversible logic allows the circuit to be run backwards at any point in time; therefore no bit of
information is actually lost and the amount of heat generated is negligible. In digital design,
decoders find extensive usage: in addressing a particular location for read/write operations in
memory cells, in I/O processors for connecting memory chips to the CPU, and also in Analog-to-Digital (ADC)

and Digital-to-Analog (DAC) converters, which are used at various stages of a
communication system. This work therefore addresses the design of reversible decoders. The
literature survey on reversible decoders shows that the focus is either on developing a topology
based on available reversible gates or on presenting an altogether new gate for the purpose.
One topology employs Double Feynman and Fredkin gates for realization, whereas another
is based on Fredkin gates alone. The former topology is cost efficient, while the latter has higher
cost metrics, with a comparatively larger number of constant inputs and garbage outputs. A
new gate, RFD, has been proposed for developing reversible decoders; it, however, has a large
number of constant inputs and garbage outputs. Yet another reversible decoder with attractive
cost metrics has been presented, but it cannot be extended into a generalized n-input decoder.
These reversible decoders provide only an active-high mode of operation. This study introduces
a reversible decoder which can provide both active-high and active-low modes of operation and
utilizes Feynman and Fredkin gates. A comparison in terms of the number of constant inputs,
quantum cost, and the number of garbage outputs is also given.
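As a quick numeric illustration of the kT ln 2 bound discussed above, the minimum energy dissipated per erased bit can be computed directly (an illustrative sketch; the function name is ours, not from the literature):

```python
import math

# Landauer's bound: the minimum heat dissipated per irreversible bit
# erasure is k*T*ln(2), with k Boltzmann's constant, T the temperature.
BOLTZMANN_K = 1.380649e-23  # J/K (exact value in the 2019 SI definition)

def landauer_limit(temperature_kelvin: float) -> float:
    """Energy in joules dissipated per bit erased at the given temperature."""
    return BOLTZMANN_K * temperature_kelvin * math.log(2)

# At room temperature (300 K) the bound is roughly 2.87e-21 J per bit,
# which is tiny per operation but significant at billions of operations.
room_temp_bound = landauer_limit(300.0)
```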
1.3 Organization Of Documentation:
This thesis is divided into nine chapters. The chapter-wise outline is as follows:
Chapter-1 deals with the motivation for reversible logic gate based decoder implementations.
Chapter-2 is concerned with a literature review on different methods and implementations of
decoders.
Chapter-3 deals with reversible logic circuits and basic reversible logic gates.
Chapter-4 describes the operation of decoders and types of decoders.
Chapter-5 is about the seed circuit, reversible circuits, the operation of reversible decoders,
types of decoders, and decoder descriptions.
Chapter-6 explains the introduction to Very Large Scale Integration (VLSI).
Chapter-7 gives an overview of the Verilog Hardware Description Language as used in this
project.
Chapter-8 demonstrates how the Xilinx software is used for synthesis and simulation
of the project.

Chapter-9 describes the experimental results of the proposed design, followed
by the conclusion and references.

2.1 Introduction:
Reversible logic is becoming a popular emerging paradigm because of its applications in
various emerging technologies, such as quantum computing, DNA computing, and optical
computing. It is also considered an alternative low-power design methodology. A reversible
circuit consists of a cascade of reversible gates without any fanout or feedback connections,
and the number of inputs and outputs must be equal. There exist various ways in which
reversible circuits can be implemented, such as NMR technology and optical technology.
In the optical domain, a photon can store information in a signal having zero rest mass and
provide very high speed. These properties of the photon have motivated researchers to study
and implement reversible circuits in the optical domain.
Theoretically, from decades-old principles, reversible logic is considered a
potential alternative for low-power computing. Optical implementation of reversible gates is
one possible way to overcome the power dissipation problem in conventional computing.
In recent times researchers have investigated various combinational and sequential circuits and
their implementations using reversible logic gates. Reversible logic gates also offer significant
advantages such as ease of fabrication, high speed, low power, and fast switching time.
2.2 Existing methods:
The binary decoder is another combinational logic circuit constructed from individual
logic gates and is the exact opposite of the encoder. The
name "decoder" means to translate or decode coded information from one format into another,
so a digital decoder transforms a set of digital input signals into an equivalent code at its
output. Binary decoders are another type of digital logic device with inputs of 2-bit, 3-bit,
or 4-bit codes depending upon the number of data input lines; a decoder that has a set of two
or more bits is defined as having an n-bit code, and therefore it is possible to represent
2^n possible values. Thus, a decoder generally decodes a binary value into a non-binary one by
setting exactly one of its 2^n outputs to logic 1.

A binary decoder converts coded inputs into coded outputs, where the input and output
codes are different; decoders are available to decode either a binary or BCD (8421 code)
input pattern, typically to a decimal output code. Commonly available BCD-to-decimal decoders
include the TTL 7442 and the CMOS 4028. Generally a decoder's output code has more
bits than its input code, and practical binary decoder circuits include 2-to-4, 3-to-8, and 4-to-16
line configurations. In the existing method, decoders such as 2:4 and 3:8 decoders are designed
using conventional gates such as AND, NOT, and XOR, based on Boolean expressions.
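As a small sketch of the conventional (irreversible) approach just described, the Boolean expressions of a 2:4 decoder can be written out directly (illustrative Python, not part of the original design):

```python
def decoder_2to4(a1: int, a0: int) -> list[int]:
    """Conventional 2:4 line decoder from the Boolean expressions
    D0 = A1'.A0', D1 = A1'.A0, D2 = A1.A0', D3 = A1.A0."""
    not_a1, not_a0 = 1 - a1, 1 - a0
    return [
        not_a1 & not_a0,  # D0: input 00
        not_a1 & a0,      # D1: input 01
        a1 & not_a0,      # D2: input 10
        a1 & a0,          # D3: input 11
    ]

# For every input combination, exactly one output line goes high.
```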

2.3 Proposed method:
Reversible gates have an equal number of inputs and outputs, and each input-output
combination is unique. An n-input, n-output reversible gate is represented as an n x n gate. The
inputs which assume the value 0 or 1 throughout the operation are termed constant inputs. On
the other hand, the outputs introduced for maintaining reversibility are called garbage
outputs. Some of the most widely and commonly used reversible gates are the Feynman gate
(FG), Fredkin gate (FRG), Peres gate (PG), and Toffoli gate (TG). Of these, the Feynman gate
is a 2 x 2 gate while the Peres, Toffoli, and Fredkin gates are 3 x 3 gates. The cost of a
reversible gate is given in terms of the number of primitive reversible gates needed to realize
the circuit.
Developments in the field of nanometer technology help minimize the power
consumption of logic circuits. Reversible logic design has been one of the promising
technologies gaining greater interest due to less heat dissipation and low power consumption.
Reversible logic plays an extensively important role in low-power computing as it recovers
from bit loss through a unique mapping between input and output vectors. Power consumption
is an important issue in modern-day VLSI designs.
In digital design, the decoder is a widely used building block. So, reversible logic gates and
reversible circuits for realizing decoders such as 2:4, 3:8, and 4:16 reversible decoders using
reversible logic gates are proposed. The proposed design leads to a reduction in power
consumption compared with conventional logic circuits.

3.1 Reversible computing:
Reversible computing is a model of computing where the computational process is, to some
extent, reversible, i.e., time-invertible. A necessary condition for reversibility of a
computational model is that the transition function mapping states to their
successors must at all times be one-to-one. Reversible computing is generally considered an
unconventional form of computing.
There are two major, closely related, types of reversibility that are of particular interest
for this purpose: physical reversibility and logical reversibility.
A process is said to be physically reversible if it results in no increase in physical entropy;
it is isentropic. Such circuits are also referred to as charge recovery logic or adiabatic
computing. Although in practice no nonstationary physical process can be exactly physically
reversible or isentropic, there is no known limit to the closeness with which we can approach
perfect reversibility, in systems that are sufficiently well-isolated from interactions with
unknown external environments, when the laws of physics describing the system's evolution are
precisely known.
Probably the largest motivation for the study of technologies aimed at actually
implementing reversible computing is that they offer what is predicted to be the only potential
way to improve the energy efficiency of computers beyond the fundamental von Neumann-Landauer
limit of kT ln 2 energy dissipated per irreversible bit operation.
As was first argued by Rolf Landauer of IBM, in order for a computational process to be
physically reversible, it must also be logically reversible. Landauer's principle is the loosely
formulated notion that the erasure of n bits of information must always incur a cost of nk ln(2) in
thermodynamic entropy. A discrete, deterministic computational process is said to be logically
reversible if the transition function that maps old computational states to new ones is a
one-to-one function; i.e., the output logical states uniquely define the input logical states of the
computational operation.

For computational processes that are nondeterministic (in the sense of being probabilistic
or random), the relation between old and new states is not a single-valued function, and the
requirement needed to obtain physical reversibility becomes a slightly weaker condition, namely
that the size of a given ensemble of possible initial computational states does not decrease, on
average, as the computation proceeds.

3.2 The reversibility of physics and reversible computing

Landauer's principle (and indeed, the second law of thermodynamics itself) can also be
understood to be a direct logical consequence of the underlying reversibility of physics, as is
reflected in the general Hamiltonian formulation of mechanics and in the unitary time-evolution
operator of quantum mechanics more specifically.
In the context of reversible physics, the phenomenon of entropy increase (and the
observed arrow of time) can be understood to be consequences of the fact that our evolved
predictive capabilities are rather limited, and cannot keep perfect track of the exact reversible
evolution of complex physical systems, especially since these systems are never perfectly
isolated from an unknown external environment, and even the laws of physics themselves are
still not known with complete precision. Thus, we (and physical observers generally) always
accumulate some uncertainty about the state of physical systems, even if the system's true
underlying dynamics is a perfectly reversible one that is subject to no entropy increase if viewed
from a hypothetical omniscient perspective in which the dynamical laws are precisely known.
The implementation of reversible computing thus amounts to learning how to
characterize and control the physical dynamics of mechanisms to carry out desired computational
operations so precisely that we can accumulate a negligible total amount of uncertainty regarding
the complete physical state of the mechanism, per each logic operation that is performed. In
other words, we would need to precisely track the state of the active energy that is involved in
carrying out computational operations within the machine, and design the machine in such a way
that the majority of this energy is recovered in an organized form that can be reused for
subsequent operations, rather than being permitted to dissipate into the form of heat.
Although achieving this goal presents a significant challenge for the design,
manufacturing, and characterization of ultra-precise new physical mechanisms for computing,
there is at present no fundamental reason to think that this goal cannot eventually be
accomplished, allowing us to someday build computers that generate much less than 1 bit's worth

of physical entropy (and dissipate much less than kT ln 2 energy to heat) for each useful logical
operation that they carry out internally.
The motivation behind much of the research that has been done in reversible computing
was the first seminal paper on the topic, which was published by Charles H. Bennett of IBM
research in 1973. Today, the field has a substantial body of academic literature behind it. A wide
variety of reversible device concepts, logic gates, electronic circuits, processor architectures,
programming languages, and application algorithms have been designed and analyzed by
physicists, electrical engineers, and computer scientists.
This field of research awaits the detailed development of a high-quality, cost-effective,
nearly reversible logic device technology, one that includes highly energy-efficient clocking and
synchronization mechanisms. This sort of solid engineering progress will be needed before the
large body of theoretical research on reversible computing can find practical application in
enabling real computer technology to circumvent the various near-term barriers to its energy
efficiency, including the von Neumann-Landauer bound. This may only be circumvented by the
use of logically reversible computing, due to the Second Law of Thermodynamics.

3.3 Reversible circuits

To implement reversible computation, estimate its cost, and judge its limits, it can be
formalized in terms of gate-level circuits. For example, the inverter (NOT) gate is
reversible because it can be undone. The exclusive-or (XOR) gate is irreversible because its
inputs cannot be unambiguously reconstructed from an output value. However, a reversible
version of the XOR gate, the controlled-NOT gate (CNOT), can be defined by preserving one
of the inputs. The three-input variant of the CNOT gate is called the Toffoli gate. It preserves
two of its inputs a, b and replaces the third, c, by c XOR (a AND b). With c = 0, this gives the
AND function, and with a AND b = 1 this gives the NOT function. Thus, the Toffoli gate is
universal and can implement any reversible Boolean function (given enough zero-initialized
ancillary bits). More generally, reversible gates have the same number of inputs and outputs. A
reversible circuit connects reversible gates without fanouts or loops. Therefore, such circuits
contain equal numbers of input and output wires, each going through the entire circuit.
Reversible logic circuits were first motivated in the 1960s by theoretical
considerations of zero-energy computation as well as practical improvement of bit-manipulation
transforms in cryptography and computer graphics. Since the 1980s, reversible circuits have
attracted interest as components of quantum algorithms, and more recently in photonic and
nano-computing technologies where some switching devices offer no signal gain. Surveys of
reversible circuits, their construction and optimization, as well as recent research challenges
are available.

3.4 Some reversible gates:

Feynman / CNOT Gate:

The Feynman gate is a reversible 2 x 2 gate with a quantum cost of one, mapping the input
(A, B) to the output (P = A, Q = A XOR B).

Fig 3.1 Reversible Feynman/CNOT gate (FG)



3.1 Truth Table For CNOT Gate
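The mapping above can be sketched in a few lines of behavioral code (an illustrative model for clarity, not the thesis's hardware description):

```python
def feynman_gate(a: int, b: int) -> tuple[int, int]:
    """Reversible 2x2 Feynman/CNOT gate: P = A, Q = A XOR B."""
    return a, a ^ b

# Applying the gate twice recovers the original inputs, which is
# exactly the reversibility property: no information is destroyed.
```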

Fredkin gate
The Fredkin gate (also CSWAP gate) is a computational circuit suitable for reversible
computing, invented by Ed Fredkin. It is universal, which means that any logical or arithmetic
operation can be constructed entirely of Fredkin gates. The Fredkin gate is the three-bit gate that
swaps the last two bits if the first bit is 1.


The basic Fredkin gate [1] is a 3 x 3 gate. The input vector is I = (A, B, C) and the
output vector is O = (P, Q, R). The output is defined by P = A, Q = A'B XOR AC, and
R = A'C XOR AB. The quantum cost of a Fredkin gate is 5.

Fig 3.2 Reversible Fredkin Gate

The Fredkin gate is the reversible three-bit gate that swaps the last two bits if the first bit is 1.
Truth table and matrix form:

3.2 Truth Table For Fredkin Gate
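A behavioral sketch of the Fredkin/CSWAP mapping, mirroring the equations P = A, Q = A'B XOR AC, R = A'C XOR AB (illustrative, not the thesis's implementation):

```python
def fredkin_gate(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Reversible 3x3 Fredkin (CSWAP) gate: swap B and C when A is 1."""
    if a == 1:
        b, c = c, b
    return a, b, c

# The gate is its own inverse, and it conserves the number of 1s,
# matching the billiard-ball conservation property described in the text.
```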

It has the useful property that the numbers of 0s and 1s are conserved throughout, which
in the billiard ball model means the same number of balls are output as input. This corresponds
nicely to the conservation of mass in physics, and helps to show that the model is not wasteful.

In computer science, the Toffoli gate (also CCNOT gate), invented by Tommaso Toffoli, is a
universal reversible logic gate, which means that any reversible circuit can be constructed from
Toffoli gates. It is also known as the controlled-controlled-NOT gate, which describes its action.
Any reversible gate must have the same number of input and output bits, by the
pigeonhole principle. For one input bit, there are two possible reversible gates. One of them is
NOT. The other is the identity gate which maps its input to the output unchanged. For two input
bits, the only non-trivial gate is the controlled NOT gate which XORs the first bit to the second
bit and leaves the first bit unchanged.
Unfortunately, there are reversible functions that cannot be computed using just those gates. In
other words, the set consisting of NOT and CNOT gates is not universal. If we want to compute
an arbitrary function using reversible gates, we need another gate. One possibility is the Toffoli
gate, proposed in 1980 by Toffoli. This gate has 3-bit inputs and outputs. If the first two bits are
set, it flips the third bit. The following is a table of the input and output bits:
Truth table and permutation matrix form:

3.3 Truth Table Of Toffoli Gate In Permutation Form

It can also be described as mapping bits a, b, and c to a, b, and c XOR (a AND b).
The Toffoli gate is universal; this means that for any Boolean function f(x1, x2, ..., xm),
there is a circuit consisting of Toffoli gates which takes x1, x2, ..., xm and some extra bits set to 0
or 1 and outputs x1, x2, ..., xm, f(x1, x2, ..., xm), and some extra bits (called garbage).


Essentially, this means that one can use Toffoli gates to build systems that will perform any
desired Boolean function computation in a reversible manner.
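A minimal sketch of the CCNOT mapping, and of how it yields AND when the ancillary bit is initialized to 0 (illustrative Python):

```python
def toffoli_gate(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Reversible 3x3 Toffoli (CCNOT) gate: flip C when A and B are both 1."""
    return a, b, c ^ (a & b)

def reversible_and(a: int, b: int) -> int:
    """AND computed reversibly: with the ancillary bit c set to 0,
    the third output of the Toffoli gate becomes a AND b."""
    return toffoli_gate(a, b, 0)[2]
```

Since the Toffoli gate is its own inverse, running it again on its outputs restores the original three bits.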

Fig.3.3 Reversible Toffoli Gate


4.1 Introductions to decoders


In digital electronics, a binary decoder is a combinational logic circuit that converts a binary
integer value to an associated pattern of output bits. Decoders are used in a wide variety of
applications, including data de-multiplexing, seven-segment displays, and memory address
decoding.
There are several types of binary decoders, but in all cases a decoder is an electronic circuit with
multiple data inputs and multiple outputs that converts every unique combination of data input
states into a specific combination of output states. In addition to its data inputs, some decoders
also have one or more "enable" inputs. When the enable input is negated (disabled), all decoder
outputs are forced to their inactive states.
Depending on its function, a binary decoder will convert binary information from n input signals
to as many as 2^n unique output signals. Some decoders have fewer than 2^n output lines; in
such cases, at least one output pattern will be repeated for different input values.

4.1.1 1-of-n decoder:

A 1-of-n binary decoder has n output bits, and the integer input bits serve as the "address" or bit
number of the output bit that is to be activated. This type of decoder asserts exactly one of its n
output bits, or none of them, for every unique combination of input bit states. Each output bit
becomes active only when a specific, corresponding integer value is applied to the inputs.

4.1.2 1-to-2 line decoder:

A decoder is a circuit that changes a code into a set of signals. It is called a decoder because it
does the reverse of encoding, but we will begin our study of encoders and decoders with
decoders because they are simpler to design.

A common type of decoder is the line decoder, which takes an n-digit binary number and
decodes it into 2^n data lines. The simplest is the 1-to-2 line decoder. The truth table is


4.1 Truth table of 1-to-2 decoder

A is the address and D is the data line. D0 is NOT A and D1 is A. The circuit looks like

Fig.4.1 circuit diagram of 1to2 decoder
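The 1-to-2 mapping above is small enough to express directly (an illustrative sketch):

```python
def decoder_1to2(a: int) -> tuple[int, int]:
    """1-to-2 line decoder: D0 = NOT A, D1 = A."""
    return 1 - a, a

# The single address bit selects which of the two data lines is high.
```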

4.1.3 2-to-4 line decoder:

Only slightly more complex is the 2-to-4 line decoder. The truth table is


4.2 Truth table of 2-to-4 decoder

Developed into a circuit it looks like

Fig.4.2 circuit diagram of 2to4 decoder

Larger line decoders can be designed in a similar fashion, but just like with the binary adder
there is a way to make larger decoders by combining smaller decoders. An alternate circuit for
the 2-to-4 line decoder is


Fig.4.3 An alternate circuit for the 2-to-4 line decoder

Replacing the 1-to-2 Decoders with their circuits will show that both circuits are equivalent. In
a similar fashion a 3-to-8 line decoder can be made from a 1-to-2 line decoder and a 2-to-4 line
decoder, and a 4-to-16 line decoder can be made from two 2-to-4 line decoders.
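The composition just described, building larger decoders from smaller ones, can be sketched behaviorally (the helper names are illustrative, not from the thesis):

```python
def decoder_1to2(a: int) -> list[int]:
    """1-to-2 line decoder: outputs [NOT A, A]."""
    return [1 - a, a]

def decoder_2to4(a1: int, a0: int) -> list[int]:
    """2-to-4 decoder built from two 1-to-2 decoders:
    each output line is the AND of one line from each stage."""
    hi, lo = decoder_1to2(a1), decoder_1to2(a0)
    return [h & l for h in hi for l in lo]

def decoder_3to8(a2: int, a1: int, a0: int) -> list[int]:
    """3-to-8 decoder from a 1-to-2 decoder and a 2-to-4 decoder,
    mirroring the hierarchical construction in the text."""
    top, rest = decoder_1to2(a2), decoder_2to4(a1, a0)
    return [t & r for t in top for r in rest]
```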
You might also consider making a 2-to-4 decoder ladder from 1-to-2 decoder ladders. If you
do it might look something like this:


Fig.4.4 2-to-4 decoder ladder from 1-to-2 decoder ladders

For some logic it may be required to build up logic like this. For an eight-bit adder we only
know how to sum eight bits by summing one bit at a time. Usually it is easier to design ladder
logic from Boolean equations or truth tables than to design logic gates and then translate
them into ladder logic.


5.1 Introduction
Reversible logic plays an extensively important role in low-power computing as it
recovers from bit loss through a unique mapping between input and output vectors. The
no-bit-loss property of reversible circuitry results in less power dissipation than conventional
circuitry. Moreover, it is viewed as a special case of a quantum circuit, as quantum evolution
must be reversible. Over the last two decades, reversible circuitry has gained remarkable
interest in the fields of DNA technology, nano-technology, optical computing, program
debugging and testing, quantum-dot cellular automata, and discrete event simulation, and in the
development of highly efficient algorithms. On the other hand, parity checking is a popular
mechanism for detecting single-level faults. If the parity of the input data is maintained
throughout the computation, then intermediate checking would not be required, and an entire
circuit can preserve parity if each of its gates is parity preserving. A reversible fault tolerant
circuit based on reversible fault tolerant gates allows faulty signals to be detected at the
primary outputs of the circuit through parity checking. The hardware of digital communication
systems relies heavily on decoders, as they retrieve information from the coded output.
Decoders have also been used in the memory and I/O of microprocessors. Previously, a
reversible fault tolerant decoder was designed, but it was not generalized and compact.
Therefore, this chapter investigates generalized design methodologies for reversible fault
tolerant decoders.

5.2 Basic definitions and literature review

This section formally defines reversible gates, garbage outputs, delay, and hardware
complexity, and presents popular reversible fault tolerant gates along with their input-output
specifications and transistor and quantum equivalent representations.

5.2.1. Reversible and Fault Tolerant Gates

An n x n reversible gate is a data stripe block that uniquely maps between an input vector
Iv = (I0, I1, ..., In-1) and an output vector Ov = (O0, O1, ..., On-1), denoted as Iv <-> Ov.
Two prime requirements for a reversible logic circuit are as follows [14]:
There should be an equal number of inputs and outputs.
There should be a one-to-one correspondence between inputs and outputs for all possible
input-output sequences.
A fault tolerant gate is a reversible gate that constantly preserves the same parity between
input and output vectors. More specifically, an n x n fault tolerant gate satisfies the following
property between the input and output vectors [12]:

I0 XOR I1 XOR ... XOR In-1 = O0 XOR O1 XOR ... XOR On-1   (1)

The parity preserving property of Eq. 1 allows a faulty signal to be detected at the circuit's
primary outputs. Researchers [11], [12], [15] have shown that a circuit consisting of only
reversible fault tolerant gates preserves parity and is thus able to detect a faulty signal at its
primary outputs.
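Eq. 1 can be checked mechanically for any candidate gate by enumerating its truth table; here is an illustrative parity-preservation test using the Fredkin mapping (a sketch added for clarity, not from the thesis):

```python
from itertools import product

def fredkin(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Fredkin gate: swap b and c when a = 1 (parity preserving)."""
    return (a, c, b) if a else (a, b, c)

def preserves_parity(gate, arity: int) -> bool:
    """True if the XOR of inputs equals the XOR of outputs for every
    input vector, i.e. the gate satisfies Eq. 1."""
    for bits in product((0, 1), repeat=arity):
        in_parity = 0
        for x in bits:
            in_parity ^= x
        out_parity = 0
        for y in gate(*bits):
            out_parity ^= y
        if in_parity != out_parity:
            return False
    return True
```

The plain Toffoli gate fails this test (inputs 1, 1, 0 have parity 0 but outputs 1, 1, 1 have parity 1), which is why fault tolerant designs use FRG and F2G instead.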

5.2.2. Qubit and Quantum Cost

The main difference between qubits and conventional bits is that qubits can form a linear
combination of the states |0> and |1>, called a superposition, while the basic states |0> and |1>
form an orthogonal basis of a two-dimensional complex vector space [3]. A superposition can
be denoted as |ψ> = α|0> + β|1>, which means the particle is measured in state 0 with
probability |α|^2, or yields 1 with probability |β|^2, and of course |α|^2 + |β|^2 = 1 [16]. Thus,
the information stored by a qubit differs for different α and β. Because of such properties,
qubits can perform certain calculations exponentially faster than conventional bits. This is one
of the main motivations behind quantum computing. A quantum computer demands that its
underlying circuitry be reversible [1]-[6]. The quantum cost of all 1 x 1 and 2 x 2 reversible
gates is considered to be 0 and 1, respectively [6]-[14]. Hence, the quantum cost of a reversible
gate or circuit is the total number of 2 x 2 quantum gates used in that reversible gate or circuit.
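As a small numeric illustration of the normalization condition |α|^2 + |β|^2 = 1 and of the quantum-cost convention above (an illustrative sketch; the function names are ours):

```python
import math

def is_valid_qubit(alpha: complex, beta: complex, tol: float = 1e-9) -> bool:
    """Check the normalization condition |alpha|^2 + |beta|^2 = 1."""
    return abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1.0) < tol

def circuit_quantum_cost(gate_costs: list[int]) -> int:
    """Quantum cost of a cascade = total number of 2x2 primitives,
    i.e. the sum of the individual gates' quantum costs."""
    return sum(gate_costs)

# An equal superposition (alpha = beta = 1/sqrt(2)) is valid, and a cascade
# of one Feynman gate (QC 1) and one Fredkin gate (QC 5) has cost 6.
```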

5.2.3. Delay, Garbage Output and Hardware Complexity

The delay of a circuit is the delay of its critical path. The path with the maximum number of
gates from any input to any output is the critical path [1]. There may be more than one critical
path in a circuit, and it is an NP-complete problem to find all the critical paths [17]. So,
researchers pick the path which is the most likely candidate for the critical path [18]. An unused
output of a reversible gate (or circuit) is known as a garbage output, i.e., the outputs which are
needed only to maintain reversibility are the garbage outputs. The number of basic operations
(Ex-OR, AND, NOT, etc.) needed to realize the circuit is referred to as the hardware complexity
of the circuit. A constant complexity is assumed for each basic operation of the circuit, such as
α for Ex-OR, β for AND, and δ for NOT. Then the total number of operations is calculated in
terms of α, β, and δ.

5.2.4. Popular Reversible Fault Tolerant Gates

1) Feynman Double Gate:
The input vector (Iv) and output vector (Ov) for the 3 x 3 reversible Feynman double gate
(F2G) are defined as follows [19]: Iv = (a, b, c) and Ov = (a, a XOR b, a XOR c). The block
diagram of F2G is shown in Fig.5.1(a). Fig.5.1(b) represents the quantum equivalent realization
of F2G. From Fig.5.1(b) we find that it is realized with two 2 x 2 Ex-OR gates, thus its quantum
cost is two (Sec. 5.2.2). According to our design procedure, twelve transistors are required to
realize F2G reversibly, as shown in Fig.5.1(c). Fig.5.3(a) represents the corresponding timing
diagram of F2G.

Fig.5.1: Reversible Feynman double gate (a) Block diagram (b) Quantum equivalent
realization (c) Transistor realization
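The F2G mapping can be sketched behaviorally (illustrative Python, not the transistor-level design):

```python
def feynman_double_gate(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Reversible 3x3 Feynman double gate (F2G):
    (a, b, c) -> (a, a XOR b, a XOR c)."""
    return a, a ^ b, a ^ c

# The gate is its own inverse: applying it twice restores (a, b, c).
# With b = c = 0 its outputs are (a, a, a), fanning out a reversibly.
```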


The input and output vectors for the 3 x 3 Fredkin gate (FRG) are defined as follows [20]:
Iv = (a, b, c) and Ov = (a, a'b XOR ac, a'c XOR ab). The block diagram of FRG is shown in
Fig.5.2(a). Fig.5.2(b) represents the quantum realization of FRG. In Fig.5.2(b), each rectangle
is equivalent to a 2 x 2 quantum primitive, and therefore its quantum cost is considered as one
[13]. Thus the total quantum cost of FRG is five. To realize the FRG, four transistors are
needed, as shown in Fig.5.2(c), and its corresponding timing diagram is shown in Fig.5.3(b).

Fig.5.2: Reversible Fredkin gate (a) Block diagram (b) Quantum equivalent realization
(c) Transistor realization
The reversible Fredkin and Feynman double gates obey the rule of Eq. 1. The fault tolerant
(parity preserving) property of the Fredkin and Feynman double gates is shown in Table 5.1.


TABLE 5.1: Truth table for F2G and FRG

5.2.5 Decoder
Decoders are collections of logic gates fixed up in a specific way such that, for an input
combination, all output terms are low except one. These terms are the minterms. Thus, when an
input combination changes, two outputs will change. If there are n inputs, the number of
outputs will be 2^n. There are several designs of reversible decoders in the literature. To the
best of our knowledge, the design from [7] is the only reversible design that preserves parity
too.

Fig.5.3: Simulation results for (a) Feynman double gate (b) Fredkin gate.

5.3. Proposed Reversible Fault Tolerant Decoder

Considering the simplest case, n = 1, we have a 1-to-2 decoder. A single F2G can work as a
1-to-2 Reversible Fault tolerant Decoder (RFD), as shown in Fig.5.4(a), and its corresponding
timing diagram is shown in Fig.5.4(b). From now on, we denote a reversible fault tolerant
decoder as an RFD.

Fig.5.4: Proposed 1-to-2 RFD (a) Architecture (b) Simulation result

Fig.5.5(a) and Fig.5.5(d) represent the architectures of the 2-to-4 and 3-to-8 RFDs,
respectively. The timing diagram of Fig.5.5(a) is shown in Fig.5.5(c). From Fig.5.5(d), we find
that the 3-to-8 RFD is designed using the 2-to-4 RFD, thus a schematic of Fig.5.5(a) is created,
which is shown in Fig.5.5(b).
Algorithm 1 presents the design procedure of the proposed n-to-2^n RFD. The primary inputs
to the algorithm are n control bits. Line 6 of the proposed algorithm assigns the input to the
Feynman double gate for the first control bit (S0), whereas line 9 assigns the first two inputs to the Fredkin


gates for all the remaining control bits. Lines 10-12 assign the third input to the Fredkin gate for n = 2,


Fig.5.5: (a) Block diagram of the proposed 2-to-4 RFD. (b)Schematic diagram of 2-to-4
RFD. (c) Simulation with DSCH-2.7 [21] of 2-to-4 RFD. (d) Block diagram of the proposed
3-to-8 RFD. (e) Simulation with DSCH-2.7 [21] of 3-to-8 RFD

while lines 13-15 assign the third input to the Fredkin gates through a recursive call to the previous RFD
for n > 2. Lines 18-19 return the outputs. The complexity of this algorithm is O(n). According to the
proposed algorithm, the architecture of the n-to-2^n RFD is shown in Fig.5.5(e). In Sec. II-D, we present
the transistor representations of the FRG and F2G using MOS transistors. These representations are
finally used to obtain the MOS circuit of the proposed decoder. Each of the proposed circuits is
simulated with DSCH-2.7 [21]. These simulations also show the functional correctness of the
proposed decoders. Table 5.2 shows a comparative study of the proposed fault tolerant decoders
with the existing fault tolerant ones.
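The recursive structure of Algorithm 1 can be sketched as follows. This is a behavioral model of our own (not the paper's pseudocode): for each additional control bit, every existing output line feeds a Fredkin gate whose control is that bit.

```python
def f2g(a, b, c):
    return a, a ^ b, a ^ c                 # Feynman double gate

def fredkin(a, b, c):
    return (a, c, b) if a else (a, b, c)   # Fredkin gate (controlled swap)

def rfd(controls):
    # controls = [S0, S1, ..., Sn-1]; returns (2^n one-hot outputs, gate count)
    _, d0, d1 = f2g(controls[0], 1, 0)     # 1-to-2 stage: one F2G
    outs, gates = [d0, d1], 1
    for ctrl in controls[1:]:
        # one FRG per existing line, all controlled by the next bit
        pairs = [fredkin(ctrl, line, 0) for line in outs]
        outs = [q for _, q, _ in pairs] + [r for _, _, r in pairs]
        gates += len(pairs)
    return outs, gates
```

For n = 3 this uses 1 + 2 + 4 = 7 gates, matching the 2^n − 1 count established later in Lemma 1.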

TABLE 5.2: Comparison of reversible fault tolerant decoders

GT = Number of Gates, GO = Garbage Outputs, QC = Quantum Cost,

HC = Hardware Complexity, UD = Unit Delay.
* The design is not a generalized one, i.e., it is not an n-to-2^n decoder.
TABLE 5.3: 1-to-2 decoder with 1 constant input

Next, we must prove the existence of a combinational circuit which can realize the reversible fault
tolerant 1-to-2 decoder with 2 constant inputs. This is accomplished by the circuit
shown in Fig.5.4(a). That Fig.5.4(a) is reversible and fault tolerant can be verified from
its corresponding truth table, so no further detail is needed. Now, a 1-to-2 reversible
fault tolerant decoder has at least 2 constant inputs and 1 primary input, i.e., a total of 3
inputs. Thus, a 1-to-2 reversible fault tolerant decoder must have at least 3 outputs, otherwise it
can never comply with the properties of a reversible parity preserving circuit. Among these 3


outputs, only 2 are primary outputs, so the remaining output is a garbage output, which proves
Theorem 1 for n = 1.
Theorem 2: A 2-to-4 reversible fault tolerant decoder can be realized with at least 12 quantum cost.
Proof: A 2-to-4 decoder performs 4 different two-input logical AND operations. A reversible fault tolerant
two-input AND operation requires at least 3 quantum cost. So, a 2-to-4 reversible fault tolerant decoder is
realized with at least 12 quantum cost.
Example 2: Fig.5.5(a) proves the existence of a 2-to-4 reversible decoder with 12
quantum cost. Next, we want to prove that it is not possible to realize a reversible fault tolerant
2-to-4 decoder with fewer than 12 quantum cost. In the 2-to-4 decoder, there are 4 different two-input
logical AND operations, namely S1'S0', S1'S0, S1S0' and S1S0. It is enough to prove that it is
not possible to realize a reversible fault tolerant two-input logical AND with fewer than three quantum
cost. Consider:
i. If we use one quantum cost to design the AND, that is of course not possible
according to our discussion in Sec. II.
ii. If we use two quantum cost to design the AND, then we must use two 1x1 or
2x2 gates. Apparently, two 1x1 gates cannot generate the AND. For two 2x2 gates, we
have two combinations, which are shown in Fig.5.6(a) and Fig.5.6(b). In Fig.5.6(a), the output
must be (a, ab) if the inputs are (a, b). The corresponding truth table is shown in Table 5.4.

Fig.5.6: Combinations of the two 2x2 quantum primitive gates

TABLE 5.4: Truth table of Fig. 5.6(a)


From Table 5.4, we find that the outputs are not unique to their corresponding input
combinations (the 1st and 2nd rows have identical outputs for different input combinations), so
this combination cannot realize the reversible AND. For Fig.5.6(b), if the inputs are (a, b, c), the
output of the lower level is offered to the next level as a controlled input. This means that the
second output of Fig.5.6(b) has to be ab, since otherwise the third output, which is controlled by
the second output, could never produce ab. Thereby, according to Table 5.5, we can assert that
the second combination cannot realize the AND no matter how we set the third output
of Fig.5.6(b).
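The argument from Table 5.4, that the candidate mapping (a, b) to (a, ab) of Fig.5.6(a) collapses distinct inputs and therefore cannot be reversible, can be checked exhaustively. This small sketch is ours:

```python
# Enumerate the candidate mapping of Fig.5.6(a): (a, b) -> (a, a AND b)
inputs = [(a, b) for a in (0, 1) for b in (0, 1)]
outputs = [(a, a & b) for a, b in inputs]

# (0, 0) and (0, 1) both map to (0, 0): the mapping is not one-to-one,
# so no 2x2 gate with these outputs can be reversible
assert len(set(outputs)) < len(inputs)
```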
TABLE 5.5: Truth table of Fig. 5.6(b)

(third column of Table 5.5), the input vectors will never be in one-to-one correspondence with the
output vectors. Therefore, we can conclude that a combinational circuit for a reversible fault
tolerant two-input logical AND operation cannot be realized with fewer than three quantum cost. The
above example clarifies the lower bound on the quantum cost of the 2-to-4 RFD. Similarly, it
can be proved that the n-to-2^n RFD can be realized with (5·2^n − 8) quantum cost when n ≥ 1;
by assigning different values to n, the validity of this equation can be checked.
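The closed form follows directly from the gate counts: one F2G of quantum cost 2 plus (2^n − 2) FRGs of quantum cost 5 each. A quick check (our sketch):

```python
def quantum_cost(n):
    # One F2G (QC 2) + (2^n - 2) FRGs (QC 5 each) = 5*2^n - 8
    return 2 + 5 * (2**n - 2)
```

For n = 2 this gives 12, agreeing with Theorem 2; for n = 1 it gives 2, the quantum cost of the single F2G.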
Lemma 1: An n-to-2^n RFD can be realized with (2^n − 1) reversible fault tolerant gates, where n is
the number of data bits.


Proof: According to our design procedure, an n-to-2^n RFD requires an (n−1)-to-2^(n−1) RFD plus
2^(n−1) Fredkin gates, which in turn requires an (n−2)-to-2^(n−2) RFD plus 2^(n−2) Fredkin gates, and
so on until we reach the 1-to-2 RFD, which requires a single reversible fault tolerant Feynman
double gate. Thus the total number of gates required for an n-to-2^n RFD is
1 + 2 + 4 + ... + 2^(n−1) = 2^n − 1.

Example 3: From Fig.5.5(d) we find that the proposed 3-to-8 RFD requires a total of 7
reversible fault tolerant gates. If we substitute n = 3 in Lemma 1, we get the value 7 as well.
Lemma 2: Let α, β and γ be the hardware complexities of a two-input Ex-OR, AND and NOT
operation, respectively. Then an n-to-2^n RFD can be realized with (2^(n+1) − 2)α + (2^(n+2) − 8)β +
(2^(n+1) − 4)γ hardware complexity, where n is the number of data bits.
Proof: In Lemma 1, we proved that an n-to-2^n RFD is realized with one F2G and (2^n − 2) FRGs. The
hardware complexities of an FRG and an F2G are 2α + 4β + 2γ and 2α, respectively. Hence, the
hardware complexity of an n-to-2^n RFD is (2^n − 2)(2α + 4β + 2γ) + 2α = (2^(n+1) − 2)α + (2^(n+2) − 8)β + (2^(n+1) − 4)γ.

Example 4: Fig.5.5(d) shows that the proposed 3-to-8 reversible fault tolerant decoder requires
six Fredkin gates and one Feynman double gate. According to our previous discussion in Sec. II,
the hardware complexity of a Feynman double gate is 2α, whereas the hardware complexity of a
Fredkin gate is 2α + 4β + 2γ. Thus, the hardware complexity of Fig.5.5(d) is 6(2α + 4β + 2γ) + 2α =
14α + 24β + 12γ. In Lemma 2, if we put n = 3, we get exactly 14α + 24β + 12γ as well.
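Lemma 2's coefficients can be tallied the same way, treating α, β and γ symbolically as a coefficient triple (an illustrative sketch of ours):

```python
def hw_complexity(n):
    # (alpha, beta, gamma) coefficients: one F2G contributes (2, 0, 0),
    # each of the (2^n - 2) FRGs contributes (2, 4, 2)
    frg = 2**n - 2
    return (2 * frg + 2, 4 * frg, 2 * frg)
```

For n = 3 this returns (14, 24, 12), matching Example 4.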


6.1 Very-large-scale integration
Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining
thousands of transistors into a single chip. VLSI began in the 1970s when complex
semiconductor and communication technologies were being developed. The microprocessor is a
VLSI device.

Fig6.1 A VLSI integrated-circuit die

6.2 History


During the 1920s, several inventors attempted devices intended to control the current in solid
state diodes and convert them into triodes. Success, however, had to wait until after World War
II. The wartime effort to improve silicon and germanium crystals for use as radar detectors led to
improvements both in fabrication and in the theoretical understanding of the quantum mechanical
states of carriers in semiconductors, and afterwards the scientists who had been diverted to radar
development returned to solid state device development. With the invention of the transistor at
Bell Labs in 1947, the field of electronics took a new direction, shifting from power-hungry
vacuum tubes to solid state devices. With the small and efficient transistor at their hands,
electrical engineers of the 1950s saw the possibility of constructing far more advanced circuits
than before. However, as the complexity of the circuits grew, problems started arising.
One problem was the size of the circuits. A complex circuit, like a computer, was dependent
on speed. If the components of the computer were too large or the wires interconnecting them too
long, the electric signals couldn't travel fast enough through the circuit, thus making the
computer too slow to be effective.
Jack Kilby at Texas Instruments found a solution to this problem in 1958. Kilby's idea was to
make all the components and the chip out of the same block (monolith) of semiconductor
material. When the rest of the workers returned from vacation, Kilby presented his new idea to
his superiors. He was allowed to build a test version of his circuit. In September 1958, he had his
first integrated circuit ready. Although the first integrated circuit was pretty crude and had some
problems, the idea was groundbreaking. By making all the parts out of the same block of
material and adding the metal needed to connect them as a layer on top of it, there was no more
need for individual discrete components. No more wires and components had to be assembled
manually. The circuits could be made smaller and the manufacturing process could be automated.
From here the idea of integrating all components on a single silicon wafer came into existence,
which led to the development of Small Scale Integration (SSI) in the early 1960s, Medium Scale
Integration (MSI) in the late 1960s, Large Scale Integration (LSI) in the 1970s, and in the early
1980s VLSI, with tens of thousands of transistors on a chip (later hundreds of thousands, and now millions).


6.3 Developments
The first semiconductor chips held two transistors each. Subsequent advances added more and
more transistors, and, as a consequence, more individual functions or systems were integrated
over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes,
transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a
single device. Now known retrospectively as small-scale integration (SSI), improvements in
technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI).
Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand
logic gates. Current technology has moved far past this mark and today's microprocessors have
many millions of gates and billions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale integration
above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of
gates and transistors available on common devices has rendered such fine distinctions moot.
Terms suggesting greater than VLSI levels of integration are no longer in widespread use.
As of early 2008, billion-transistor processors are commercially available. This is expected to
become more commonplace as semiconductor fabrication moves from the current generation of
65 nm processes to the next 45 nm generations (while experiencing new challenges such as
increased variation across process corners). A notable example is Nvidia's 280 series GPU. This
GPU is unique in the fact that almost all of its 1.4 billion transistors are used for logic, in contrast
to the Itanium, whose large transistor count is largely due to its 24 MB L3 cache. Current
designs, unlike the earliest devices, use extensive design automation and automated logic
synthesis to lay out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks like the SRAM (Static Random Access
Memory) cell, however, are still designed by hand to ensure the highest efficiency (sometimes by
bending or breaking established design rules to obtain the last bit of performance by trading
stability). VLSI technology is moving towards radical miniaturization with the introduction of
NEMS technology. A lot of problems need to be sorted out before the transition is actually made.


6.4 Structured design

Structured VLSI design is a modular methodology originated by Carver Mead and Lynn Conway
for saving microchip area by minimizing the area of the interconnect fabric. This is obtained by
repetitive arrangement of rectangular macro blocks which can be interconnected using wiring by
abutment. An example is partitioning the layout of an adder into a row of equal bit slice cells. In
complex designs this structuring may be achieved by hierarchical nesting.
Structured VLSI design had been popular in the early 1980s, but lost its popularity later because
the advent of placement and routing tools wasted a lot of area on routing, which was tolerated
because of the progress of Moore's Law. When introducing the hardware description language
KARL in the mid-1970s, Reiner Hartenstein coined the term "structured VLSI design"
(originally as "structured LSI design"), echoing Edsger Dijkstra's structured programming


approach of procedure nesting to avoid chaotic spaghetti-structured programs.

6.4.1 Challenges
As microprocessors become more complex due to technology scaling, microprocessor designers
have encountered several challenges which force them to think beyond the design plane, and
look ahead to post-silicon:

Power usage/heat dissipation: As threshold voltages have ceased to scale with
advancing process technology, dynamic power dissipation has not scaled proportionally.
Maintaining logic complexity when scaling the design down only means that the power
dissipation per area will go up. This has given rise to techniques such as dynamic voltage
and frequency scaling (DVFS) to minimize overall power.

Process variation: As photolithography techniques tend closer to the fundamental laws
of optics, achieving high accuracy in doping concentrations and etched wires is becoming
more difficult and prone to errors due to variation. Designers now must simulate across
multiple fabrication process corners before a chip is certified ready for production.


Stricter design rules: Due to lithography and etch issues with scaling, design rules for
layout have become increasingly stringent. Designers must keep ever more of these rules
in mind while laying out custom circuits. The overhead for custom design is now
reaching a tipping point, with many design houses opting to switch to electronic design
automation (EDA) tools to automate their design process.

Timing/design closure: As clock frequencies tend to scale up, designers are finding it
more difficult to distribute and maintain low clock skew between these high frequency
clocks across the entire chip. This has led to a rising interest in multicore and
multiprocessor architectures, since an overall speedup can be obtained by lowering the
clock frequency and distributing processing.

First-pass success: As die sizes shrink (due to scaling), and wafer sizes go up (to lower
manufacturing costs), the number of dies per wafer increases, and the complexity of
making suitable photomasks goes up rapidly. A mask set for a modern technology can
cost several million dollars. This non-recurring expense deters the old iterative
philosophy involving several "spin-cycles" to find errors in silicon, and encourages
first-pass silicon success. Several design philosophies have been developed to aid this
new design flow, including design for manufacturing (DFM), design for test (DFT), and
Design for X.

6.5 VLSI Technology

Gone are the days when huge computers made of vacuum tubes sat humming in entire dedicated
rooms and could do about 360 multiplications of 10-digit numbers in a second. Though they
were heralded as the fastest computing machines of their time, they surely don't stand a chance
when compared to modern machines. Modern day computers are getting smaller, faster,
cheaper and more power efficient every progressing second. But what drove this change? The
whole domain of computing ushered into a new dawn of electronic miniaturization with the
advent of the semiconductor transistor by Bardeen (1947-48) and then the bipolar transistor by
Shockley (1949) at Bell Laboratories.


Since the invention of the first IC (Integrated Circuit) in the form of a flip-flop by Jack Kilby in
1958, our ability to pack more and more transistors onto a single chip has doubled roughly every
18 months, in accordance with Moore's Law. Such exponential development had never been
seen in any other field, and it still continues to be a major area of research.

Fig 6.2 A comparison: First Planar IC (1961) and Intel Nehalem Quad Core Die

6.6 History & Evolution of VLSI Technology

The development of microelectronics spans a time even less than the average life
expectancy of a human, and yet it has seen as many as four generations. The early 1960s saw
low-density fabrication processes classified under Small Scale Integration (SSI), in which the
transistor count was limited to about 10. This rapidly gave way to Medium Scale Integration
(MSI) in the late 1960s, when around 100 transistors could be placed on a single chip.
It was the time when the cost of research began to decline and private firms started entering the
competition, in contrast to the earlier years when the main burden was borne by the military.
Transistor-Transistor Logic (TTL), offering higher integration densities, outlasted other IC
families like ECL and became the basis of the first integrated circuit revolution. It was the
production of this family that gave impetus to semiconductor giants like Texas Instruments,
Fairchild and National Semiconductors. The early seventies marked the growth of the transistor
count to about 1000 per chip, called Large Scale Integration.

By the mid-eighties, the transistor count on a single chip had already exceeded 1000, and hence
came the age of Very Large Scale Integration, or VLSI. Though many improvements have been
made and the transistor count is still rising, further generation names like ULSI are generally
avoided. It was during this time that TTL lost the battle to the MOS family, owing to the same
problems that had pushed vacuum tubes into obsolescence: power dissipation and the limit it
imposed on the number of gates that could be placed on a single die.
The second age of the integrated circuit revolution started with the introduction of the first
microprocessor, the 4004, by Intel in 1971, followed by the 8080 in 1974. Today many companies
like Texas Instruments, Infineon, Alliance Semiconductors, Cadence, Synopsys, Celox Networks,
Cisco, Micron Tech, National Semiconductors, ST Microelectronics, Qualcomm, Lucent, Mentor
Graphics, Analog Devices, Intel, Philips, Motorola and many other firms have been established
and are dedicated to the various fields of VLSI, such as Programmable Logic Devices, Hardware
Description Languages, design tools, embedded systems, etc.

VLSI Design:
VLSI chiefly comprises Front End Design and Back End Design these days. While front end
design includes digital design using HDLs, design verification through simulation and other
verification techniques, design from gates and design for testability, back end design
comprises CMOS library design and its characterization, as well as physical design and
fault simulation.
While simple logic gates might be considered SSI devices, and multiplexers and parity
encoders MSI, the world of VLSI is much more diverse. Generally, the entire design
procedure follows a step-by-step approach in which each design step is followed by simulation
before actually being put onto the hardware or moving on to the next step. The major design
steps are different levels of abstraction of the device as a whole:

Problem Specification: This is a high-level representation of the system. The major
parameters considered at this level are performance, functionality, physical dimensions,
fabrication technology and design techniques. It has to be a tradeoff between market
requirements, the available technology and the economic viability of the design. The end
specifications include the size, speed, power and functionality of the VLSI system.

Architecture Definition: Basic specifications such as floating point units, which instruction
set style to use, e.g. RISC (Reduced Instruction Set Computer) or CISC (Complex Instruction
Set Computer), the number of ALUs, cache size, etc.

Functional Design: Defines the major functional units of the system and hence facilitates
the identification of the interconnect requirements between units and the physical and electrical
specifications of each unit. A block diagram is decided upon, with the number of inputs,
outputs and timing decided, without any details of the internal structure.

Logic Design: The actual logic is developed at this level. Boolean expressions, control
flow, word widths, register allocation etc. are developed, and the outcome is called a Register
Transfer Level (RTL) description. This part is implemented with Hardware Description
Languages such as VHDL and/or Verilog. Gate minimization techniques are employed to find
the simplest, or rather the smallest, most effective implementation of the logic.

Circuit Design: While the logic design gives the simplified implementation of the logic, the
realization of the circuit in the form of a netlist is done in this step. Gates, transistors and
interconnects are put in place to make a netlist. This again is a software step, and the outcome is
checked via simulation.

Physical Design: The conversion of the netlist into its geometrical representation is done in
this step, and the result is called a layout. This step follows some predefined fixed rules, like the
lambda rules, which provide the exact details of the size, ratio and spacing between components.
This step is further divided into sub-steps, which are:
6.1 Circuit Partitioning: Because of the huge number of transistors involved, it is not possible
to handle the entire circuit all at once due to limitations on computational capabilities and


memory requirements. Hence the whole circuit is broken down into blocks which are handled separately.
6.2 Floor Planning and Placement: The best layout is chosen for each block from the
partitioning step and for the overall chip, taking into account the interconnect area between
blocks and the exact positioning on the chip, in order to minimize the area while meeting the
performance constraints through an iterative approach.
6.3 Routing: The quality of the placement becomes evident only after this step is completed.
Routing involves the completion of the interconnections between modules. This is done in
two steps. First, connections are completed between blocks without taking into consideration the
exact geometric details of each wire and pin. Then a detailed routing step completes the
point-to-point connections between pins on the blocks.
6.4 Layout Compaction: The smaller the chip, the better. In this design step the layout is
compressed from all directions to minimize the chip area, thereby reducing wire lengths, signal
delays and overall cost.
6.5 Extraction and Verification: The circuit is extracted from the layout and compared with
the original netlist; performance verification, reliability verification and a check of the layout's
correctness are carried out before the final step of packaging.

Packaging: The chips are put together on a Printed Circuit Board or a Multi Chip Module

to obtain the final finished product.

Initially, design can be done with three different methodologies which provide different levels of
freedom of customization to the programmers. The design methods, in increasing order of
customization support, which also means increased amount of overhead on the part of the
programmer, are FPGAs and PLDs, Standard Cell (Semi Custom) and Full Custom Design.
While FPGAs have built-in libraries and a board already built with interconnections and blocks
in place, semi-custom design allows the placement of blocks in a user-defined custom
fashion with some independence, while most libraries are still available for program
development. Full custom design adopts a start-from-scratch approach where the programmer is
required to write the whole set of libraries and also has full control over block development,

placement and routing. This is also the typical progression from entry-level to professional design.

Fig6.3: Future of VLSI

Where do we actually see VLSI technology in action? Everywhere: in personal computers, cell
phones, digital cameras and any electronic gadget. There are certain key issues that serve as
active areas of research and are constantly improving as the field continues to mature. The
figures easily show how Gordon Moore proved to be a visionary, while the trend predicted
by his law still holds with little deviation and doesn't show any signs of stopping in
the near future. VLSI has come a long way from the time when chips were truly hand
crafted. But as we near the limit of miniaturization of silicon wafers, design issues have cropped up.
VLSI is dominated by CMOS technology, and much like other logic families, this too has its
limitations, which have been battled and improved upon for years. Taking the example of a
processor, the process technology has rapidly shrunk from 180 nm in 1999 to 65 nm in 2008,
and now it stands at 45 nm, with attempts being made to reduce it further (32 nm), while the die
area, which had shrunk initially, is now increasing owing to the added benefits of greater packing
density and a larger die size, which means more transistors on a chip.


As the number of transistors increases, the power dissipation increases and so does the noise. In
terms of heat generated per unit area, chips have already neared the nozzle of a jet engine. At the
same time, scaling of threshold voltages beyond a certain point poses serious limitations on
providing low dynamic power dissipation with increased complexity. The number of metal
layers and the interconnects, be they global or local, also tend to get messy at such nano levels.
Even on the fabrication front, we are fast approaching the optical limit of photolithographic
processes, beyond which the feature size cannot be reduced due to decreased accuracy. This has
opened up Extreme Ultraviolet Lithography techniques. The high-speed clocks used now make it
hard to reduce clock skew, putting tight timing constraints on designs. This has opened up a new
frontier in parallel processing. And above all, we seem to be fast approaching the atom-thin gate
oxide thickness, where there might be only a single layer of atoms serving as the oxide layer in
CMOS transistors. New alternatives like Gallium Arsenide technology are becoming an active
area of research owing to this.


In the semiconductor and electronic design industry, Verilog is a hardware description language
(HDL) used to model electronic systems. Verilog HDL, not to be confused with VHDL (a
competing language), is most commonly used in the design, verification, and implementation of
digital logic chips at the register-transfer level of abstraction. It is also used in the verification of
analog and mixed-signal circuits.

7.1 Overview
Hardware description languages such as Verilog differ from software programming languages
because they include ways of describing the propagation of time and signal dependencies
(sensitivity). There are two assignment operators, a blocking assignment (=), and a non-blocking
(<=) assignment. The non-blocking assignment allows designers to describe a state-machine
update without needing to declare and use temporary storage variables. Since these concepts are
part of Verilog's language semantics, designers could quickly write descriptions of large circuits


in a relatively compact and concise form. At the time of Verilog's introduction (1984), Verilog
represented a tremendous productivity improvement for circuit designers who were already using
graphical schematic capture software and specially written software programs to document and
simulate electronic circuits.
The designers of Verilog wanted a language with syntax similar to the C programming language,
which was already widely used in engineering software development. Like C, Verilog is
case-sensitive and has a basic preprocessor (though less sophisticated than that of ANSI C/C++). Its
control flow keywords (if/else, for, while, case, etc.) are equivalent, and its operator precedence
is compatible. Syntactic differences include variable declaration (Verilog requires bit-widths on
net/reg types), demarcation of procedural blocks (begin/end instead of curly braces {}), and many
other minor differences.
A Verilog design consists of a hierarchy of modules. Modules encapsulate design hierarchy, and
communicate with other modules through a set of declared input, output, and bidirectional ports.
Internally, a module can contain any combination of the following: net/variable declarations
(wire, reg, integer, etc.), concurrent and sequential statement blocks, and instances of other
modules (sub-hierarchies). Sequential statements are placed inside a begin/end block and
executed in sequential order within the block. However, the blocks themselves are executed
concurrently, making Verilog a dataflow language.
Verilog's concept of 'wire' consists of both signal values (4-state: "1, 0, floating, undefined") and
strengths (strong, weak, etc.). This system allows abstract modeling of shared signal lines, where
multiple sources drive a common net. When a wire has multiple drivers, the wire's (readable)
value is resolved by a function of the source drivers and their strengths.
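As an illustration only (a much-simplified stand-in for the full IEEE 1364 strength algebra, with a reduced set of strength names and invented numeric levels), resolution of a multiply-driven net can be pictured like this:

```python
# Simplified net resolution: the strongest driver wins; equal-strength
# disagreement yields 'x' (unknown); an undriven net floats to 'z'
STRENGTH = {'supply': 7, 'strong': 6, 'pull': 5, 'weak': 3}

def resolve(drivers):
    # drivers: list of (strength_name, value) pairs
    if not drivers:
        return 'z'
    top = max(STRENGTH[s] for s, _ in drivers)
    winners = {v for s, v in drivers if STRENGTH[s] == top}
    return winners.pop() if len(winners) == 1 else 'x'
```

A strong driver overrides a weak one, while two strong drivers with conflicting values leave the net unknown.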
A subset of statements in the Verilog language is synthesizable. Verilog modules that conform to
a synthesizable coding style, known as RTL (register-transfer level), can be physically realized
by synthesis software. Synthesis software algorithmically transforms the (abstract) Verilog
source into a net list, a logically equivalent description consisting only of elementary logic
primitives (AND, OR, NOT, flip-flops, etc.) that are available in a specific FPGA or VLSI


technology. Further manipulations to the netlist ultimately lead to a circuit fabrication blueprint
(such as a photo mask set for an ASIC or a bit stream file for an FPGA).

7.2 History
7.2.1 Beginning
Verilog was the first modern hardware description language to be invented. It was created by Phil
Moorby and Prabhu Goel during the winter of 1983/1984 at Automated Integrated Design
Systems (renamed Gateway Design Automation in 1985) as a hardware modeling language.
Gateway Design Automation was purchased by Cadence Design Systems in 1990. Cadence now
has full proprietary rights to Gateway's Verilog and to Verilog-XL, the HDL simulator that would
become the de facto standard (of Verilog logic simulators) for the next decade. Originally,
Verilog was intended only to describe and allow simulation; support for synthesis was added
afterwards.
7.2.2 Verilog-95
With the increasing success of VHDL at the time, Cadence decided to make the language
available for open standardization. Cadence transferred Verilog into the public domain under the
Open Verilog International (OVI) (now known as Accellera) organization. Verilog was later
submitted to IEEE and became IEEE Standard 1364-1995, commonly referred to as Verilog-95.
In the same time frame Cadence initiated the creation of Verilog-A to put standards support
behind its analog simulator Spectre. Verilog-A was never intended to be a standalone language
and is a subset of Verilog-AMS which encompassed Verilog-95.

7.2.3 Verilog 2001

Extensions to Verilog-95 were submitted back to IEEE to cover the deficiencies that users had
found in the original Verilog standard. These extensions became IEEE Standard 1364-2001
known as Verilog-2001.
Verilog-2001 is a significant upgrade from Verilog-95. First, it adds explicit support for (2's
complement) signed nets and variables. Previously, code authors had to perform signed
operations using awkward bit-level manipulations (for example, the carry-out bit of a simple
8-bit addition required an explicit description of the Boolean algebra to determine its correct
value). The same function under Verilog-2001 can be more succinctly described by one of the
built-in operators: +, -, /, *, >>>. A generate/endgenerate construct (similar to VHDL's
generate/endgenerate) allows Verilog-2001 to control instance and statement instantiation
through normal decision operators (case/if/else). Using generate/endgenerate, Verilog-2001 can
instantiate an array of instances, with control over the connectivity of the individual instances.
File I/O has been improved by several new system tasks. And finally, a few syntax additions
were introduced to improve code readability (e.g. always @*, named parameter override, C-style
function/task/module header declaration).
Verilog-2001 is the dominant flavor of Verilog supported by the majority of commercial EDA
software packages.

7.2.4 Verilog 2005

Not to be confused with SystemVerilog, Verilog 2005 (IEEE Standard 1364-2005) consists of
minor corrections, spec clarifications, and a few new language features (such as the uwire keyword).
A separate part of the Verilog standard, Verilog-AMS, attempts to integrate analog and mixed
signal modeling with traditional Verilog.

A hello world program looks like this:
module main;
  initial
    begin
      $display("Hello world!");
      $finish;
    end
endmodule

A simple example of two flip-flops follows:

module toplevel(clock, reset);
input clock;
input reset;

reg flop1;
reg flop2;

always @ (posedge reset or posedge clock)
  if (reset)
    begin
      flop1 <= 0;
      flop2 <= 1;
    end
  else
    begin
      flop1 <= flop2;
      flop2 <= flop1;
    end
endmodule
The "<=" operator in Verilog is another aspect of its being a hardware description language as
opposed to a normal procedural language. This is known as a "non-blocking" assignment. Its
action doesn't register until the next clock cycle. This means that the order of the assignments is
irrelevant and will produce the same result: flop1 and flop2 will swap values every clock.
The other assignment operator, "=", is referred to as a blocking assignment. When "="
assignment is used, for the purposes of logic, the target variable is updated immediately. In the
above example, had the statements used the "=" blocking operator instead of "<=", flop1 and
flop2 would not have been swapped. Instead, as in traditional programming, the compiler would


understand to simply set flop1 equal to flop2 (and subsequently ignore the redundant logic to set
flop2 equal to flop1.)

An example counter circuit follows:

module Div20x (rst, clk, cet, cep, count, tc);
// TITLE 'Divide-by-20 Counter with enables'
// enable CEP is a clock enable only
// enable CET is a clock enable and
// enables the TC output
// a counter using the Verilog language

parameter size = 5;
parameter length = 20;

input rst;  // These inputs/outputs represent
input clk;  // connections to the module.
input cet;
input cep;

output [size-1:0] count;
output tc;

reg [size-1:0] count; // Signals assigned
                      // within an always
                      // (or initial) block
                      // must be of type reg

wire tc; // Other signals are of type wire

// The always statement below is a parallel
// execution statement that
// executes any time the signals
// rst or clk transition from low to high

always @ (posedge clk or posedge rst)
  if (rst) // This causes reset of the cntr
    count <= {size{1'b0}};
  else
    if (cet && cep) // Enables both true
      begin
        if (count == length-1)
          count <= {size{1'b0}};
        else
          count <= count + 1'b1;
      end

// the value of tc is continuously assigned
// the value of the expression
assign tc = (cet && (count == length-1));

endmodule
An example of delays:
reg a, b, c, d;
wire e;

always @(b or e)
  begin
    a = b & e;
    b = a | b;
    #5 c = b;
    d = #6 c ^ e;
  end
The always clause above illustrates the other type of method of use, i.e. it executes whenever any
of the entities in the list (the b or e) changes. When one of these changes, a is immediately
assigned a new value, and due to the blocking assignment, b is assigned a new value afterward
(taking into account the new value of a). After a delay of 5 time units, c is assigned the value of b
and the value of c ^ e is tucked away in an invisible store. Then after 6 more time units, d is
assigned the value that was tucked away.
Signals that are driven from within a process (an initial or always block) must be of type reg.
Signals that are driven from outside a process must be of type wire. The keyword reg does not
necessarily imply a hardware register.
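The reg/wire rule, and the point that reg need not mean a register, can be sketched in one small fragment (the module and signal names here are made up for illustration, not part of the examples above):

```verilog
module reg_wire_demo(input a, input b, output y);
  reg  r;          // driven inside an always block: must be reg
  wire w = a & b;  // driven by a continuous assignment: wire

  always @(a or b)
    r = a | b;         // combinational logic, despite the reg keyword

  assign y = r ^ w;    // regs and wires mix freely in expressions
endmodule
```

Note that r here synthesizes to a simple OR gate, not a hardware register, because no clock edge appears in the sensitivity list.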
Definition of constants
The definition of constants in Verilog supports the addition of a width parameter. The basic
syntax is:
<Width in bits>'<base letter><number>

12'h123 - Hexadecimal 123 (using 12 bits)

20'd44 - Decimal 44 (using 20 bits - 0 extension is automatic)

4'b1010 - Binary 1010 (using 4 bits)

6'o77 - Octal 77 (using 6 bits)
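As a brief sketch of these constants in use (the module and signal names are illustrative only):

```verilog
module const_demo;
  reg [11:0] addr;
  reg [3:0]  nibble;

  initial begin
    addr   = 12'h123;  // 12-bit hexadecimal constant
    nibble = 4'b1010;  // 4-bit binary constant
    $display("%h %b", addr, nibble);  // prints: 123 1010
  end
endmodule
```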


Synthesizeable constructs
There are several statements in Verilog that have no analog in real hardware, e.g. $display.
Consequently, much of the language can not be used to describe hardware. The examples
presented here are the classic subset of the language that has a direct mapping to real gates.
// Mux examples - Three ways to do the same thing.

// The first example uses continuous assignment
wire out;
assign out = sel ? a : b;

// the second example uses a procedure
// to accomplish the same thing.
reg out;
always @(a or b or sel)
  case (sel)
    1'b0: out = b;
    1'b1: out = a;
  endcase

// Finally - you can use if/else in a
// procedural structure.
reg out;
always @(a or b or sel)
  if (sel)
    out = a;
  else
    out = b;


The next interesting structure is a transparent latch; it will pass the input to the output when the
gate signal is set for "pass-through", and captures the input and stores it upon transition of the
gate signal to "hold". The output will remain stable regardless of the input signal while the gate
is set to "hold". In the example below the "pass-through" level of the gate would be when the
value of the if clause is true, i.e. gate = 1. This is read "if gate is true, the din is fed to latch_out
continuously." Once the if clause is false, the last value at latch_out will remain and is
independent of the value of din.
// Transparent latch example
reg out;
always @(gate or din)
  if (gate)
    out = din; // Pass through state
    // Note that the else isn't required here. The variable
    // out will follow the value of din while gate is high.
    // When gate goes low, out will remain constant.
The flip-flop is the next significant template; in Verilog, the D-flop is the simplest, and it can be
modeled as:
reg q;
always @(posedge clk)
  q <= d;
The significant thing to notice in the example is the use of the non-blocking assignment. A basic
rule of thumb is to use <= when there is a posedge or negedge statement within the always
clause.
A variant of the D-flop is one with an asynchronous reset; there is a convention that the reset
state will be the first if clause within the statement.
reg q;
always @(posedge clk or posedge reset)
  if (reset)
    q <= 0;
  else
    q <= d;
The next variant is including both an asynchronous reset and asynchronous set condition; again
the convention comes into play, i.e. the reset term is followed by the set term.
reg q;
always @(posedge clk or posedge reset or posedge set)
  if (reset)
    q <= 0;
  else if (set)
    q <= 1;
  else
    q <= d;
Note: If this model is used to model a Set/Reset flip flop then simulation errors can result.
Consider the following test sequence of events. 1) reset goes high 2) clk goes high 3) set goes
high 4) clk goes high again 5) reset goes low followed by 6) set going low. Assume no setup and
hold violations.
In this example the always @ statement would first execute when the rising edge of reset occurs
which would place q to a value of 0. The next time the always block executes would be the rising
edge of clk which again would keep q at a value of 0. The always block then executes when set
goes high which because reset is high forces q to remain at 0. This condition may or may not be
correct depending on the actual flip flop. However, this is not the main problem with this model.
Notice that when reset goes low, that set is still high. In a real flip flop this will cause the output
to go to a 1. However, in this model it will not occur because the always block is triggered by
rising edges of set and reset - not levels. A different approach may be necessary for set/reset
flip flops.


The final basic variant is one that implements a D-flop with a mux feeding its input. The mux has
a d-input and feedback from the flop itself. This allows a gated load function.
// Basic structure with an EXPLICIT feedback path
always @(posedge clk)
  if (gate)
    q <= d;
  else
    q <= q; // explicit feedback path

// The more common structure ASSUMES the feedback is present
// This is a safe assumption since this is how the
// hardware compiler will interpret it. This structure
// looks much like a latch. The differences are the
// @(posedge clk) and the non-blocking <=
always @(posedge clk)
  if (gate)
    q <= d; // the "else" mux is "implied"
Note that there are no "initial" blocks mentioned in this description. There is a split between
FPGA and ASIC synthesis tools on this structure. FPGA tools allow initial blocks where reg
values are established instead of using a "reset" signal. ASIC synthesis tools don't support such a
statement. The reason is that an FPGA's initial state is something that is downloaded into the
memory tables of the FPGA. An ASIC is an actual hardware implementation.
Initial and always
There are two separate ways of declaring a Verilog process. These are the always and the initial
keywords. The always keyword indicates a free-running process. The initial keyword indicates a
process executes exactly once. Both constructs begin execution at simulator time 0, and both
execute until the end of the block. Once an always block has reached its end, it is rescheduled
(again). It is a common misconception to believe that an initial block will execute before an


always block. In fact, it is better to think of the initial block as a special case of the always block, one which terminates after it completes for the first time.
initial
  begin
    a = 1; // Assign a value to reg a at time 0
    #1;    // Wait 1 time unit
    b = a; // Assign the value of reg a to reg b
  end

always @(a or b) // Any time a or b CHANGE, run the process
  begin
    if (a)
      c = b;
    else
      d = ~b;
  end // Done with this block, now return to the top (i.e. the @ event-control)

always @(posedge a) // Run whenever reg a has a low to high change
  a <= b;
These are the classic uses for these two keywords, but there are two significant additional uses.
The most common of these is an always keyword without the @(...) sensitivity list. It is possible
to use always as shown below:
always
  begin // Always begins executing at time 0 and NEVER stops
    clk = 0; // Set clk to 0
    #1;      // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1;      // Wait 1 time unit
  end // Keeps executing - so continue back at the top of the begin

The always keyword acts similar to the "C" construct while(1) {..} in the sense that it will
execute forever.
The other interesting exception is the use of the initial keyword with the addition of the forever
keyword. The example below is functionally identical to the always example above.
initial forever // Start at time 0 and repeat the begin/end forever
  begin
    clk = 0; // Set clk to 0
    #1;      // Wait for 1 time unit
    clk = 1; // Set clk to 1
    #1;      // Wait 1 time unit
  end
The fork/join pair are used by Verilog to create parallel processes. All statements (or blocks)
between a fork/join pair begin execution simultaneously upon execution flow hitting the fork.
Execution continues after the join upon completion of the longest running statement or block
between the fork and join.
initial
  fork
    $write("A"); // Print Char A
    $write("B"); // Print Char B
    begin
      #1;          // Wait 1 time unit
      $write("C"); // Print Char C
    end
  join
The way the above is written, it is possible to have either the sequences "ABC" or "BAC" print
out. The order of simulation between the first $write and the second $write depends on the

simulator implementation, and may purposefully be randomized by the simulator. This allows the
simulation to contain both accidental race conditions as well as intentional non-deterministic
behavior. Notice that VHDL cannot dynamically spawn multiple processes in the way Verilog can.
Race conditions
The order of execution isn't always guaranteed within Verilog. This can best be illustrated by a
classic example. Consider the code snippet below:
initial a = 0;
initial b = a;
initial begin
  #1;
  $display("Value a=%b Value of b=%b", a, b);
end
What will be printed out for the values of a and b? Depending on the order of execution of the
initial blocks, it could be zero and zero, or alternately zero and some other arbitrary uninitialized
value. The $display statement will always execute after both assignment blocks have completed,
due to the #1 delay.
Note: These operators are not shown in order of precedence.

Operator type   Operator symbols   Operation performed
Bitwise         ~                  Bitwise NOT (1's complement)
                &                  Bitwise AND
                |                  Bitwise OR
                ^                  Bitwise XOR
                ~^ or ^~           Bitwise XNOR
Reduction       &                  Reduction AND
                ~&                 Reduction NAND
                |                  Reduction OR
                ~|                 Reduction NOR
                ^                  Reduction XOR
                ~^ or ^~           Reduction XNOR
Arithmetic      -                  2's complement
                **                 Exponentiation (*Verilog-2001)
Relational      >                  Greater than
                <                  Less than
                >=                 Greater than or equal to
                <=                 Less than or equal to
Equality        ==                 Logical equality (bit-value 1'bX is removed from comparison)
                !=                 Logical inequality (bit-value 1'bX is removed from comparison)
                ===                4-state logical equality (bit-value 1'bX is taken as literal)
                !==                4-state logical inequality (bit-value 1'bX is taken as literal)
Shift           >>                 Logical right shift
                <<                 Logical left shift
                >>>                Arithmetic right shift (*Verilog-2001)
                <<<                Arithmetic left shift (*Verilog-2001)
Concatenation   { , }              Concatenation
Replication     {n{m}}             Replicate value m for n times
Conditional     ? :                Conditional

Four-valued logic
The IEEE 1364 standard defines a four-valued logic with four states: 0, 1, Z (high impedance),
and X (unknown logic value). For the competing VHDL, a dedicated standard for multi-valued
logic exists as IEEE 1164 with nine levels.
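A minimal sketch of how these four values appear in simulation (the module and signal names are illustrative):

```verilog
module four_value_demo;
  reg  [3:0] r;        // an uninitialized reg starts as X (unknown)
  wire       undriven; // a wire with no driver reads as Z

  initial begin
    $display("%b", r);        // prints: xxxx
    $display("%b", undriven); // prints: z
  end
endmodule
```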


8.1 Synthesis Result
To investigate the advantages of using our technique in terms of area overhead against
fully ECC-protected and against partially protected implementations, we implemented and
synthesized different versions of a 32-bit, 32-entry, dual read port, single write port register file
for a Xilinx XC3S500E. Once the functional verification is done, the RTL model is taken through
the synthesis process using the Xilinx ISE tool. In the synthesis process, the RTL model is
converted to a gate-level netlist mapped to a specific technology library. Many different devices
of the Spartan 3E family are available in the Xilinx ISE tool. In order to synthesize this design,
the device named XC3S500E has been chosen, with the FG320 package and speed grade -4.

8.2 RTL Schematic

The RTL (Register Transfer Level) schematic can be viewed as a black box after synthesis of
the design is complete. It shows the inputs and outputs of the system. By double-clicking on the
diagram we can see the gates, flip-flops and MUXes inside.

Figure 8.1: RTL schematic of Top-level 3 to 8 Reversible Decoder


Figure 8.2: RTL schematic of Internal block 3 to 8 Reversible Decoder

Figure 8.3: Technology schematic of Top-level 3 to 8 Reversible Decoder


Figure 8.4: Technology schematic of Internal block 3 to 8 Reversible Decoder

Figure 8.5: Internal block 3 to 8 Reversible Decoder


8.3 Synthesis Report

This device utilization includes the following.

Logic Utilization

Logic Distribution

Total Gate count for the Design

The device utilization summary is shown above, which gives the details of the number
of devices used from the available devices, also represented as a percentage. Hence, as the result
of the synthesis process, the device utilization for the chosen device and package is shown below.


Table 8.1: Synthesis report of 3 to 8 Reversible Decoder


The corresponding simulation results are shown below.

Figure 9.1: Test Bench for 3 to 8 Reversible Decoder


Figure 9.2: Simulated output for 3 to 8 Reversible Decoder

Figure 9.3: Schematic for Fredkin reversible gate


Figure 9.4: Schematic for Feynman double gate


Figure 9.5: Schematic for 3 to 8 Reversible Decoder


Figure 9.6: Timing diagram for 3 to 8 Reversible Decoder


We presented the design methodologies of an n-to-2^n reversible fault tolerant decoder, where n is
the number of data bits. We proposed several lower bounds on the numbers of garbage outputs,
constant inputs and quantum cost, and proved that the proposed circuit is constructed with the
optimum garbage outputs, constant inputs and quantum cost. In addition, we presented the
designs of the individual gates of the decoder using MOS transistors in order to implement the
circuit of the decoder with transistors. Simulations of the transistor implementation of the
decoder showed that the proposed fault tolerant decoder works correctly. The comparative results
proved that the proposed designs perform better than their counterparts. We also established the
efficiency and superiority of the proposed scheme with several theoretical explanations.
The proposed reversible fault tolerant decoders can be used in parallel circuits, multiple-symbol
differential detection, network components and in digital signal processing.


[1] L. Jamal, M. Shamsujjoha, and H. M. Hasan Babu, "Design of optimal reversible carry look-ahead
adder with optimal garbage and quantum cost," International Journal of Engineering and
Technology, vol. 2, pp. 44-50, 2012.
[2] C. H. Bennett, "Logical reversibility of computation," IBM J. Res. Dev., vol. 17, no. 6, pp.
525-532, Nov. 1973.
[3] M. Nielsen and I. Chuang, Quantum Computation and Quantum Information. New York,
NY, USA: Cambridge University Press, 2000.
[4] M. P. Frank, "The physical limits of computing," Computing in Science and Engg., vol. 4, no.
3, pp. 16-26, May 2002.
[5] A. K. Biswas, M. M. Hasan, A. R. Chowdhury, and H. M. Hasan Babu, "Efficient approaches
for designing reversible binary coded decimal adders," Microelectron. J., vol. 39, no. 12, pp.
1693-1703, Dec. 2008.
[6] M. Perkowski, "Reversible computation for beginners," lecture series, Portland State
University, 2000.
[7] S. N. Mahammad and K. Veezhinathan, "Constructing online testable circuits using reversible
logic," IEEE Transactions on Instrumentation and Measurement, vol. 59, pp. 101-109, 2010.
[8] W. N. N. Hung, X. Song, G. Yang, J. Yang, and M. A. Perkowski, "Optimal synthesis of
multiple output boolean functions using a set of quantum gates by symbolic reachability
analysis," IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 25, no. 9, pp. 1652-1663.
[9] D. Maslov, G. W. Dueck, and N. Scott, "Reversible logic synthesis benchmarks page," 2005.

[10] D. M. Miller, D. Maslov, and G. W. Dueck, "A transformation-based algorithm for reversible

logic synthesis," in Proceedings of the 40th annual Design Automation Conference, ser. DAC








1-to-2 reversible decoder

module DECODER1TO2(s, o);
input s;
output [1:0] o;
F2G u1(.a(s), .b(1'b1), .c(1'b0), .p(), .q(o[0]), .r(o[1]));
endmodule

2-to-4 reversible decoder

module DECODER2TO4(s, o);
input [1:0] s;
output [3:0] o;
wire [2:0] w;
DECODER1TO2 u1(.s(s[0]), .o(w[1:0]));
RFRG u2(.a(s[1]), .b(1'b0), .c(w[0]), .p(w[2]), .q(o[2]), .r(o[0]));
RFRG u3(.a(w[2]), .b(1'b0), .c(w[1]), .p(), .q(o[3]), .r(o[1]));
endmodule



3-to-8 reversible decoder

module DECODER3TO8(s, o);
input [2:0] s;
output [7:0] o;
wire [3:0] w;
wire [2:0] w1;
DECODER2TO4 u1(.s(s[1:0]), .o(w));
RFRG u2(.a(s[2]), .b(1'b0), .c(w[3]), .p(w1[0]), .q(o[7]), .r(o[3]));
RFRG u3(.a(w1[0]), .b(1'b0), .c(w[1]), .p(w1[1]), .q(o[5]), .r(o[1]));
RFRG u4(.a(w1[1]), .b(1'b0), .c(w[2]), .p(w1[2]), .q(o[6]), .r(o[2]));
RFRG u5(.a(w1[2]), .b(1'b0), .c(w[0]), .p(), .q(o[4]), .r(o[0]));
endmodule
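A minimal testbench sketch for the 3-to-8 decoder could look like the following (assuming the F2G and RFRG gate modules are compiled alongside; the instance and signal names here are illustrative, not part of the project code above):

```verilog
module TB_DECODER3TO8;
  reg  [2:0] s;
  wire [7:0] o;
  integer i;

  // unit under test
  DECODER3TO8 uut(.s(s), .o(o));

  initial begin
    // apply all eight select codes and print the one-hot output
    for (i = 0; i < 8; i = i + 1) begin
      s = i[2:0];
      #10; // allow the combinational logic to settle
      $display("s=%b -> o=%b", s, o);
    end
    $finish;
  end
endmodule
```

An integer loop index is used rather than iterating the 3-bit s directly, since a 3-bit counter would wrap from 7 back to 0 and never terminate the loop.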