
A Novel Design of a Memristor-Based Look-Up Table (LUT) For FPGA

T. Nandha Kumar and Haider A. F. Almurib
Department of Electrical and Electronic Eng., The University of Nottingham, Selangor, Malaysia.
nandhakumaar.t; haider.abbas@nottingham.edu.my

Fabrizio Lombardi, Fellow, IEEE
Department of ECE, Northeastern University, Boston, MA 02115, USA
lombardi@ece.neu.edu

Abstract: This paper presents a novel scheme for a memristor-based look-up table (LUT); in this scheme the states of the unselected memristors are unaffected by WRITE/READ operations. Therefore, it addresses the prevalent problems associated with nano crossbars, such as the write half-select and sneak path currents. In the proposed scheme the memristors are connected in rows and columns, while the columns are isolated. The new scheme is simulated using LTSPICE IV and extensive results are presented with respect to the WRITE and READ operations. In addition, the performance of the proposed method is compared with previous LUT schemes using memristors as well as SRAM. The results show that the proposed scheme is significantly better in terms of delay and Energy Delay Product (EDP) for both the WRITE and READ operations.

Keywords: Memristor, non-volatile memory, FPGA, LUT

978-1-4799-5230-4/14/$31.00 © 2014 IEEE

I. INTRODUCTION

Static Random Access Memory (SRAM) based Field Programmable Gate Arrays (FPGAs) have been widely utilized in digital system design as they allow the fast hardware realization of digital system designs at relatively low development cost and good performance [1]. However, SRAM is unable to retain the configuration bits when the power is lost; hence, Non-Volatile (NV) flash memories have been used as external storage to the FPGA for the configuration bits [2]. Nevertheless, CMOS-based flash memories incur a large cost, a high area and a slow data retrieval time. To overcome these issues, NV memory blocks (as Look-Up Tables (LUTs)) made of so-called resistive elements (such as the memristor [4]) have been proposed as storage elements. The memristor appears to be a potential candidate for replacing SRAM and NV Flash Memories (NVFM) [9] to store the configuration bits of the FPGA.

In recent years, memristor-based memories using nano crossbars have been extensively analyzed in the technical literature [5]-[12]. Although these memories have been advocated as a potential replacement for conventional NVFM as LUTs in an FPGA due to their higher density and lower power consumption [5], they usually use nano crossbars [10]-[12], whose operation is affected by the sneak path current and write half-select problems [9]. Hence, following a few WRITE/READ operations, the memristance of the unselected memristors changes, thus resulting in erroneous stored data. The sneak path current problem has been addressed by biasing all unselected rows to the same voltage of the selected columns or by using a complex circuit-level solution with double sampling [9]. However, these schemes only partially alleviate the sneak path current problem by imposing limitations on the array size.

To address these issues, this paper proposes a novel scheme in which the memristors are connected in rows and columns, but the columns are isolated. It is then possible to prevent the sneak path current and write half-select problems, because the memristances of the unselected memristors are unaffected. The proposed scheme also retains the advantages of [10], such as no power dissipation in stand-by mode; moreover, no refresh pulse and no V/2 bias are required (so also lowering the number of power rails). Extensive simulation results and a detailed comparative analysis (inclusive of circuit modeling) with previous works [10]-[12] and an SRAM-based LUT show that the proposed scheme is significantly better in terms of delay and EDP for both the WRITE and READ operations.

Hereafter, let w(t) denote the length of the doped region and D the total length of the titanium dioxide layer of the memristor. Then w(t)/D is referred to as the Normalized State Parameter (NSP) [10]. When w(t) = D, NSP = 1 and the memristor is at its lowest resistance value (RON). If w(t) = 0, then NSP = 0 and the memristor is at its highest resistance value (ROFF), where no doped region is present. In this work, an updated version of the SPICE model [3] is utilized for simulating the memristor.

This paper is organized as follows. Section II deals with the proposed LUT design. Section III presents the LTSPICE simulation results and Section IV concludes this manuscript.

II. PROPOSED LUT

The proposed LUT design uses memristors but without employing a nano crossbar; as shown later in this paper, it has a fast WRITE time, a significantly reduced READ power dissipation (compared with previous schemes such as [10]-[12]) and no power dissipation in the stand-by mode. Different from other schemes [10]-[12], the proposed scheme eliminates the effects of write half-select and sneak path current, while preserving significant performance features.

The proposed memory block for a two-input LUT is shown in Fig. 1; it consists of columns of nano wires that constitute the Bit Lines (BLs). A terminal of every memristor is connected to a BL, while the other terminals are connected to the controller, thus forming independent horizontal lines that are still referred to as Word Lines (WLs). Thus, unlike other schemes, the memristors on a row are not connected, i.e.
the proposed scheme does not utilize a nano crossbar structure. The number of memristors connected to each BL determines the dimension of the LUT. Every BL is connected to ground through a MOS transistor; so according to the input data, the controller (Fig. 2) handles the data to be driven on the WLs. It also switches the transistors on and off by controlling the gate signals (G1 and G2) and selects (sel) the appropriate BL value to the output (Out). A and B are the inputs of the LUT; Out is the output of the LUT. The logic values of the four different address lines AB (00, 01, 10 and 11) are stored in the memristors M11, M12, M21 and M22 respectively. The voltage requirements for the signals of the WRITE and READ operations of a selected memristor (in this case M11, without loss of generality and/or correctness) are shown in Table 1.

Fig. 1 Proposed new scheme for a two-input LUT memristor memory block.

Table 1
Voltage requirements for the WRITE operation of M11 & M21 and READ operations on M11 using proposed LUT.

Operation   WL11    WL12       WL21       WL22       BL1 Voltage   BL2 Voltage
Write 1     Vdd     Floating   Vdd        Floating   GND           Floating
Write 0     -Vdd    Floating   -Vdd       Floating   GND           Floating
Read        ±Vdd    Floating   Floating   Floating   GND           Floating

The inputs WEn and REn trigger the controller to choose either the WRITE or READ operation to be performed. The input C is used to choose a particular BL, such that the memristors connected to it execute the WRITE operation. In the proposed scheme, unlike previous schemes, the WRITE operation is performed across all memristors connected to a single BL at once. The READ operation is similar to [10], in which, depending on the value of AB, the corresponding memristor is READ and the result appears at Out. The circuit diagram for the proposed controller is shown in Fig. 2 while Table 2 shows the truth table of the controller function.

A. WRITE operation

The WRITE operation is performed across all memristors connected to a particular BL when WEn is high. As shown in Table 2 and Fig. 2, if C is 0 (1) then the memristors connected to BL1 (BL2), i.e. M11 and M21 (M12 and M22), are involved in the WRITE operation by driving T1 (T2) to the on state, so depending on the WRITE data (Table 1) the corresponding value is written in the memristors. As shown in Fig. 1, each BL is connected to ground through a MOS transistor. Hence when performing the WRITE operation on the memristors connected to BL1, the memristors connected to BL2 are unaffected because those memristors are totally independent of BL1 and T2 is turned off. Therefore, unlike previous schemes, the proposed scheme does not suffer from the write half-select problem; moreover, the proposed scheme does not require a V/2 bias to unselect a memristor and a two-step writing scheme. The WRITE is performed across the memristors connected to a BL, so the WRITE delay is significantly reduced.

Fig. 2 Circuit diagram of the controller for a two-input LUT. [Schematic not reproduced; signal labels: +Vdd, -Vdd, WL11, WL12, WL21, WL22, TG1 to TG4, WEn, REn, Reset, Sel, T1, T2.]

Table 2
WRITE and READ operations using proposed scheme

Operation   REn   WEn   C   A   B   T1   T2   M11(1)   M12(2)   M21(3)   M22(4)
                                              (00)     (01)     (10)     (11)
Write       ↓     ↑     0   ×   ×   ↑    ↓    D0       z        D1       z
Write       ↓     ↑     1   ×   ×   ↓    ↑    z        D0       z        D1
Read        ↑     ↓     ×   0   0   ↑    ↓    D0       z        z        z
Read        ↑     ↓     ×   0   1   ↓    ↑    z        D0       z        z
Read        ↑     ↓     ×   1   0   ↑    ↓    z        z        D1       z
Read        ↑     ↓     ×   1   1   ↓    ↑    z        z        z        D1

Note: ↓ - Low; ↑ - High; × - don't care; z - floating (high resistance)

B. READ operation

When REn is high, the READ operation is performed on the memristor that corresponds to the values of A and B. For example (Table 2), if AB is "00" then the READ operation is performed on M11 by turning on T1, while the remaining memristors on BL1 and BL2 are connected to a high resistance (floating) because the corresponding pass transistors (TG2, TG3 & TG4) are in the off state. By applying the READ voltage (Table 1) on WL11 and propagating the voltage across T1 to Out by the appropriate sel (0) signal, the value stored for the input "00" is read out, thus completing a READ operation. Furthermore, as explained next, the output voltage difference between the READ 0 (NSP = 0) and READ 1 (NSP = 1) using the proposed scheme is significantly greater than for [10].

Fig. 3 Reading example equivalent circuit; (a) previous architecture [10], (b) proposed new architecture. [Schematics not reproduced; labelled elements: VWL1, iWL1, R11, R12, R21, R22, RT1-ON, RT2-OFF, V1 and V2, plus the pass-transistor resistances RTG2, RTG3 and RTG4 in (b).]
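The controller behaviour listed in Table 2 can be restated as a small behavioural model. The following Python sketch is only an illustration of the truth table, not the authors' circuit: the function name `controller`, the dictionary layout and the choice to raise an error unless exactly one of WEn and REn is high are assumptions made here for the example.

```python
from typing import Dict, Optional

MEMRISTORS = ("M11", "M12", "M21", "M22")


def controller(wen: int, ren: int, c: Optional[int] = None,
               a: Optional[int] = None, b: Optional[int] = None) -> Dict[str, object]:
    """Behavioural restatement of Table 2 for the two-input LUT.

    Returns the state of the BL transistors T1/T2 (1 = on) and, for each
    memristor, the data line driving it ("D0", "D1") or "z" (floating).
    """
    # Assumption of this sketch: exactly one enable must be high.
    if wen == ren:
        raise ValueError("exactly one of WEn and REn must be high")

    lines: Dict[str, object] = {m: "z" for m in MEMRISTORS}

    if wen:
        if c not in (0, 1):
            raise ValueError("C must be 0 or 1 for a WRITE")
        col = c  # C = 0 selects BL1, C = 1 selects BL2
        # WRITE drives every memristor on the selected bit line at once.
        lines[f"M1{col + 1}"] = "D0"
        lines[f"M2{col + 1}"] = "D1"
    else:
        if a not in (0, 1) or b not in (0, 1):
            raise ValueError("A and B must be 0 or 1 for a READ")
        row, col = a, b  # A picks the row, B picks the bit line
        lines[f"M{row + 1}{col + 1}"] = f"D{row}"

    lines["T1"] = int(col == 0)
    lines["T2"] = int(col == 1)
    return lines


if __name__ == "__main__":
    print(controller(wen=1, ren=0, c=0))       # WRITE on BL1
    print(controller(wen=0, ren=1, a=1, b=1))  # READ of M22
```

In this model only the memristors on the selected BL are ever driven; all others stay floating, which is the property the scheme relies on to avoid the write half-select problem.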

The equivalent circuit diagrams for the READ operation on M11 in a two-input LUT using [10] and the proposed scheme are shown in Fig. 3(a) and Fig. 3(b) respectively. While for [10] the unselected WLs are grounded, in the proposed scheme the unselected WLs are connected to a high resistance (in the order of giga ohms) as the corresponding pass transistors are turned off. Under the scheme of [10], the output voltages V1 and V2 of Fig. 3(a) are given by

$V_1 = V_{WL1} \, R_{P1} / (R_{P1} + R_{11})$    (1)
$V_2 = V_{WL1} \, R_{P2} / (R_{P2} + R_{12})$    (2)

where $1/R_{P1} = 1/R_{T1-ON} + 1/R_{21}$ and $1/R_{P2} = 1/R_{T2-OFF} + 1/R_{22}$ are the parallel resistances between the unselected memristors and the transistors. The READ current iWL1 branches to the unselected WLs; this causes the magnitude of the output voltage V1 across the load (RT1-ON) to decrease, so also reducing the output voltage difference between the READ 0 and 1 operations. Moreover, at a large LUT dimension, the number of unselected memristors connected in parallel to the output increases, thereby decreasing the effective load resistance and V1. The worst case occurs when the NSP of the unselected memristor is 1, i.e. at the smallest resistance (RON). The output voltage difference (ΔV = V1(1) - V1(0)) between the READ 1 and 0 operations for different values of RON and LUT sizes using [10] and the proposed scheme is shown in Fig. 4. As expected, with an increase in memory size, the difference in output voltage decreases due to the decrease in the load resistance; also as RON increases (under a constant ROFF/RON ratio), the output voltage difference decreases (as caused by the decrease in the amount of current flowing in the circuit).

Fig. 4 Difference between READ 0 and 1 operations at different RON and LUT memory sizes under the worst case for [10] and the proposed scheme. [Plot not reproduced; y-axis: difference in output voltage [V]; x-axis: LUT memory size, 2x2 to 8x8; curves for the previous and the proposed new architecture at RON = 100 Ω / ROFF = 19 kΩ, RON = 1 kΩ / ROFF = 190 kΩ and RON = 10 kΩ / ROFF = 1.9 MΩ.]

In the proposed scheme, RTG3 has a high resistance (in the order of a few giga ohms), so the effective resistance across RT1-ON remains constant. Also in the proposed scheme, the READ current iWL1 does not branch and hence the voltage across the load is given by

$V_1 = V_{WL1} \, R_{T1-ON} / (R_{T1-ON} + R_{11})$    (3)

where V2 = 0. Therefore, using the proposed scheme, the magnitude of V1 is nearly 150% larger than under [10], giving a better output voltage difference between the READ 0 and 1 operations; also, an increase in LUT dimension does not affect the load resistance and hence the output voltage difference remains constant (Fig. 4).

C. Sneak current path

As shown in Fig. 3(b), iWL1 does not branch to the unselected memristors (R12 & R22) that are connected to the other BL, i.e. the BL that is not connected to the selected memristor; therefore the NSPs of the unselected memristors (R12 & R22) are not affected during the READ operation on R11. This is applicable to any LUT size.

Consider next the unselected memristor (R21) that is connected to the same BL to which the selected memristor (R11) is connected. R21 is unselected by turning off the pass transistor (TG3), which provides a very high resistance in its path. Therefore the current through this path is infinitesimally small and hence the NSP of the unselected memristor is unaffected; therefore, the proposed scheme does not suffer from the sneak path current problem. As the memristors in the rows are not connected, V/2 biasing is not required and therefore the proposed scheme does not incur the write half-select problem. It should be noted that in the proposed controller, the simultaneous assertion of WEn and REn is not possible.

III. SIMULATION RESULTS

Proposed LUTs with dimensions of 2, 4, 6 and 8 have been designed and simulated using LTSPICE. Different scenarios for the WRITE and READ operations were considered. The results are compared with the previous works of [10][11][12].

A. WRITE Operation

The WRITE delay, energy dissipation and the Energy Delay Product (EDP) of the LUT for the proposed and previous schemes [10][11][12] were determined for the four scenarios presented in [10]. These results confirm that for the proposed scheme the NSPs of the unselected memristors are unaffected and the two-phase WRITE operation of [10] is not required. The average case EDP values are presented in Fig. 5.

The average case WRITE delays for the proposed scheme are significantly less than for [10][11][12]; as the dimension of the LUT increases, the difference between the WRITE delay for the proposed and these schemes increases significantly. This occurs because, in the proposed scheme, the WRITE operation is performed across the memristors connected to a BL. Also, the average and worst case EDPs of the proposed scheme are significantly less than in [10][11][12].

Fig. 5 Average WRITE vs LUT size; (a) Delay, (b) Energy, and (c) EDP. [Plots not reproduced; y-axes: Delay [ns], Energy [pJ], EDP [ns.pJ]; x-axis: LUT size, 2x2 to 8x8; series: proposed scheme, scheme of [10], scheme of [11] & [12].]

Next, the simulation is performed to compare the performance of the WRITE operation of a volatile SRAM-based LUT designed using a 32nm feature size with the proposed non-volatile design. The simulation results are shown in Table 3. The average delay and EDP at different sizes of the SRAM-based LUTs are significantly less than for the proposed method.
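Returning briefly to the READ analysis of Section II, the voltage dividers of equations (1) and (3) can be evaluated numerically to see why removing the parallel path through the unselected memristor widens the READ margin. The sketch below is only illustrative: the READ voltage of 1 V and the on-resistance RT1-ON = 1 kΩ are hypothetical values chosen here (the paper does not state them), while RON = 100 Ω and ROFF = 19 kΩ are taken from the legend of Fig. 4. It covers the two-input LUT only and is not expected to reproduce the simulated figures.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def v1_previous(v_wl: float, r_sel: float, r_t_on: float, r_unsel: float) -> float:
    """Equation (1): scheme of [10], two-input LUT.

    The load is R_T1-ON in parallel with the unselected memristor R21,
    whose word line is grounded.
    """
    r_p1 = parallel(r_t_on, r_unsel)
    return v_wl * r_p1 / (r_p1 + r_sel)


def v1_proposed(v_wl: float, r_sel: float, r_t_on: float) -> float:
    """Equation (3): proposed scheme, where the load is R_T1-ON alone."""
    return v_wl * r_t_on / (r_t_on + r_sel)


if __name__ == "__main__":
    V_WL = 1.0      # hypothetical READ voltage [V]
    R_T_ON = 1e3    # hypothetical on-resistance of T1 [ohm]
    R_ON, R_OFF = 100.0, 19e3  # from the Fig. 4 legend [ohm]

    # Worst case for [10]: the unselected memristor sits at R_ON (NSP = 1).
    margin_prev = (v1_previous(V_WL, R_ON, R_T_ON, R_ON)
                   - v1_previous(V_WL, R_OFF, R_T_ON, R_ON))
    margin_prop = (v1_proposed(V_WL, R_ON, R_T_ON)
                   - v1_proposed(V_WL, R_OFF, R_T_ON))

    print(f"READ margin, scheme of [10]:  {margin_prev:.4f} V")
    print(f"READ margin, proposed scheme: {margin_prop:.4f} V")
```

With these assumed values the proposed divider gives the larger margin, because the load seen by the selected memristor is not reduced by a low-resistance parallel branch.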

Table 3
Comparison of average WRITE delay, energy and EDP

LUT size   Aver. Delay (ns)      Aver. Energy (pJ)     Aver. EDP (pJ·ns)
           Proposed   SRAM       Proposed   SRAM       Proposed    SRAM
2x2        171.88     0.1031     26.78      0.0016     4912.2      0.000173
4x4        214.1      0.2062     91.08      0.0067     24437.94    0.001386
6x6        256.03     0.3093     198.23     0.0151     70132.36    0.004679
8x8        297.8      0.4124     348.32     0.0268     152715.5    0.011091

B. READ Operation

Consider the READ operation of a single cell. The input is applied to the selected cell; so, its BL transistor is activated, while the remaining BL transistors are switched off. As an example, in the two-input LUT case of Fig. 1, to read the contents of M11, WL11 is connected to +Vdd, while the remaining inputs (WL12, WL21 and WL22) are floating, T1 is turned on, while T2 is turned off, and the Select of the MUX is logic 0 to connect to the output Out1. Also in this case, the performance assessment of the proposed scheme utilizes the four scenarios of [10]; the results are then compared to those of [10][11][12]. The worst case results for different READ operations across different array sizes are shown in Fig. 6 for the proposed scheme and [10][11][12].

In the proposed scheme, the average and worst case READ delays remain constant, so they are nearly independent of the LUT dimension; as explained in a previous section, this is caused by the constant value of the load resistance. In addition, when compared with [10][11][12], the average and worst case READ delays of the proposed scheme are significantly decreased, so the scheme is capable of delivering a significantly faster READ operation, a very important feature in an FPGA. Also, the average and worst case EDPs of the proposed scheme are significantly less than in [10][11][12].

Fig. 6 Average READ vs LUT size; (a) Delay, (b) Energy, and (c) EDP. [Plots not reproduced; y-axes: Delay [fs], Energy [fJ], EDP [fs.fJ]; x-axis: LUT size, 2x2 to 8x8; series: proposed scheme, scheme of [10], scheme of [11] & [12].]

The simulation results of the READ operation for different sizes of SRAM-based LUTs at 32nm are compared with the proposed method. As shown in Table 4, the proposed method requires a significantly smaller READ delay and EDP compared with an SRAM-based LUT. Therefore, the proposed design is suitable for FPGAs because more READ operations are normally performed than WRITE operations.

Table 4
Comparison of average READ delay, energy and EDP

LUT size   Aver. Delay (fs)      Aver. Energy (fJ)     Aver. EDP (fJ·fs)
           Proposed   SRAM       Proposed   SRAM       Proposed   SRAM
2x2        35.310     320255     2.405      0.466      84.92      149313
4x4        35.310     1094964    4.744      1.718      167.52     1881392
6x6        35.310     3364084    7.083      9.3831     250.12     3154677
8x8        35.310     14922750   9.422      692.839    332.71     0339063187

IV. CONCLUSION

This paper has proposed a new LUT scheme that utilizes memristors as non-volatile storage elements. This LUT is amenable to FPGA implementation and, unlike previous works [10][11][12], it does not utilize the nano crossbar as the memory scheme. The proposed scheme uses an independent selection circuit that does not incur sneak path current generation, thus avoiding changing the state of unselected memristors during the memory operation. One of the advantages of the proposed scheme is that it permits the simultaneous WRITE operation to all memristors connected to a BL; therefore the WRITE time decreases considerably. Its most significant advantage, however, is that its READ delay is significantly less than that of previous schemes. In terms of hardware, unlike [11][12], there is no write half-select problem and hence it requires fewer power rails for its operation (similar to [10]). When compared with an SRAM-based LUT, the WRITE operation of the proposed method requires a larger delay, but during the READ operation the proposed method outperforms the SRAM-based LUT in terms of delay and EDP. Therefore, the proposed design is suitable for FPGAs because more READ operations are normally performed than WRITE operations.

REFERENCES

[1] T. Nandha Kumar, H. A. F. Almurib and F. Lombardi, “Single-Configuration Fault detection in Application-Dependent Testing of FPGA Interconnects”, in IET Transactions on Computers & Digital Techniques, vol. 7, no. 3, pp. 132-141, 2013.
[2] “Xilinx Spartan™-3AN FPGAs”, http://www.xilinx.com
[3] Z. Biolek, D. Biolek and V. Biolova, “SPICE Model of Memristor with Nonlinear Dopant Drift”, Radioengineering, vol. 18, no. 2, pp. 210-14, 2009.
[4] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart and R. S. Williams, “Memristive switching mechanism for metal/oxide/metal nanodevices”, Nature Nanotechnology, vol. 3, pp. 429–433, 2008.
[5] J. Cong and B. Xiao, “mrFPGA: A Novel FPGA Scheme with Memristor-Based Reconfiguration”, in Proc. IEEE/ACM International Symposium on Nanoscale Architectures, 2011, pp. 1-8.
[6] S. Tanachutiwat, M. Liu, and W. Wang, “FPGA Based on Integration of CMOS and RRAM”, in IEEE Transactions on VLSI Systems, vol. 19, no. 11, pp. 2023-2032, Nov. 2011.
[7] O. Turkyilmaz, S. Onkaraiah, M. Reyboz, F. Clermidy, Hraziia, C. Anghel, J. M. Portal, M. Bocquet, “RRAM-based FPGA for ‘Normally Off, Instantly On’ Applications”, in Proc. IEEE/ACM International Symposium on Nanoscale Architectures, 2012, pp. 101-108.
[8] I. E. Ebong and P. Mazumder, “Self-Controlled Writing and Erasing in a Memristor Crossbar Memory”, in IEEE Transactions on Nanotechnology, vol. 10, no. 6, Nov. 2011.
[9] A. Chen, “Accessibility of Nano-Crossbar arrays of resistive switching devices”, in Proc. IEEE International Conference on Nanotechnology, 2011, pp. 1767-1771.
[10] T. N. Kumar, H. A. F. Almurib and F. Lombardi, “On the Operational Features and Performance of a Memristor-Based Cell for a LUT of an FPGA”, in Proc. of 13th IEEE International Conference on Nanotechnology, 2013, pp. 71-76.
[11] Y. Ho, Garng M. Huang, and P. Li, “Dynamical Properties and Design Analysis for Nonvolatile Memristor Memories”, in IEEE Transactions on Circuits and Systems-I, vol. 58, no. 4, April 2011.
[12] C. Xu, X. Dong, N. P. Jouppi, and Y. Xie, “Design Implications of Memristor-Based RRAM Cross-Point Structures”, in Proc. of Design, Automation and Test in Europe, 2011, pp. 1–6.
