
Introduction to Binary Convolutional Codes [1]


Yunghsiang S. Han



Graduate Institute of Communication Engineering,
National Taipei University
Taiwan
E-mail: yshan@mail.ntpu.edu.tw

Binary Convolutional Codes


1. A binary convolutional code is denoted by a three-tuple (n, k, m).
2. n output bits are generated whenever k input bits are received.
3. The current n outputs are linear combinations of the present k
input bits and the previous m × k input bits.
4. m designates the number of previous k-bit input blocks that
must be memorized in the encoder.
5. m is called the memory order of the convolutional code.


Encoders for the Convolutional Codes


1. A binary convolutional encoder is conveniently structured as a
mechanism of shift registers and modulo-2 adders, where the
output bits are modulo-2 additions of selected shift-register
contents and present input bits.
2. n in the three-tuple notation is exactly the number of output
sequences in the encoder.
3. k is the number of input sequences (and hence, the encoder
consists of k shift registers).
4. m is the maximum length of the k shift registers (i.e., if the
number of stages of the jth shift register is Kj, then
m = max_{1≤j≤k} Kj).
5. K = Σ_{j=1}^{k} Kj is the total memory in the encoder (K is
sometimes called the overall constraint length).


6. The constraint length of a convolutional code is defined in
several ways; the most popular definition is m + 1.


Encoder for the Binary (2, 1, 2) Convolutional Code


u is the information sequence and v is the corresponding code
sequence (codeword).

[Encoder circuit: a length-2 shift register; v1 is the modulo-2 sum of
the input and both register stages, v2 of the input and the second
stage.]

u = (11101), v1 = (1010011), v2 = (1101001), v = (11 01 10 01 00 10 11)


Encoder for the Binary (3, 2, 2) Convolutional Code

[Encoder circuit for the (3, 2, 2) code: v1 = u1,t, v2 = u2,t, and
v3 = u1,t−1 + u2,t−1 + u2,t−2.]

u1 = (10), u2 = (11), v1 = (1000), v2 = (1100), v3 = (0001)

u = (11 01), v = (110 010 000 001)


Impulse Response and Convolution

[Block diagram: input sequences u1, . . . , uk enter the convolutional
encoder, which produces output sequences v1, . . . , vn.]

1. The encoders of convolutional codes can be represented by linear
time-invariant (LTI) systems.


2. The jth output sequence is

vj = u1 ∗ g_j^(1) + u2 ∗ g_j^(2) + · · · + uk ∗ g_j^(k) = Σ_{i=1}^{k} ui ∗ g_j^(i),

where ∗ is the convolution operation and g_j^(i) is the impulse
response of the ith input sequence with respect to the jth output.
3. g_j^(i) can be found by stimulating the encoder with the discrete
impulse (1, 0, 0, . . .) at the ith input and by observing the jth
output when all other inputs are fed the zero sequence
(0, 0, 0, . . .).
4. The impulse responses are called generator sequences of the
encoder.


Impulse Response for the Binary (2, 1, 2) Convolutional Code

[Encoder circuit as above: u = (11101), v1 = (1010011),
v2 = (1101001), v = (11 01 10 01 00 10 11)]

g1 = (1, 1, 1, 0, . . .) = (1, 1, 1), g2 = (1, 0, 1, 0, . . .) = (1, 0, 1)


v1 = (1, 1, 1, 0, 1) ∗ (1, 1, 1) = (1, 0, 1, 0, 0, 1, 1)
v2 = (1, 1, 1, 0, 1) ∗ (1, 0, 1) = (1, 1, 0, 1, 0, 0, 1)
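
The encoding map in item 2 is ordinary convolution with arithmetic in
GF(2). A minimal sketch in Python (the function name is mine) that
reproduces this example:

    def conv_gf2(u, g):
        # Convolve two binary sequences and reduce modulo 2.
        out = [0] * (len(u) + len(g) - 1)
        for i, ui in enumerate(u):
            for j, gj in enumerate(g):
                out[i + j] ^= ui & gj
        return out

    u  = [1, 1, 1, 0, 1]
    g1 = [1, 1, 1]
    g2 = [1, 0, 1]
    print(conv_gf2(u, g1))  # [1, 0, 1, 0, 0, 1, 1] = v1
    print(conv_gf2(u, g2))  # [1, 1, 0, 1, 0, 0, 1] = v2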


Impulse Response for the (3, 2, 2) Convolutional Code

[Encoder circuit as above: u1 = (10), u2 = (11), v1 = (1000),
v2 = (1100), v3 = (0001), u = (11 01), v = (110 010 000 001)]

g_1^(1) = 4 (octal) = (1, 0, 0), g_1^(2) = 0 (octal), g_2^(1) = 0 (octal),
g_2^(2) = 4 (octal), g_3^(1) = 2 (octal), g_3^(2) = 3 (octal)


Generator Matrix in the Time Domain


1. Convolutional codes can be generated by multiplying the
information sequences by a generator matrix.
2. Let u1, u2, . . . , uk be the information sequences and
v1, v2, . . . , vn the output sequences.
3. Arrange the information sequences as

u = (u1,0, u2,0, . . . , uk,0, u1,1, u2,1, . . . , uk,1,
. . . , u1,ℓ, u2,ℓ, . . . , uk,ℓ, . . .)
  = (w0, w1, . . . , wℓ, . . .),


and the output sequences as

v = (v1,0, v2,0, . . . , vn,0, v1,1, v2,1, . . . , vn,1,
. . . , v1,ℓ, v2,ℓ, . . . , vn,ℓ, . . .)
  = (z0, z1, . . . , zℓ, . . .).

4. v is called a codeword or code sequence.
5. The relation between v and u can be characterized as

v = u · G,

where G is the generator matrix of the code.


6. The generator matrix is

G = [ G0  G1  G2  · · ·  Gm                     ]
    [     G0  G1  · · ·  Gm−1  Gm               ]
    [         G0  · · ·  Gm−2  Gm−1  Gm         ]
    [              . . .             . . .      ],

with the k × n submatrices

Gℓ = [ g_1,ℓ^(1)  g_2,ℓ^(1)  · · ·  g_n,ℓ^(1) ]
     [ g_1,ℓ^(2)  g_2,ℓ^(2)  · · ·  g_n,ℓ^(2) ]
     [    . . .      . . .            . . .   ]
     [ g_1,ℓ^(k)  g_2,ℓ^(k)  · · ·  g_n,ℓ^(k) ].

7. The element g_j,ℓ^(i), for i ∈ [1, k] and j ∈ [1, n], is the ℓth entry of the impulse


response of the ith input with respect to the jth output:

g_j^(i) = (g_j,0^(i), g_j,1^(i), . . . , g_j,ℓ^(i), . . . , g_j,m^(i)).


Generator Matrix of the Binary (2, 1, 2) Convolutional Code

[Encoder circuit as above: u = (11101), v1 = (1010011),
v2 = (1101001), v = (11 01 10 01 00 10 11)]


v = u · G

  = (1, 1, 1, 0, 1) · [ 11 10 11             ]
                      [    11 10 11          ]
                      [       11 10 11       ]
                      [          11 10 11    ]
                      [             11 10 11 ]

  = (11, 01, 10, 01, 00, 10, 11)
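
As a cross-check, the block-Toeplitz structure of G can be built
directly from the submatrices Gℓ. A small Python sketch (function name
is mine) that reproduces v = u · G for this example:

    import numpy as np

    def generator_matrix(G_blocks, L, n, k):
        # Time-domain generator matrix of a terminated code:
        # G_blocks = [G0, ..., Gm], each k x n; result (k*L) x (n*(L+m)).
        m = len(G_blocks) - 1
        G = np.zeros((k * L, n * (L + m)), dtype=int)
        for row in range(L):
            for ell, Gl in enumerate(G_blocks):
                G[row*k:(row+1)*k, (row+ell)*n:(row+ell+1)*n] = Gl
        return G

    # (2, 1, 2) code: G0 = [1 1], G1 = [1 0], G2 = [1 1]
    blocks = [np.array([[1, 1]]), np.array([[1, 0]]), np.array([[1, 1]])]
    G = generator_matrix(blocks, L=5, n=2, k=1)
    u = np.array([1, 1, 1, 0, 1])
    print(u.dot(G) % 2)  # [1 1 0 1 1 0 0 1 0 0 1 0 1 1] = (11 01 10 01 00 10 11)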


Generator Matrix of the Binary (3, 2, 2) Convolutional Code

[Encoder circuit as above: u1 = (10), u2 = (11), v1 = (1000),
v2 = (1100), v3 = (0001), u = (11 01), v = (110 010 000 001)]

g_1^(1) = 4 (octal) = (1, 0, 0), g_1^(2) = 0 (octal), g_2^(1) = 0 (octal),
g_2^(2) = 4 (octal), g_3^(1) = 2 (octal), g_3^(2) = 3 (octal)

v = u · G

  = (11, 01) · [ 100 001 000     ]
               [ 010 001 001     ]
               [     100 001 000 ]
               [     010 001 001 ]

  = (110, 010, 000, 001)


Generator Matrix in the Z Domain


1. According to the Z transform,

ui ⊸ Ui(D) = Σ_{t=0}^{∞} ui,t D^t
vj ⊸ Vj(D) = Σ_{t=0}^{∞} vj,t D^t
g_j^(i) ⊸ Gi,j(D) = Σ_{t=0}^{∞} g_j,t^(i) D^t

2. The convolutional relation of the Z transform,
Z{u ∗ g} = U(D)G(D), is used to transform the convolution of
input sequences and generator sequences to a multiplication in
the Z domain.
3. Vj(D) = Σ_{i=1}^{k} Ui(D) · Gi,j(D).


4. We can write the above equations as a matrix multiplication:

V(D) = U(D) · G(D),

where

U(D) = (U1(D), U2(D), . . . , Uk(D))
V(D) = (V1(D), V2(D), . . . , Vn(D))
G(D) = [ Gi,j(D) ], a k × n matrix.

Generator Matrix of the Binary (2, 1, 2) Convolutional Code

[Encoder circuit as above: u = (11101), v1 = (1010011),
v2 = (1101001), v = (11 01 10 01 00 10 11)]


g1 = (1, 1, 1, 0, . . .) = (1, 1, 1), g2 = (1, 0, 1, 0, . . .) = (1, 0, 1)

G1,1(D) = 1 + D + D^2
G1,2(D) = 1 + D^2
U1(D) = 1 + D + D^2 + D^4
V1(D) = 1 + D^2 + D^5 + D^6
V2(D) = 1 + D + D^3 + D^6
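
In the D domain the encoding is polynomial multiplication over GF(2).
A sketch that packs each polynomial into an integer (bit i = coefficient
of D^i) and multiplies carry-lessly; the outputs reproduce V1(D) and
V2(D) above:

    def polymul_gf2(a, b):
        # Carry-less product of GF(2) polynomials packed into ints.
        out = 0
        while b:
            if b & 1:
                out ^= a
            a <<= 1
            b >>= 1
        return out

    U1  = 0b10111   # 1 + D + D^2 + D^4
    G11 = 0b111     # 1 + D + D^2
    G12 = 0b101     # 1 + D^2
    print(bin(polymul_gf2(U1, G11)))  # 0b1100101 -> 1 + D^2 + D^5 + D^6
    print(bin(polymul_gf2(U1, G12)))  # 0b1001011 -> 1 + D + D^3 + D^6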


Generator Matrix of the Binary (3, 2, 2) Convolutional Code

[Encoder circuit as above: u1 = (10), u2 = (11), v1 = (1000),
v2 = (1100), v3 = (0001), u = (11 01), v = (110 010 000 001)]


g_1^(1) = (1, 0, 0), g_1^(2) = (0, 0, 0), g_2^(1) = (0, 0, 0),
g_2^(2) = (1, 0, 0), g_3^(1) = (0, 1, 0), g_3^(2) = (0, 1, 1)

G1,1(D) = 1, G1,2(D) = 0, G1,3(D) = D
G2,1(D) = 0, G2,2(D) = 1, G2,3(D) = D + D^2
U1(D) = 1, U2(D) = 1 + D
V1(D) = 1, V2(D) = 1 + D, V3(D) = D^3


Termination
1. The effective code rate, Reffective , is defined as the average
number of input bits carried by an output bit.
2. In practice, the input sequences have finite length.
3. In order to terminate a convolutional code, some bits are
appended onto the information sequence such that the shift
registers return to the zero state.
4. Each of the k input sequences of length L bits is padded with m
zeros, and these k input sequences jointly induce n(L + m)
output bits.
5. The effective rate of the terminated convolutional code is now

Reffective = kL / (n(L + m)) = R · L/(L + m),


where L/(L + m) is called the fractional rate loss.
6. When L is large, Reffective ≈ R.
7. All examples presented are terminated convolutional codes.


Truncation
1. The second option to terminate a convolutional code is to simply
stop encoding for t > L, no matter what the shift registers contain.
2. The effective code rate is still R.
3. The generator matrix is clipped after the Lth block column:

Gc[L] = [ G0  G1  · · ·  Gm                  ]
        [     G0  G1  · · ·  Gm              ]
        [          . . .        . . .        ]
        [               G0  · · ·  Gm        ]
        [                    . . .           ]
        [                        G0  G1      ]
        [                            G0      ],


where Gc[L] is a (k · L) × (n · L) matrix.

4. The drawback of the truncation method is that the last few blocks
of the information sequence are less protected.


Generator Matrix of the Truncated Binary (2, 1, 2) Convolutional Code

[Encoder circuit as above: u = (11101), v1 = (10100),
v2 = (11010), v = (11 01 10 01 00)]


v = u · Gc[5]

  = (1, 1, 1, 0, 1) · [ 11 10 11       ]
                      [    11 10 11    ]
                      [       11 10 11 ]
                      [          11 10 ]
                      [             11 ]

  = (11, 01, 10, 01, 00)
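
A sketch of the clipping (Python; same block layout as in the earlier
generator-matrix sketch, function name is mine):

    import numpy as np

    def truncated_generator_matrix(G_blocks, L, n, k):
        # Gc[L]: the time-domain generator matrix clipped after the
        # Lth block column, a (k*L) x (n*L) matrix.
        m = len(G_blocks) - 1
        G = np.zeros((k * L, n * L), dtype=int)
        for row in range(L):
            for ell, Gl in enumerate(G_blocks):
                if row + ell < L:   # drop the clipped columns
                    G[row*k:(row+1)*k, (row+ell)*n:(row+ell+1)*n] = Gl
        return G

    blocks = [np.array([[1, 1]]), np.array([[1, 0]]), np.array([[1, 1]])]
    Gc = truncated_generator_matrix(blocks, L=5, n=2, k=1)
    u = np.array([1, 1, 1, 0, 1])
    print(u.dot(Gc) % 2)   # [1 1 0 1 1 0 0 1 0 0] = (11 01 10 01 00)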


Generator Matrix of the Truncated Binary (3, 2, 2) Convolutional Code

[Encoder circuit as above: u1 = (10), u2 = (11), v1 = (10),
v2 = (11), v3 = (00), u = (11 01), v = (110 010)]


v = u · Gc[2]

  = (11, 01) · [ 100 001 ]
               [ 010 001 ]
               [     100 ]
               [     010 ]

  = (110, 010)


Tail Biting
1. The third possible method to generate finite code sequences is
called tail biting.
2. In tail biting, the convolutional encoder starts in the same state
(shift-register contents) in which it will stop after the input of L
information blocks.
3. This makes equal protection of all information bits of the entire
information sequence possible.
4. The effective rate of the code is still R.
5. The generator matrix has to be clipped after the Lth column and


manipulated as follows:

G̃c[L] = [ G0  G1  · · ·  Gm                  ]
        [     G0  G1  · · ·  Gm              ]
        [          . . .        . . .        ]
        [               G0  · · ·  Gm        ]
        [ Gm                 . . .           ]
        [  . . .                 G0  G1      ]
        [ G1  · · ·  Gm              G0      ],

where G̃c[L] is a (k · L) × (n · L) matrix.


Generator Matrix of the Tail-Biting Binary (2, 1, 2) Convolutional Code

[Encoder circuit as above: u = (11101), v1 = (10100),
v2 = (11010), v = (11 01 10 01 00)]


v = u · G̃c[5]

  = (1, 1, 1, 0, 1) · [ 11 10 11       ]
                      [    11 10 11    ]
                      [       11 10 11 ]
                      [ 11       11 10 ]
                      [ 10 11       11 ]

  = (01, 10, 10, 01, 00)
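
The wrap-around structure is easy to express with block column indices
taken modulo L. A sketch (Python; assumes m < L so that wrapped blocks
do not overlap):

    import numpy as np

    def tailbiting_generator_matrix(G_blocks, L, n, k):
        # Tail-biting matrix: block column indices wrap modulo L,
        # so every row carries all of G0..Gm.
        assert len(G_blocks) - 1 < L      # assumption: m < L
        G = np.zeros((k * L, n * L), dtype=int)
        for row in range(L):
            for ell, Gl in enumerate(G_blocks):
                col = (row + ell) % L     # wrap-around
                G[row*k:(row+1)*k, col*n:(col+1)*n] = Gl
        return G

    blocks = [np.array([[1, 1]]), np.array([[1, 0]]), np.array([[1, 1]])]
    Gtb = tailbiting_generator_matrix(blocks, L=5, n=2, k=1)
    u = np.array([1, 1, 1, 0, 1])
    print(u.dot(Gtb) % 2)   # [0 1 1 0 1 0 0 1 0 0] = (01 10 10 01 00)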


Generator Matrix of the Tail-Biting Binary (3, 2, 2) Convolutional Code

[Encoder circuit as above: u1 = (10), u2 = (11), v1 = (10),
v2 = (11), v3 = (00), u = (11 01), v = (110 010)]


v = u · G̃c[2]

  = (11, 01) · [ 100 001 ]
               [ 010 001 ]
               [ 001 100 ]
               [ 001 010 ]

  = (111, 010)


FIR and IIR Systems

[Recursive encoder circuit: v1 = u; an internal sequence x with
feedback x_t = u_t + x_{t−1} + x_{t−2} produces v2,t = x_t + x_{t−2}.
For u = (111): v1 = (111), v2 = (101), v = (11 10 11).]

1. All examples on the previous slides are finite impulse response
(FIR) systems, i.e., systems with finite impulse responses.
2. The above example is an infinite impulse response (IIR) system,
i.e., a system with an infinite impulse response.


3. The generator sequences of the above example are

g_1^(1) = (1)
g_2^(1) = (1, 1, 1, 0, 1, 1, 0, 1, 1, 0, . . .).

4. The infinite sequence g_2^(1) is caused by the recursive structure
of the encoder.
5. By introducing the variable x, we have

xt = ut + xt−1 + xt−2
v2,t = xt + xt−2 .

Accordingly, we can have the following difference equations:

v1,t = ut
v2,t + v2,t−1 + v2,t−2 = ut + ut−2


6. We then apply the Z transform to the second equation:

Σ_{t=0}^{∞} v2,t D^t + Σ_{t=0}^{∞} v2,t−1 D^t + Σ_{t=0}^{∞} v2,t−2 D^t
  = Σ_{t=0}^{∞} ut D^t + Σ_{t=0}^{∞} ut−2 D^t,

V2(D) + D·V2(D) + D^2·V2(D) = U(D) + D^2·U(D).

7. The system function is then

V2(D) = ((1 + D^2)/(1 + D + D^2)) · U(D) = G12(D) U(D).

8. The generator matrix is obtained:

G(D) = ( 1   (1 + D^2)/(1 + D + D^2) ).


9.
V(D) = U(D) · G(D)
     = (1 + D + D^2) · ( 1   (1 + D^2)/(1 + D + D^2) )
     = ( 1 + D + D^2   1 + D^2 )
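
The infinite impulse response in item 3 is just the power-series
expansion of the rational entry G12(D). A sketch of the GF(2) long
division (Python; coefficient lists in ascending powers of D):

    def rational_impulse_gf2(num, den, nterms):
        # Series coefficients of num(D)/den(D) over GF(2); den[0] = 1.
        g = []
        for t in range(nterms):
            c = num[t] if t < len(num) else 0
            for j in range(1, min(t, len(den) - 1) + 1):
                c ^= den[j] & g[t - j]
            g.append(c)
        return g

    print(rational_impulse_gf2([1, 0, 1], [1, 1, 1], 10))
    # [1, 1, 1, 0, 1, 1, 0, 1, 1, 0] = g2 of the recursive encoder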

10. The time diagram:

t σt σ t+1 v1 v2
0 00 10 1 1
1 10 01 1 0
2 01 00 1 1


State Diagram
1. A convolutional encoder can be treated as a finite state machine.
2. The contents of the shift registers represent the states. The
output of a code block v t at time t depends on the current state
σ t and the information block ut .
3. Each change of state σ t → σ t+1 is associated with the input of
an information block and the output of a code block.
4. The state diagram is obtained by drawing a graph. In this graph,
nodes are possible states and the state transitions are labelled
with the appropriate inputs and outputs (ut /v t ). In this course
we only consider the convolutional encoder with state diagrams
that do not have parallel transitions.
5. The state of the encoder can be expressed as a K-tuple of the

memory values:

σ t = (u1,t−1 . . . u1,t−K1 , u2,t−1 . . . u2,t−K2 , . . . , uk,t−1 . . . uk,t−Kk ).

6. The state sequence is defined as

S = (σ 0 , σ 1 , . . . , σ t , . . .),

where σ 0 is the initial state of the encoder.


7. If no parallel transitions exist, then there is a one-to-one
correspondence between code sequences and state sequences.


State Diagram: Example

[Encoder circuit as above: u = (11101), v1 = (1010011),
v2 = (1101001), v = (11 01 10 01 00 10 11)]


[State diagram with states 00, 10, 01, 11; each transition is
labelled ut/vt:
00 → 00: 0/00    00 → 10: 1/11
10 → 01: 0/10    10 → 11: 1/01
01 → 00: 0/11    01 → 10: 1/00
11 → 01: 0/01    11 → 11: 1/10]
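
The same table can be generated mechanically from the generator
sequences. A sketch (Python; the state is the pair of previous input
bits):

    def state_table(g1, g2):
        # Transitions of the (2,1,2) encoder: state = (u_{t-1}, u_{t-2}).
        table = {}
        for s1 in (0, 1):
            for s2 in (0, 1):
                for u in (0, 1):
                    reg = (u, s1, s2)  # current input plus register contents
                    v1 = sum(a & b for a, b in zip(reg, g1)) % 2
                    v2 = sum(a & b for a, b in zip(reg, g2)) % 2
                    table[((s1, s2), u)] = ((u, s1), (v1, v2))
        return table

    for (s, u), (nxt, v) in sorted(state_table((1, 1, 1), (1, 0, 1)).items()):
        print(f"{s[0]}{s[1]} --{u}/{v[0]}{v[1]}--> {nxt[0]}{nxt[1]}")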


Code Trees of Convolutional Codes


1. A code tree of a binary (n, k, m) convolutional code presents
every codeword as a path on a tree.
2. For input sequences of length L bits, the code tree consists of
(L + m + 1) levels. The single leftmost node at level 0 is called
the origin node.
3. At the first L levels, there are exactly 2^k branches leaving each
node. For those nodes located at levels L through (L + m), only
one branch remains. The 2^{kL} rightmost nodes at level (L + m)
are called the terminal nodes.
4. As expected, a path from the single origin node to a terminal
node represents a codeword; therefore, it is named the code path
corresponding to the codeword.

[Code tree of the binary (2, 1, 2) convolutional code for L = 5:
each branch is labelled ut/vt; levels 0 through 7.]

Trellises of Convolutional Codes


1. A code trellis, as termed by Forney, is a structure obtained from
a code tree by merging those nodes in the same state.
2. Recall that the state associated with a node is determined by the
associated shift-register contents.
3. For a binary (n, k, m) convolutional code, the number of states at
levels m through L is 2^K, where K = Σ_{j=1}^{k} Kj and Kj is the
length of the jth shift register in the encoder; hence, there are
2^K nodes on each of these levels.
4. Due to node merging, only one terminal node remains in a trellis.
5. Analogous to a code tree, a path from the single origin node to
the single terminal node in a trellis also mirrors a codeword.


[Trellis of the binary (2, 1, 2) convolutional code for L = 5: four
states per level in the middle of the trellis, branches labelled with
the output blocks vt, a single origin node at level 0 and a single
terminal node at level 7.]


Column Distance Function


1. Let v^(a,b) = (va, va+1, . . . , vb) denote a portion of codeword v,
and abbreviate v^(0,b) by v^(b). The Hamming distance between
the first rn bits of codewords v and z is given by

dH(v^(rn−1), z^(rn−1)) = Σ_{i=0}^{rn−1} (vi + zi).

2. The Hamming weight of the first rn bits of codeword v thus
equals dH(v^(rn−1), 0^(rn−1)), where 0 represents the all-zero
codeword.
3. The column distance function (CDF) dc (r) of a binary (n, k, m)
convolutional code is defined as the minimum Hamming distance
between the first rn bits of any two codewords whose first n bits


are distinct, i.e.,

dc(r) = min{ dH(v^(rn−1), z^(rn−1)) : v^(n−1) ≠ z^(n−1) for v, z ∈ C },

where C is the set of all codewords.


4. Function dc (r) is clearly nondecreasing in r.
5. Two cases of CDFs are of specific interest: r = m + 1 and r = ∞.
In the latter case, the input sequences are considered infinite in
length. Usually, dc (r) for an (n, k, m) convolutional code reaches
its largest value dc (∞) when r is a little beyond 5 × m; this
property facilitates the determination of dc (∞).
6. Terminologically, dc (m + 1) and dc (∞) (or dfree in general) are
called the minimum distance and the free distance of the
convolutional code, respectively.
7. When a sufficiently large codeword length is taken, and an


optimal (i.e., maximum-likelihood) decoder is employed, the


error-correcting capability of a convolutional code is generally
characterized by dfree .
8. In case a decoder figures the transmitted bits only based on the
first n(m + 1) received bits (as in, for example, the majority-logic
decoding), dc (m + 1) can be used instead to characterize the code
error-correcting capability.
9. As for the sequential decoding algorithm that requires a rapid
initial growth of column distance functions, the decoding
computational complexity, defined as the number of metric
computations performed, is determined by the CDF of the code
being applied.
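
For a code as small as the (2, 1, 2) example, dc(r) can be checked by
brute force (a sketch; exponential in r, so only for illustration):

    from itertools import product

    def encode(u, gens=((1, 1, 1), (1, 0, 1))):
        # First len(u) output blocks of the (2,1,2) encoder.
        out = []
        for t in range(len(u)):
            for g in gens:
                out.append(sum(g[j] & (u[t - j] if t - j >= 0 else 0)
                               for j in range(len(g))) % 2)
        return out

    def column_distance(r):
        best = None
        for u in product((0, 1), repeat=r):
            for w in product((0, 1), repeat=r):
                if u[0] == w[0]:     # for this code the first n output
                    continue         # bits differ iff the first inputs do
                d = sum(a ^ b for a, b in zip(encode(u), encode(w)))
                best = d if best is None else min(best, d)
        return best

    print([column_distance(r) for r in range(1, 7)])
    # nondecreasing in r; settles at the free distance (5 for this code)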


Path Enumerators

[State diagram of the (2, 1, 2) code: states 00, 10, 11, 01; each
transition carries the label W^w, where w is the Hamming weight of its
output block (W^2 on 00 → 10 and 01 → 00, W on the weight-one
transitions, 1 on 00 → 00 and 01 → 10).]

1. The power of the operator W corresponds to the Hamming
weight of the code block associated with the transition.

Graduate Institute of Communication Engineering, National Taipei University


A Y. S. of
property Han
MVG_OMALLOOR Introduction to Binary Convolutional Codes 53

2. We consider that all possible code sequences of the code start at


time t = 0 and state σ 0 = (00), and run through an arbitrary
state sequence S = (σ 0 , σ 1 , . . .).
3. The Hamming weight of the code sequence with state sequence
S = (00, 10, 11, 01, 00, 00, . . .)
is 6 since W^2 · W · W · W^2 = W^6.
4. The calculation of the path enumerators is based on a subset of
the set of all possible code sequences.
5. All sequences of this subset leave the zero state at time t = 0 and
remain there after reaching it again.
6. The above sequences have the state sequences
S = (0, σ 1 , σ 2 , . . . , σ ℓ−1 , 0, 0, . . .),


with σt ≠ 0 for t ∈ [1, ℓ − 1].


7. ℓ must be greater than m since the zero state cannot be reached
before m + 1 transitions.
8. A procedure from signal flow theory is used to calculate the path
enumerator.


[Signal flow graph: the zero state is split into a source and a sink.
The intermediate nodes carry the variables X1 (state 10), X2 (state
11), X3 (state 01); the source feeds X1 with gain W^2, X1 feeds X2
and X3 with gain W, X2 feeds X2 and X3 with gain W, and X3 feeds X1
with gain 1 and the sink with gain W^2.]


1. If the variables Xi for i ∈ [1, 3] are assigned to the nodes between
source and sink as shown in the figure, then we have

X1 = X3 + W^2
X2 = W X1 + W X2
X3 = W X1 + W X2

and

T(W) = W^2 X3.

2. Each Xi is obtained by adding up the contributions of all
branches that go into the node.
3. Each branch contributes its weight multiplied by the variable
assigned to the node at which it originates.


4. By solving the above equations, we have

T(W) = W^5/(1 − 2W) = W^5 + 2W^6 + · · · + 2^j W^(j+5) + · · · ,

where T(W) is expanded in a series with positive powers of W.
5. In this series, the coefficients are equal to the number of code
sequences with corresponding weight. For example, there are 2j
code sequences with weight j + 5.
6. The power of the first member of this series corresponds to the
free distance of the code. In the previous example, it is 5.
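
The series coefficients can also be obtained by walking the W-labelled
state diagram directly. A sketch that counts zero-to-zero detours by
dynamic programming over the accumulated weight:

    def weight_spectrum(max_weight):
        # trans[s] = [(next_state, branch weight)], from the diagram above
        trans = {
            '10': [('01', 1), ('11', 1)],
            '11': [('01', 1), ('11', 1)],
            '01': [('00', 2), ('10', 0)],
        }
        counts = [0] * (max_weight + 1)
        frontier = {('10', 2): 1}          # leave the zero state: W^2
        while frontier:
            nxt = {}
            for (s, w), cnt in frontier.items():
                for s2, dw in trans[s]:
                    if w + dw > max_weight:
                        continue
                    if s2 == '00':         # detour closed at the zero state
                        counts[w + dw] += cnt
                    else:
                        key = (s2, w + dw)
                        nxt[key] = nxt.get(key, 0) + cnt
            frontier = nxt
        return counts

    print(weight_spectrum(9))   # weights 5..9 occur 1, 2, 4, 8, 16 times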


Systematic Matrix
1. An important subclass of convolutional codes is the systematic
codes, in which k out of n output sequences retain the values of
the k input sequences. In other words, these outputs are directly
connected to the k inputs in the encoder.
2. A generator matrix of a systematic code has the following
structure:

G(D) = [ 1              G1,k+1(D)  · · ·  G1,n(D) ]
       [    1           G2,k+1(D)  · · ·  G2,n(D) ]
       [       . . .       . . .            . . . ]
       [             1  Gk,k+1(D)  · · ·  Gk,n(D) ]
     = ( Ik | R(D) ),


where I k is the k × k identity matrix and R(D) is a k × (n − k)


matrix with rational elements.
3. The input sequences do not need to be equal to the first k output
sequences. Any permutation is allowed.


Catastrophic Generator Matrix


1. A catastrophic matrix maps information sequences with infinite
Hamming weight to code sequences with finite Hamming weight.
2. For a catastrophic code, a finite number of transmission errors
can cause an infinite number of errors in the decoded information
sequence. Hence, these codes should be avoided in practice.
3. It is easy to see that systematic generator matrices are never
catastrophic.
4. It can be shown that a loop in the state diagram that produces a
zero output for a non-zero input is a necessary and sufficient
condition for a catastrophic matrix. A loop is defined as a closed
path in the state diagram.
5. The (111) state has a loop associated with 1/00 in the following
example.


[Encoder circuit of a catastrophic encoder: a single length-3 shift
register; v1 and v2 are modulo-2 sums of its contents, and the state
(111) has a loop labelled 1/00.]


Punctured Convolutional Codes


1. Punctured convolutional codes are derived from a mother code
by periodically deleting certain code bits in the code sequence of
the mother code:

(nm, km, mm) --P--> (np, kp, mp),

where P is an nm × (kp/km) puncturing matrix with elements
pij ∈ {0, 1}.
2. A 0 in P means the deletion of a bit, and a 1 means that this
position is kept unchanged in the punctured sequence.
3. Let p = nm · kp/km be the puncture period (the number of all
elements in P) and w ≤ p be the number of 1 elements in P. It


is clear that w = np. The rate of the punctured code is now

Rp = kp/np = (km · p)/(nm · w) = Rm · (p/w).

4. Since w ≤ p, we have Rm ≤ Rp.
5. w cannot be too small, in order to ensure that Rp ≤ 1.


Example of the Punctured Convolutional Code


1. Consider the (2, 1, 2) code with the generator matrix

G = [ 11 10 11             ]
    [    11 10 11          ]
    [        . . .   . . . ].

2. Let

P = [ 1 0 0 0 0 1 ]
    [ 1 1 1 1 1 0 ]

be the puncture matrix.


kp
3. The code rate of the punctured code is np = 67 .
4. If ℓth code bit is removed then the respective column of the


generator matrix of the mother code must be deleted.

Graduate Institute of Communication Engineering, National Taipei University


A Y. S. of
property Han
MVG_OMALLOOR Introduction to Binary Convolutional Codes 66

5. Reading P column by column gives a periodic keep/delete mask
over the code bits (1 = keep, 0 = delete):

11 01 01 01 01 10 | 11 01 01 01 01 10 | 11 01 . . .

laid over the rows 11 10 11 of the mother-code generator matrix,
each shifted by one block.


6. We can obtain the generator matrix of the punctured code by
deleting the masked-out columns of the above matrix:

Gp = [ 1 1 0 1                        ]
     [     1 0 1                      ]
     [       1 0 1                    ]
     [         1 0 1                  ]
     [           1 1 1 1              ]
     [             1 1 0 1            ]
     [               1 1 0 1          ]
     [                   1 0 1        ]
     [                     1 0 1      ]
     [                        . . .   ].


7. The above punctured code has mp = 2.


8. Punctured convolutional codes are very important in practical
applications, and are used in many areas.
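
A sketch of the puncturing operation itself (Python; P is read column
by column, matching the mask in item 5):

    import numpy as np

    def puncture(code_bits, P):
        # Keep the bits whose position in the puncturing period
        # carries a 1 in P; delete the others.
        mask = list(P.T.flatten())    # one period of the keep pattern
        return [b for i, b in enumerate(code_bits) if mask[i % len(mask)]]

    P = np.array([[1, 0, 0, 0, 0, 1],
                  [1, 1, 1, 1, 1, 0]])
    v = [1,1, 0,1, 1,0, 0,1, 0,0, 1,0, 1,1]   # (11 01 10 01 00 10 11)
    print(puncture(v, P))   # [1, 1, 1, 0, 1, 0, 1, 1, 1]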


References
[1] M. Bossert, Channel Coding for Telecommunications. New York,
NY: John Wiley and Sons, 1999.


Decoding Algorithms and Error Probability Bounds for Convolutional Codes

EE 229B - Error Control Coding Project

Hari Palaiyanur
May 13, 2005

1 Introduction
Error control codes are a way of combatting uncertainty in the transmission
of information from one point to another. Information theory shows the exis-
tence of codes that allow for arbitrarily reliable communication when the rate
of transmission is less than the channel capacity. However, it does not
provide practical schemes that achieve this. Error control coding strives to find
codes that provide for good performance in terms of error probability, but also
low complexity in encoding and decoding.
The first major class of codes designed for this purpose was linear block
codes, having been invented in the late 1940’s and early 1950’s. Linear block
codes take a fixed number, k, of input symbols and encode them into a larger,
fixed number, n, of output symbols in some linear fashion. Some of the major
classes of linear block codes we have discussed in class are Hamming codes,
Reed-Muller codes, BCH codes and Reed-Solomon codes. Each of these codes
introduces redundancy into a sequence of input symbols in a controlled manner
to improve the resistance of the transmitted codewords to noise. However, each
block has no dependence on other blocks.
Convolutional codes were first introduced in the mid-1950’s as an alternative
to block codes. At each time step, a convolutional code takes k input symbols
and encodes them into n output symbols. However, we allow for each block to
depend on other blocks of k inputs in a manner that can be implemented as
a finite state machine. We will assume that each block is allowed to depend
only on previous blocks of inputs, and not future blocks of inputs. The use
of memory in convolutional codes makes the code structure more complicated
than a block code and increases complexity in the encoder by the use of shift
registers storing previous inputs, but the additional structure has advantages in
decoding.
A convolutional code can be represented by a trellis. The trellis describes
the codewords of the code as paths through a directed graph. If we weight the
edges of this trellis with log-likelihoods, then maximum-likelihood decoding can


be viewed as the problem of finding the shortest path from the root or start
of the trellis to the end. Maximum likelihood (ML) decoding of convolutional
codes is accomplished in this manner through the well known Viterbi Algorithm
[1].
While the Viterbi Algorithm provides the optimal solution, it may not be
practical to implement for certain code parameters. Sequential decoding algo-
rithms, such as the Stack Algorithm and Fano Algorithm can be useful in some
cases where the Viterbi algorithm is not practical.
The performance criteria we use when evaluating these various algorithms
will be probability of error. In an unterminated convolutional code used over a
noisy channel, the probability of error will tend to 1, so instead we will focus
on the probability of error per unit time in most cases.
The purpose of this project was to become familiar with several decoding
procedures and then evaluate their performance in terms of this probability of
error per unit time for randomly generated convolutional codes. In this process,
roughly speaking, we come across results concerning convolutional codes using
ML decoding, block codes derived from terminated convolutional codes using
ML decoding, and convolutional codes using sequential decoding. (The code
itself doesn’t use the decoding process, but the probability of error depends
both on the code and the decoding process chosen.) We will mostly cover
results, but since my personal goal for the project was to become familiar with
proof techniques used for ML decoding analysis, we give a few detailed proofs
of theorems. The main sources from which these results are drawn are Forney’s
papers on ML decoding [2] and sequential decoding [3] and Zigangirov’s book
[4] on convolutional codes. For a quick, general introduction to convolutional
codes, Lin and Costello’s textbook[5] is useful.

2 Description of Convolutional Codes


We begin by describing the convolutional code as Forney [2] does, ultimately
leading to the trellis description of convolutional codes. At any time step, the
convolutional encoder is assumed to have k inputs which can each take on q
values, and n outputs. Hence, there are M = q k possible inputs at any given
time. Perhaps the simplest description of a convolutional code is with a tree.
We note that the tree is really a description of the code for a specific encoder.
Figure 1 shows an example of a convolutional code represented as a tree.
In this example, at each time step, one input bit comes into the encoder, and
two output bits are given out. A node’s depth in the tree denotes the time at
which the node occurs. A node branches upwards if the input bit is a ’0’, or
downwards if the input bit is a ’1’. The outputs of the encoder at time t are the
labels of edges connecting nodes at time t−1 to nodes at time t. The output bits
depend not only on the present input bit, but also the previous input bit. For
the code depicted, in the notation of Lin and Costello [5], G(D) = [1 1 + D].
This signifies that the first output bit is equal to the input bit at the present
time and the second output bit is equal to the sum of the input bits at the


previous time and the present time. So for this code, q = 2, k = 1, M = 2 and
n = 2.
[Code tree: from each node the upper branch corresponds to input 0 and
the lower branch to input 1; branches are labelled with the two output
bits (00, 11, 01, 10).]

Figure 1: An example of a convolutional code represented by a tree.

In general, at time zero, there is only the root node, at time 1, there are M
nodes, at time 2, there are M 2 nodes and so on. Now we note that we have
required the convolutional code encoder to be implemented with a finite state
machine. Eventually there is a depth in the tree when the number of nodes at
that depth exceeds the number of possible states in the encoder. At this depth
and afterwards, the tree contains redundant information about the encoder.
This is because we again require that future outputs depend only on the state
and future inputs. Hence, all we really need to keep track of for purposes of
knowing the output of the encoder is the state of the encoder at each time. The
code trellis is formed from the code tree by merging nodes corresponding to the
same encoder state.
Figure 2 shows the trellis for the example in Figure 1. As can be seen,
there are two possible states, corresponding to the fact that the present output
depends on the previous input as well as the present input. Define νi to be the
memory of the ith input in the convolutional encoder. Also, define the overall
constraint length as ν0 = Σ_{i=1}^{k} νi. We only consider νi = ν so that ν0 = kν,
because in general, it is harder to prove results about convolutional codes in
which the memories of the inputs are different. We define the state at time t
to be St = (Xt−1 , Xt−2 , . . . , Xt−ν ) where Xt is the k-tuple of inputs at time
t. After time ν, each state in the trellis has M predecessors and M successors.


The M successors of any given state all have the same M predecessors.
[Trellis with two states S0 and S1; branches are labelled with the
output pairs 00, 01, 10, 11, over time steps t = 0, 1, 2, 3, . . .]

Figure 2: Example of a trellis for a convolutional code.

Next, we note that a block code can be thought of as a trivial convolutional
code with no memory. Also, we can form a block code from a convolutional code by
terminating it at some time τ . In such a terminated convolutional code, there
are a total of K = kτ input symbols and N = nτ output symbols. Sometimes,
in terminated convolutional codes, it is necessary to end all input sequences with
a particular length ν sequence to ’clear’ the states of the encoder. In this case,
the terminated convolutional code has M0 = M τ −ν total codewords. Below we
define the rates of a convolutional code and a terminated convolutional code as
r and R respectively. Finally, we note that we can still use a trellis to describe
a terminated convolutional code.

r = (ln M)/n,   θ ≜ ν/τ,   R = (1/N) ln M0 = r(1 − θ).

3 ML Decoding and the Viterbi Algorithm


We will only discuss decoding for the discrete memoryless channel (DMC) model.
Figure 3 shows the channel, with input yi and output zi . We assume the channel
input symbol, yi , is from the set {1, 2, . . . , L} and the channel output symbol, zi ,
is from the set {1, 2, . . . , J}. The channel is described by a probability transition


matrix pjl = P r(zi = j|yi = l). Inherent in this notation is the assumption that
the channel is time-invariant.

[yi → Discrete Memoryless Channel → zi]

Figure 3: The discrete memoryless channel model.

In a terminated convolutional code, a codeword y consists of N = nτ channel


inputs, and the received word z also consists of N = nτ channel outputs. A
maximum-likelihood decoder, by definition, chooses the ŷ ∈ C that makes the
received word z most likely to occur, where C is the set of codewords. That is,
if L(y) = P r(z|y), an ML decoder chooses the decoded word ŷ as follows.

Γ(y) ≜ − ln Pr(z|y)
ŷ = arg max_{y∈C} L(y) = arg min_{y∈C} Γ(y)

Sometimes, a tie will occur in choosing ŷ, in which case we can pick one of
the best choices randomly or deterministically. Also, we note that Γ is actually
the negative log-likelihood, but we will refer to it freely as the log-likelihood.
The Viterbi Algorithm allows us to find this optimal codeword efficiently.
To do this we first note that,

Γ(y) = − ln Pr(z|y)
     = − Σ_{i=1}^{N} ln Pr(zi|yi)
     = − Σ_{t=1}^{τ} Σ_{i=n(t−1)+1}^{nt} ln Pr(zi|yi)

The total log-likelihood for a potential codeword is the sum of the log-
likelihoods of each branch of the codeword. So in the code trellis, we can label
each branch with the log-likelihood of channel outputs on that branch. Now, as
we go through the trellis for each codeword, we add up these ’lengths’ along the
way until we go from the start to the end. The problem of finding the optimal
codeword for ML decoding now becomes the problem of finding the shortest
path through the trellis. The Viterbi Algorithm provides a way to do this and
it is summarized below.

1. Assign length 0 to the initial node and set the ’time’ t = 0.


2. For each node at time t + 1 find, for each of its predecessors, the sum
of the predecessor’s length with the length of the branch connecting the
predecessor at time t with the node at time t + 1. Assign the minimum
of these sums as the ’length’ of the node, and label the node with the
shortest path to it from the root. If there is a tie, choose one of the paths
randomly.
3. If t = τ , stop. The decoded codeword is the shortest path from the root
to a node at time τ . If t is not τ , go to step 2.
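
A compact sketch of the algorithm on the two-state code of Figures 1
and 2, with branch lengths taken as negative log-likelihoods of a binary
symmetric channel (the crossover probability p is an illustrative
assumption):

    import math

    def viterbi(received, trans, p=0.1):
        # trans[s] = [(next_state, output_bits) for inputs 0, 1]
        def branch_len(y, z):    # -ln Pr(z|y) on a BSC
            return sum(-math.log(1 - p) if a == b else -math.log(p)
                       for a, b in zip(y, z))
        lengths, paths = {0: 0.0}, {0: []}   # start in the zero state
        for z in received:                   # one n-bit block per time step
            new_len, new_path = {}, {}
            for s, l in lengths.items():
                for u, (s2, y) in enumerate(trans[s]):
                    cand = l + branch_len(y, z)
                    if s2 not in new_len or cand < new_len[s2]:
                        new_len[s2], new_path[s2] = cand, paths[s] + [u]
            lengths, paths = new_len, new_path
        return paths[min(lengths, key=lengths.get)]   # shortest survivor

    # G(D) = [1, 1 + D]; the state is the previous input bit
    trans = {0: [(0, (0, 0)), (1, (1, 1))],
             1: [(0, (0, 1)), (1, (1, 0))]}
    print(viterbi([(1, 1), (1, 0), (0, 1)], trans))   # [1, 1, 0]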

A nice feature of the Viterbi Algorithm is that its complexity is proportional


to M^ν, which is the number of states in the encoder. Hence, we can increase the
length of the terminated code and expect just a linear increase in complexity.
This is while increasing the number of codewords exponentially.
Now that we have described the Viterbi Algorithm, we can set up the analysis
of probability of error for ML decoding of convolutional codes.

4 Error Events and Random Trellis Codes


An error occurs in the Viterbi algorithm when an incorrect path has a shorter
length than the correct path. This happens either by poor design of the code,
or because the channel was very noisy and transitioned symbols in an unlikely
manner. An error event is defined to be any finite period during which the ML
path and the correct path are unmerged. The error event starts at time t if the
last common node of the correct path and the ML path was at time t in the
trellis.
As stated in the introduction, we are not particularly interested in the prob-
ability that an error event occurs, because as the code length increases, this
quantity will go to 1. Rather, we are interested in the frequency with which
errors occur. That is, we are actually interested in the probability that an error
event starts at any particular time. This is different from the probability of bit
error, for example, because even though a branch on the ML path may not lie
on the correct path, some of the symbols along the two branches may be the
same.
A possible error event is any path through the code trellis that begins and
ends on the correct path and touches it nowhere in between. A possible error
event, therefore, is just a portion of the code trellis. An actual error event
is always a possible error event, but depending upon the realization of the
channel outputs, most possible error events will not be actual error events. The
minimum length of any possible error event is ν + 1 because the memory of the
encoder is ν time steps. Define the time-t incorrect subset St as consisting of
all possible error events that start at time t. Define Et to be the event that an
actual error event starts at time t.
Next, we define the kind of convolutional codes about which we will state
and prove a few results. An (M, ν, n) random trellis code is a code described


by a trellis corresponding to a shift register of length ν, with M-state storage
elements, when any M-ary sequence is the input and every channel input
(codeword symbol) on every branch is chosen independently and identically at
random according to some probability distribution p = {pk , 1 ≤ k ≤ L}. The
rate of such a code is r = (1/n) ln M nats/symbol.
A random (M, ν, n, τ) terminated trellis code is a random (M, ν, n) trellis
code in which the last nν inputs to the encoder are fixed. The rate for such a
code, as discussed before, is R = r(1 − ν/τ).
These codes are in general, non-linear and time-varying, but just like the
channel coding theorem using random block codes, the symmetry given by the
random choice of channel inputs will lead to a tractable and usable bound on
the probability of error per unit time. We are now in a position to describe
some main results and prove a few of them.

5 Error Probability Bounds for ML Decoding


The first theorem is a bound on P r(Et ) derived by Yudkin and Viterbi.

Theorem 1 Over the ensemble of random (M, ν, n) trellis codes using ML de-
coding,

Pr(Et) ≤ K1 exp[−nνE0(ρ)]


where K1 is a constant independent of ν, E0(ρ) is Gallager's function [6],

E0(ρ) = − ln Σ_j [ Σ_l pl · pjl^{1/(1+ρ)} ]^{1+ρ},

and 0 ≤ ρ ≤ 1 and ρ < ρr, where ρr is related to r by r = E0(ρr)/ρr.

Since this bound is an average over all codes in the ensemble, it is true
for at least one code in the ensemble. Assuming for the moment that the
theorem is true, we can quickly prove a corollary about block codes derived
from convolutional codes.

Corollary 1 Let C be the capacity of the channel. Then, for 0 < θ ≤ 1,
0 ≤ r ≤ C and ε > 0, there exists a block code of (large enough) length N and
rate R = r(1 − θ) such that the probability of block error Pr(E) satisfies

Pr(E) ≤ K1 N exp[−N θ(e(r) − ε)]

Proof: Select parameters n and τ so that τ = N/n. Then, for ν = θτ, select
a (M, ν, n) trellis code from the random ensemble which satisfies the bound in
Theorem 1. We can select these parameters and avoid any complications with
integer effects so that, after taking a union bound of the probability of error per
unit time over the (τ − ν) information times,


Pr(E) ≤ (τ − ν) K1 exp[−nν(e(r) − ε)]
      ≤ N K1 exp[−nν(e(r) − ε)]
      = N K1 exp[−N θ(e(r) − ε)].  □

Now, we give an outline of a proof of Theorem 1. The proof involves several


steps starting with a Chernoff bound on the probability of error, followed by
’configuration-counting’ and then the use of a bounding procedure found in
Gallager [6].
Proof of Theorem 1: We assume a random ensemble of trellis codes as stated
in the theorem. Recall the definition of Et as the event that an actual error
event starts at time t. Let y denote the correct path through the code trellis.
Then, Et happens only if there is a path y′ in St that re-merges with y at some
time t + κ, κ > ν, AND

− ln[Pr(z|y′)]_t^{t+κ} ≤ − ln[Pr(z|y)]_t^{t+κ},

where the indices on the brackets indicate restriction of the paths to within
those times. Now define

Γ(y′) ≜ ln[ Pr(z|y′)/Pr(z|y) ]_t^{t+κ}.

Restating in terms of Γ, Et occurs only if there is some y′ in St for which
Γ(y′) ≥ 0. Now consider the following quantity,

Tt(α, ρ) ≜ [ Σ_{y′∈St} exp[αΓ(y′)] ]^ρ,   0 ≤ α, 0 ≤ ρ ≤ 1.

It can be seen that if Γ(y′) ≥ 0 for at least one y′ in St, then Tt(α, ρ) ≥ 1.
Hence, we can use Tt to bound Pr(Et). That is,

Pr(Et) ≤ E[ Tt(α, ρ) ],

where the expectation is taken over all randomness in the codes and the channel.
Next, we need to bound the number of paths that can diverge at time t and
re-merge at time t + κ. Hence we are interested in the number of possible error
events in the configuration, as Forney calls it, of paths that diverge at time t
and re-merge at time t + κ. Define Ctκ to be the set of all possible error events
starting at time t and ending at time t + κ. Figure 4 shows a typical element
of Ctκ along with the correct path. Since the last ν time steps’ inputs must be
the same as those of the correct path to re-merge with it, a bound on the size


of the configuration is |Ctκ| ≤ M^{κ−ν} = e^{nr(κ−ν)}. Now we can partition the y′
in St into the y′ in each Ctκ, for κ > ν.
Applying Jensen's inequality ((Σ_i ai)^ρ ≤ Σ_i (ai)^ρ, for ρ ≤ 1) to Tt, we get

Tt(α, ρ) = [ Σ_{κ=ν+1}^{∞} Σ_{y′∈Ctκ} exp[αΓ(y′)] ]^ρ
         ≤ Σ_{κ=ν+1}^{∞} [ Σ_{y′∈Ctκ} exp[αΓ(y′)] ]^ρ.

Defining

Ttκ(α, ρ) ≜ [ Σ_{y′∈Ctκ} exp[αΓ(y′)] ]^ρ,

this gives

Pr(Et) ≤ Σ_{κ=ν+1}^{∞} E[ Ttκ(α, ρ) ].

Then, by some algebraic manipulations similar to the channel coding theorem
proof of [6] and setting α = 1/(1 + ρ), we get the following lemma.

Lemma 1

E[ Ttκ(α, ρ) ] ≤ |Ctκ|^ρ exp[−nκE0(ρ)],   0 ≤ ρ ≤ 1.

[Figure 4: the correct path and the configuration of a possible error
event of length κ that diverges from it at time t and re-merges at
time t + κ.]


To finish the proof we have

Pr(Et) ≤ Σ_{κ=ν+1}^{∞} E[ Ttκ(α, ρ) ]
       ≤ Σ_{κ=ν+1}^{∞} |Ctκ|^ρ exp[−nκE0(ρ)]
       ≤ Σ_{κ=ν+1}^{∞} M^{(κ−ν)ρ} exp[−nκE0(ρ)]
       ≤ Σ_{κ=ν+1}^{∞} e^{nr(κ−ν)ρ} exp[−nκE0(ρ)]
       ≤ K1 exp[−nνE0(ρ)]   for r < E0(ρ)/ρ,

where

K1 = exp[−n(E0(ρ) − ρr)] / (1 − exp[−n(E0(ρ) − ρr)]).

Hence, Theorem 1 says that the probability of error per unit time is exponen-
tially decreasing in constraint length. This shows explicitly that the increased
complexity in the structure of the code helps in decoding of the code.
A few more bounds on error probability for ML decoding are stated before
we move on to sequential decoding algorithms.

Theorem 2 For any probability vector on channel inputs p and any ε > 0,
there exists a block code of length N and rate R with block error probability,
Pr(E), so that

Pr(E) ≤ K1 N exp[−N(E(R) − ε)],

where E(R) = max_{0≤ρ≤1} [E0(ρ) − ρR].

The random block coding error exponent, E(R), is known to be tight for
rates close to channel capacity. The proof of Theorem 2 also uses terminated
convolutional codes. Interestingly, Theorem 2 says that one can get an optimal
block code from an appropriately terminated convolutional code.
The last sample result concerns bounded delay decoding of convolutional
codes.

Theorem 3 If code symbols at time t must be decided upon by time t + κ, then


the additional probability of error caused by deciding early is

Pr(Ea) ≤ exp[−nκE(r)]

where E(r) is the block code exponent evaluated at the rate of the convolutional
code.


This result comes essentially from the proof of the channel coding theorem
with block codes. An incorrect symbol decision at time t would lead the Viterbi
decoder on a path which is statistically independent of the correct path, and so
the analysis of Gallager[6] can be used, with the block length equal to nκ.

6 Sequential Decoding and the Stack Algorithm


In the Viterbi Algorithm, all paths through the code trellis are examined to make
an ML decision on the sent codeword. For larger constraint lengths, which we
saw led to lower error probability, this can be inefficient because the complexity
of the Viterbi decoder is exponential in ν. Sequential decoding algorithms were
developed heuristically before the Viterbi Algorithm was recognized to give the
ML decision. The idea is to search paths on a branch by branch basis, following
the best path first, in order to avoid searching all paths. The ’metric’ or ’length’
on the branches can be the log-likelihoods of the branch along with some bias.
The bias essentially adjusts how aggressive the sequential decoder is in searching
deep into the code tree. The choice of metric affects the computation time of
the algorithm as well as its probability of error. The algorithm is done when
the next path to be extended is at the terminating level. First, let us state the
definition of sequential decoder given by Jacobs and Berlekamp [7].
A decoder is sequential only if it computes a subset of path metrics or lengths
in a sequential fashion, with each new path examined being an extension of a
previously examined path and the decision of which path to extend based only
on previously examined paths. In other words, a decoder is sequential if it
searches through the code tree going to nodes that are direct descendants of
previously examined nodes. The decision of which node to visit must be made
on the information collected about already visited nodes. It is possible, however,
for a sequential decoder to visit a node more than once.
There are many sequential algorithms based on various heuristics; two main
ones being the Fano Algorithm and the Stack Algorithm. The Stack Algorithm
is simpler to describe, but because it requires sorting of a list of numbers on
each iteration, the Fano Algorithm is preferred in many situations. The Stack
Algorithm was discovered independently by Zigangirov in 1966 and Jelinek in
1969. At a given time, assume there are N partial paths in the stack and y0 is
the one with the best metric. The Stack Algorithm is summarized below.

1. N = 1, y0 is the origin node.


2. Compute the metrics of the M successors of y0 as the sum of the branch
metric and the metric for y0 . Place each of the M partial paths onto the
stack, removing y0 . Update N .
3. Sort the stack based on the partial path metrics. The best partial path
is once again y0 . If y0 is at the terminating depth, then the algorithm is
done and y0 is the decoded codeword. Otherwise, return to 2 and iterate
again.
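
A sketch of the algorithm using a heap for the sorting step (Python;
the per-bit bias and the BSC branch metric are illustrative choices,
not fixed by the text):

    import heapq, math

    def stack_decode(received, trans, depth, p=0.1, bias=0.5):
        # trans[s] = [(next_state, output_bits) for inputs 0, 1]
        def branch_metric(y, z):   # log-likelihood plus a per-bit bias
            return sum(math.log((1 - p) if a == b else p) + bias
                       for a, b in zip(y, z))
        stack = [(0.0, [], 0)]     # (-metric, partial path, state)
        while True:
            neg_m, path, s = heapq.heappop(stack)   # best partial path
            if len(path) == depth:                  # terminating level
                return path
            z = received[len(path)]
            for u, (s2, y) in enumerate(trans[s]):
                m = -neg_m + branch_metric(y, z)
                heapq.heappush(stack, (-m, path + [u], s2))

    # the same two-state code as before: G(D) = [1, 1 + D]
    trans = {0: [(0, (0, 0)), (1, (1, 1))],
             1: [(0, (0, 1)), (1, (1, 0))]}
    print(stack_decode([(1, 1), (1, 0), (0, 1)], trans, depth=3))  # [1, 1, 0]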


One variation on the Stack Algorithm is called the Stack-Bucket Algorithm
(SBA). In the SBA, rather than one stack, there are buckets that hold all the
partial paths with metrics confined to a certain range. Instead of sorting the
paths in a bucket, each bucket is used as a true 'push-pop' stack with no sorting
at each step. If the granularity of the buckets is fine enough, then there is
very little difference between the performance of the Stack Algorithm and that
of the SBA.
Another variation, which Forney [3] calls Algorithm A, takes merging into
account by combining paths corresponding to the same state in the code trellis.
Before stating results about sequential decoding, we note that the amount
of computational effort expended by a sequential decoder is a random variable
depending on the codeword sent and the channel transitions. Much of the
research into sequential decoders has been into bounding the computational
effort. It turns out that the biasing term used in the metrics for each branch
will affect this computational effort and the probability of error. There is a
tradeoff between achieving a low probability of error and low computational
effort. The effect of the biasing term on these two performance parameters was
investigated by Jelinek[8].

7 Bounds for Sequential Decoding


First, a bound on probability of error is stated and then a lower bound on the
computational distribution.
Theorem 4 Let ρ_r be related to r by r = E_0(ρ_r)/ρ_r. Then, on any DMC, for
0 ≤ ρ < min(ρ_r, 1) and using Algorithm A with the appropriately chosen bias,
the probability of error per unit time satisfies

$$\Pr(E_t) \le K \exp[-n\nu\rho r],$$

where K is a constant independent of ν.
The proof of this result is similar to the proof of Theorem 1, with an adjustment
of the quantity T_t being bounded. By the properties of Gallager's function E_0,
if 0 ≤ ρ_r ≤ 1, then E_0(1) = R_comp ≤ r ≤ C, and as ρ goes to ρ_r we obtain for
sequential decoding asymptotically the same result as in Theorem 1. That is, in
the high-rate region, sequential decoding is asymptotically as good as ML decoding.
The last result stated is one about the random variable of computation.
Theorem 5 Let C be the number of computations performed by any sequential
decoder. Then P[C > L] ≳ L^{-ρ}, where ρ is a parameter that depends on both
the channel and the rate of the code. That is, the distribution of the random
variable of computation is lower bounded by a Pareto distribution.
The impact of this theorem, proved by Jacobs and Berlekamp[7], is that there
always exists a moment of computation that is infinite. Hence, there is some
unavoidable probability that the sequential decoder will perform extremely long
searches and run into a buffer overflow in a real system.


8 Conclusion
In this survey paper, we first described a convolutional code as a directed graph
called a trellis, or as a tree code. Then we looked at the problem of ML decoding
over DMCs and the Viterbi Algorithm as a solution to this problem. We stated
some results about the probability of error per unit time for random trellis codes
and gave one detailed proof. We stated the result that one can get an optimum
block code from a properly terminated convolutional code. Then we explained
sequential decoding, the Stack Algorithm, and a few variants. Finally, we stated
a few results about sequential decoders, namely that for certain rates they can be
as 'good' as ML decoders, and that they are always 'bad' in the sense that there
exist moments of the random variable of computation that are infinite.
I would like to thank Professor Anantharam for giving me and the rest of
the class the opportunity in the project to look into an area of coding theory in
depth when we didn’t have time to cover it in class.


References
[1] A.J. Viterbi, "Error Bounds for Convolutional Codes and an Asymptotically
Optimum Decoding Algorithm," IEEE Trans. Information Theory, IT-13:260-69,
April 1967.
[2] G.D. Forney Jr., "Convolutional Codes II: Maximum Likelihood Decoding,"
Information and Control, 25:222-66, July 1974.
[3] G.D. Forney Jr., "Convolutional Codes III: Sequential Decoding," Information
and Control, 25:267-97, July 1974.
[4] R. Johannesson and K. Zigangirov, Fundamentals of Convolutional Coding,
Wiley-IEEE, New York, 1999.
[5] S. Lin and D.J. Costello Jr., Error Control Coding, Second Edition, Prentice
Hall, 2004.
[6] R.G. Gallager, Information Theory and Reliable Communication, John Wiley,
New York, 1968.
[7] I.M. Jacobs and E.R. Berlekamp, "A Lower Bound to the Distribution of
Computation for Sequential Decoding," IEEE Trans. Information Theory,
IT-13:167-74, April 1967.
[8] F. Jelinek, "Upper Bounds on Sequential Decoding Performance Parameters,"
IEEE Trans. Information Theory, IT-20:227-39, March 1974.


EE 229B ERROR CONTROL CODING Spring 2005

Lecture notes on the structure of convolutional codes


Venkat Anantharam
(based on scribe notes by Lawrence Ip and Xiaoyi Tang)

Warning: Use at your own risk! These notes have not been sufficiently carefully screened.

1 Convolutional Codes
1.1 Introduction
Suggested reference for convolutional codes : “Fundamentals of Convolutional Coding” by Rolf
Johannesson and Kamil Zigangirov, IEEE Press, 1999.
An encoder for a binary block code takes a block of information bits and converts it into a block
of transmitted bits (a codeword). A binary convolutional encoder takes a stream of information
bits and converts it into a stream of transmitted bits, using a shift register bank. Redundancy for
recovery from channel errors is provided by transmitting more bits per unit time than the number
of information bits per unit time. Maximum likelihood decoding can be done using the Viterbi
algorithm; other decoding algorithms such as SOVA (soft output Viterbi algorithm) and the BCJR
algorithm are also commonly used. In practice the information stream is of finite duration and one
typically appends a few termination bits to the input stream to bring the shift register bank back
to the all zeros state, so that the convolutional code is in effect used as a very long block code.
Often convolutional codes are used as inner codes with burst error correcting block codes as
outer codes to form concatenated codes. Errors in Viterbi-like decoding algorithms for convolutional
codes tend to occur in bursts because they result from taking a wrong path in a trellis. The burst
error correcting capability of the outer code is used to recover from such burst error patterns in
the decoding of the inner code. See Section 15.6 on pp. 760 -761 of the text of Lin and Costello
(2nd edition) for a short discussion of this idea.

1.2 Formal definitions


Convolutional codes are linear codes over the field of one-sided infinite sequences. The symbols can
be from any field but we will just consider symbols from GF (2). For notational simplicity, and to
give an indication of how the more general case works, in the following we will write F for the field
GF (2). We begin with a few definitions.

Definition 1 Let

$$F((D)) = \left\{ \sum_{i=r}^{\infty} x_i D^i : x_i \in F,\; r \in \mathbb{Z} \right\}$$

be the set of binary Laurent series.

The D is a formal placeholder representing one step of delay. With addition and multiplication
defined in the obvious way F((D)) becomes a field. Because Laurent series are finite to the left,
multiplication is well defined (we never have to sum an infinite number of terms). All the


axioms of a field are easy to check except the existence of a multiplicative inverse. A multiplicative
inverse can be constructed explicitly by “long division”.
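For example (a quick check of my own, not in the original notes), long division gives the inverse of 1 + D:

$$\frac{1}{1+D} = 1 + D + D^2 + D^3 + \cdots \in F((D)),$$

since over GF(2), (1+D)(1 + D + D^2 + \cdots) = 1 + (D+D) + (D^2+D^2) + \cdots = 1.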

Definition 2 Let

$$F[[D]] = \left\{ \sum_{i=r}^{\infty} x_i D^i : x_i \in F,\; r \ge 0 \right\}$$

be the set of binary formal power series.

With addition and multiplication defined in the obvious way F[[D]] is a ring (in a ring, as
opposed to a field, we do not require the existence of multiplicative inverses for nonzero
elements).

Definition 3 Let

$$F[D] = \left\{ \sum_{i=r}^{d} x_i D^i : x_i \in F,\; 0 \le r \le d < \infty \right\}$$

be the set of binary formal polynomials.

With addition and multiplication defined in the obvious way F [D] is a ring.

Definition 4 Let

$$F(D) = \left\{ \frac{p(D)}{q(D)} : p(D), q(D) \in F[D],\; q(D) \neq 0 \right\}$$

be the set of binary rational functions.

With addition and multiplication defined in the obvious way F (D) is a field.
Note that we have
F [D] ⊂ F (D) ⊂ F ((D))
and
F [D] ⊂ F [[D]] ⊂ F ((D)).

Definition 5 Let

$$F^k((D)) = \{ (x_1(D), x_2(D), \ldots, x_k(D)) : x_j(D) \in F((D)),\; 1 \le j \le k \}.$$

Definition 6 A rate R = k/n (k ≤ n) convolutional mapping is a linear transformation

$$\tau : F^k((D)) \to F^n((D))$$

defined by

$$\tau(x(D)) = x(D)G(D)$$

where G(D) ∈ F(D)^{k×n} (k × n matrices of rational functions) and G(D) has rank k. G(D) is
called a transfer function.

Definition 7 A rate R = k/n convolutional code is the image of a rate R convolutional mapping.


Given a k × n transfer function G(D) and x(D) ∈ F^k((D)), y(D) = x(D)G(D) ∈ F^n((D))
is the code sequence corresponding to the input sequence x(D).

Definition 8 Given x(D) ∈ F((D)), its delay del(x(D)) is the smallest r for which x_r = 1.

Definition 9 x(D) is called delay free if del(x(D)) = 0.

Definition 10 A rational function p(D)/q(D) is called realizable if q(D) is delay free.

Definition 11 G(D) ∈ F (D)k×n is called realizable if each of its elements is realizable. Such a
G(D) is called a generator matrix.

Definition 12 A generator matrix (a transfer function matrix that is realizable) is called delay
free if at least one of its elements is delay free.

Theorem 1 Any convolutional code C ⊆ F n ((D)) is the image of a convolutional mapping with a
transfer function matrix that is a delay free generator matrix.

Proof: Suppose C is the image of F^k((D)) under the mapping corresponding to G(D) ∈ F(D)^{k×n}.
Write

$$G(D) = [g_{ij}(D)]_{k \times n}$$

where g_{ij}(D) = p_{ij}(D)/q_{ij}(D). Now write

$$g_{ij}(D) = D^{s_{ij}}\, \tilde{p}_{ij}(D)/\tilde{q}_{ij}(D)$$

where \tilde{p}_{ij}(0) = 1 and \tilde{q}_{ij}(0) = 1. Define s = min_{i,j} s_{ij} and

$$\hat{G}(D) = D^{-s} G(D).$$

Then \hat{G}(D) is a delay free generator matrix and the image of F^k((D)) under the
transformation corresponding to \hat{G}(D) is C. □
In general a transfer function matrix may not be realizable, so a shift register bank that
implements it would need to be noncausal. The preceding result shows that any convolutional
code has a generator matrix (a realizable transfer function matrix), which can be built with
a causal shift register bank. Further, we can ensure that there is no "unnecessary delay" (this
is the content of the delay free condition). The following result shows that we can actually
use any convolutional code with feedforward shift register banks (i.e., no need for feedback,
or, in other words, with a polynomial transfer function matrix).

Theorem 2 Any rate R = k/n convolutional code C is the image of F^k((D)) under a transfer
function matrix that is

• a generator

• delay free

• polynomial


Proof: We already know from the previous theorem that C is the image of F k ((D)) under a
G(D) ∈ F (D)k×n that is realizable and delay free. Write

G(D) = [gij (D)]k×n

where g_{ij}(D) = p_{ij}(D)/q_{ij}(D), q_{ij}(0) = 1, 1 ≤ i ≤ k, 1 ≤ j ≤ n, and there is at
least one (i, j) with p_{ij}(0) = 1. Let q(D) = lcm({q_{ij}(D) : 1 ≤ i ≤ k, 1 ≤ j ≤ n}) and let

$$\hat{G}(D) = q(D)\,G(D).$$

Then C is the image of F^k((D)) under \hat{G}(D). \hat{G}(D) is a polynomial matrix and thus
realizable. Since all the q_{ij}(0) = 1, we have q(0) = 1, so the same (i, j) for which
p_{ij}(0) = 1 gives q(0)p_{ij}(0)/q_{ij}(0) = 1, making \hat{G}(D) delay free. □

Definition 13 A rate R = k/n convolutional mapping is said to be systematic if some k of the code
sequences are exactly the k input sequences. Equivalently, after reordering the output coordinates
the corresponding transfer function matrix G(D) has the form

$$G(D) = [\, I_{k \times k} \;\; R(D) \,],$$

where R(D) ∈ F(D)^{k×(n−k)}.

Every convolutional code has both systematic and non-systematic convolutional mappings that
result in the same code.
In block codes errors cannot propagate very far because of the finite block size. This is not
necessarily the case for convolutional codes, as it is possible for a "finite" error in the code
sequence to correspond to an "infinite" error in the corresponding input sequence.

Definition 14 A convolutional mapping is said to be catastrophic if there is some code sequence
y(D) with finitely many 1s that results from an input sequence x(D) with infinitely many 1s.

Every code has catastrophic and non-catastrophic mappings that result in that code.
Several examples relating to the definitions were discussed in class. See also the text of Lin and
Costello and the book of Johannesson and Zigangirov.
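One standard example (my illustration, in the spirit of the examples discussed in class): the
rate 1/2 mapping with G(D) = [1+D  1+D^2] is catastrophic. Taking the input
x(D) = 1/(1+D) = 1 + D + D^2 + \cdots, which has infinitely many 1s, gives

$$x(D)G(D) = \left[\, 1 \;\; \frac{(1+D)^2}{1+D} \,\right] = [\, 1 \;\; 1+D \,],$$

a code sequence with only three 1s (recall 1+D^2 = (1+D)^2 over GF(2)).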

2 Smith form of a polynomial matrix

G(D), a k × n polynomial matrix, can be written as

$$G(D) = A(D)\,\Gamma(D)\,B(D)$$

where A(D) is a unimodular k × k polynomial matrix (a unimodular matrix is a polynomial
matrix with a polynomial inverse), B(D) is a unimodular n × n polynomial matrix, and

$$\Gamma(D) = \begin{bmatrix} \gamma_1(D) & & & \\ & \ddots & & 0 \\ & & \gamma_r(D) & \\ & 0 & & 0 \end{bmatrix}, \qquad r = \operatorname{rank} G(D),$$

with γ_i(D) | γ_{i+1}(D), 1 ≤ i ≤ r − 1. In fact

$$\gamma_i(D) = \frac{\Delta_i(D)}{\Delta_{i-1}(D)}$$

where Δ_0(D) = 1 and Δ_i(D) = gcd of the i × i minors of G(D).


This was proved in class. See also the book of Johannesson and Zigangirov.
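As a small worked illustration (mine, not from class): take G(D) = [1+D  D]. The 1 × 1 minors
are 1+D and D, so Δ_1(D) = gcd(1+D, D) = 1 and γ_1(D) = Δ_1(D)/Δ_0(D) = 1. Indeed

$$[\, 1+D \;\; D \,] = [1]\,[\, 1 \;\; 0 \,] \begin{bmatrix} 1+D & D \\ 1 & 1 \end{bmatrix},$$

and the rightmost matrix is unimodular since its determinant is (1+D) + D = 1 over GF(2).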

3 Smith form of a rational matrix G(D) ∈ F(D)^{k×n}

Suppose q(D) is the lcm of the denominator polynomials in the entries of G(D). Then q(D)G(D)
is a polynomial matrix. So

$$q(D)G(D) = A(D)\,\hat{\Gamma}(D)\,B(D) \qquad \text{(Smith form)}.$$

Therefore,

$$G(D) = A(D)\,\Gamma(D)\,B(D)$$

where A(D) and B(D) are unimodular of size k × k and n × n, respectively, and

$$\Gamma(D) = \begin{bmatrix} \frac{\alpha_1(D)}{\beta_1(D)} & & & \\ & \ddots & & 0 \\ & & \frac{\alpha_r(D)}{\beta_r(D)} & \\ & 0 & & 0 \end{bmatrix}$$

where α_i(D) | α_{i+1}(D), 1 ≤ i ≤ r − 1, and β_{i+1}(D) | β_i(D), 1 ≤ i ≤ r − 1.

4 Massey-Sain characterization of non-catastrophic generator matrices

Theorem 1 A rational generator matrix G(D) for a convolutional code C (recall G(D) is k × n
with rank k, k ≤ n) is non-catastrophic iff α_k(D) = D^s for some s ≥ 0.

Proof: Let G(D) = A(D)Γ(D)B(D) (Smith form) with

$$\Gamma(D) = \left[\operatorname{diag}\left(\frac{\alpha_i(D)}{\beta_i(D)},\; 1 \le i \le k\right) \;\; 0\right].$$

Assume α_k(D) = D^s. A right inverse for G(D) is given by

$$G(D)^{-1} = B^{-1}(D) \begin{bmatrix} \operatorname{diag}\left(\frac{\beta_i(D)}{\alpha_i(D)},\; 1 \le i \le k\right) \\ 0 \end{bmatrix} A^{-1}(D).$$

Since α_1(D) | α_2(D) | … | α_k(D) = D^s, each α_i(D) = D^{s_i} for some
s_1 ≤ s_2 ≤ … ≤ s_k = s. So

$$D^s G(D)^{-1} = B^{-1}(D) \begin{bmatrix} \operatorname{diag}\left(D^{s-s_i}\beta_i(D),\; 1 \le i \le k\right) \\ 0 \end{bmatrix} A^{-1}(D).$$


This is a polynomial matrix.

So if y(D) = x(D)G(D), then y(D)D^s G(D)^{-1} = D^s x(D). Because D^s G(D)^{-1} is a polynomial
matrix, we see that if y(D) is polynomial, then D^s x(D) is polynomial. Therefore, x(D) corresponds
to a sequence that has only finitely many 1's whenever y(D) corresponds to a sequence that has
only finitely many 1's. This means that G(D) is non-catastrophic.
Conversely, if α_k(D) has a factor which is not just D, then β_k(D)/α_k(D) has infinite weight.
Now

$$G(D) = A(D)\left[\operatorname{diag}\left(\frac{\alpha_i(D)}{\beta_i(D)},\; 1 \le i \le k\right) \;\; 0\right] B(D),$$

so taking

$$x(D) = \left[\, 0 \;\ldots\; 0 \;\; \frac{\beta_k(D)}{\alpha_k(D)} \,\right] A^{-1}(D),$$

first note that it corresponds to a sequence with infinitely many 1's. Also

$$x(D)G(D) = [\, 0 \;\ldots\; 1 \;\; \underbrace{0 \;\ldots\; 0}_{n-k} \,]\, B(D)$$

is a polynomial, so it corresponds to a sequence with finitely many 1's. Thus we have found an
input sequence with infinitely many 1's that is encoded as an output sequence with finitely many
1's. This means that G(D) is catastrophic. □
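To tie this back to the earlier example (my remark): for the polynomial matrix
G(D) = [1+D  1+D^2] we have k = 1 and α_1(D) = gcd(1+D, 1+D^2) = 1+D, which is not of the
form D^s, so the theorem declares G(D) catastrophic, matching the explicit infinite-weight input
x(D) = 1/(1+D) exhibited after Definition 14.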

Definition 15 A generator matrix G(D) is called basic if it is polynomial and has a polynomial
right inverse.

Note: Any basic generator matrix is a (basic) encoding matrix. (A generator matrix G(D) is
called an encoding matrix if G(0) is invertible.)

Theorem 2 Every convolutional code C can be described by a basic generator matrix.

One consequence is that decoding can be conceptually viewed as in the figure, when a basic generator
matrix is used for encoding. Namely, one can view the decoding problem in two stages as that of
first finding the most likely output sequence that explains the (noisy) observations and then using
the (feedforward) inverse of the basic encoding matrix to recover the corresponding information
sequence.
Proof: We already know that C can be described by a polynomial delay free matrix; call it
G(D). Let G(D) = A(D)Γ(D)B(D) be the Smith form of G(D), where

$$\Gamma(D) = [\operatorname{diag}(\gamma_i(D),\; 1 \le i \le k) \;\; 0].$$

Note that

$$G(D) = A(D)\,\operatorname{diag}(\gamma_i(D),\; 1 \le i \le k)\,\hat{G}(D)$$

where \hat{G}(D) is the k × n matrix consisting of the first k rows of B(D). Also
A(D) diag(γ_i(D)) is invertible as a rational matrix, so \hat{G}(D) also describes C.
Further, \hat{G}(D) is a polynomial matrix

[Figure 1: X(D) enters the basic polynomial generator matrix G(D); the channel adds e(D) to
produce Y(D); the receiver first decodes for Y(D).]

and the first k columns of B −1 (D) give a right inverse for Ĝ(D), and this is a polynomial right
inverse because B −1 (D) is polynomial.

Theorem 3 A polynomial generator matrix G(D) (a k × n matrix of rank k, k ≤ n) has a
polynomial right inverse iff γ_k(D) = 1, where

$$G(D) = A(D)\,[\operatorname{diag}(\gamma_i(D),\; 1 \le i \le k) \;\; 0]\,B(D)$$

is the Smith form of G(D).

Proof: See the book of Johannesson and Zigangirov.

Corollary: For any convolutional code C, any basic (i.e. polynomial with polynomial right
inverse) generator matrix is non-catastrophic.
Proof: γ_k(D) = 1 = D^0; now use Massey-Sain, i.e. Theorem 1. □

5 Some deeper results w/o proof

Consider any polynomial generator matrix for C; call it G(D).
Def: Let

$$\gamma_i = \max_{1 \le j \le n} \deg g_{ij}(D).$$

Call it the i-th constraint length. m = max_{1 ≤ i ≤ k} γ_i is called the memory, and
γ = Σ_{i=1}^{k} γ_i is called the overall constraint length.
The terminology comes from associating to the polynomial generator matrix a feedforward shift
register implementation (controller form) that encodes information sequences into code sequences.
The i-th coordinate of the information sequence (viewed as a block of k bits at any symbol
interval) goes into a shift register with γ_i blocks; the longest such shift register has length
m, and the overall number of blocks in all k shift registers is γ.

Definition 16 A minimal basic encoding matrix is one whose overall constraint length is the small-
est among all basic encoding matrices for the same code.

Note: A convolutional code can have more than one basic encoding matrix describing it. In fact, if
a basic encoding matrix is multiplied on the left or the right by any unimodular matrix, this gives
another basic encoding matrix for the same code.
Now consider G(D), an arbitrary generator matrix for C:

$$G(D) = \left[ \frac{p_{ij}(D)}{q_{ij}(D)} \right].$$

Define q_i(D) = lcm(q_{ij}(D), 1 ≤ j ≤ n) and write

$$\frac{p_{ij}(D)}{q_{ij}(D)} = \frac{\tilde{p}_{ij}(D)}{q_i(D)}.$$

Define γ_i = max(deg q_i(D), max_{1 ≤ j ≤ n} deg \tilde{p}_{ij}(D)), m = max_{1 ≤ i ≤ k} γ_i,
and γ = Σ_{i=1}^{k} γ_i.
γ_i is called the i-th constraint length of the generator matrix, m is called the memory, and
γ is called the overall constraint length.
The terminology comes from associating to the generator matrix a feedback shift register
implementation (controller form) that encodes information sequences into code sequences. The
i-th coordinate of the information sequence (viewed as a block of k bits at any symbol interval)
goes into a shift register with γ_i blocks; the longest such shift register has length m, and
the overall number of blocks in all k shift registers is γ.

Definition 17 A minimal generator matrix for C is one whose overall constraint length γ is small-
est among all generator matrices describing C.

Theorem:

(I) A basic encoding matrix G(D) is minimal iff the 0-1 matrix formed by the highest-degree
terms in each row is full rank. For example,

$$G(D) = \begin{bmatrix} 1+D & D^2 & D^2+D & 1 \\ D & D^3 & D^2 & 1+D^3 \end{bmatrix} \Longrightarrow \begin{bmatrix} 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix},$$

and this is a full rank matrix, so G(D) is a minimal basic encoding matrix for the code that
it describes.

(II) Every minimal basic encoding matrix is a minimal generator matrix.

(III) Any two minimal generator matrices have exactly the same constraint lengths up to
reordering.

(IV) Any systematic generator matrix is a minimal generator matrix.

(V) Any minimal generator matrix is non-catastrophic.

For proofs of these results, see the book of Johannesson and Zigangirov.
Note: Every convolutional code has a systematic generator matrix (this just uses the assumption
that any generator matrix for the code has full rank), but there are convolutional codes that
do not have systematic polynomial generator matrices. For instance, the rate 1/2 code with
generator matrix

$$G(D) = [\, 1+D \;\; 1+D+D^2 \,].$$
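A sketch of why (my reasoning, following the definitions above): dividing by 1+D gives the
equivalent systematic rational generator

$$G'(D) = \left[\, 1 \;\; \frac{1+D+D^2}{1+D} \,\right],$$

which is realizable since its denominator is delay free; but a systematic polynomial generator
[1  p(D)] for this code would force (1+D) to divide 1+D+D^2, which fails because 1+D+D^2
evaluates to 1 at D = 1.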

6 Duality for Convolutional Codes

Let G(D) be a (rational) generator matrix for C with Smith form G(D) = A(D)Γ(D)B(D), where

$$B(D) = \begin{bmatrix} \hat{G}(D) \\ (H^T(D))^{-1} \end{bmatrix} \qquad \text{and} \qquad B^{-1}(D) = \left[\, \hat{G}(D)^{-1} \;\; H^T(D) \,\right],$$

where \hat{G}(D)^{-1} is a right inverse of \hat{G}(D) and (H^T(D))^{-1} is a right inverse for
H^T(D). The convolutional code generated by H(D) is called the dual code of C.
The connection with the block code notion of duality was described in class. See also the book
of Johannesson and Zigangirov.


Chapter 2. Convolutional Codes


This chapter describes the encoder and decoder structures for convolutional codes.
The encoder will be represented in many different but equivalent ways. Also, the main
decoding strategy for convolutional codes, based on the Viterbi Algorithm, will be
described. A firm understanding of convolutional codes is an important prerequisite to
the understanding of turbo codes.

2.1 Encoder Structure


A convolutional code introduces redundant bits into the data stream through the
use of linear shift registers as shown in Figure 2.1.

[Figure 2.1 here: two input streams x(1) and x(2) enter shift registers of three and two stages;
modulo-2 adders combine the register contents and inputs to form three output streams c(1),
c(2), c(3).]

Figure 2.1: Example convolutional encoder where x(i) is an input information bit
stream and c(i) is an output encoded bit stream [Wic95].

The information bits are input into shift registers and the output encoded bits are obtained
by modulo-2 addition of the input information bits and the contents of the shift registers.
The connections to the modulo-2 adders were developed heuristically with no algebraic or
combinatorial foundation.

The code rate r for a convolutional code is defined as

$$r = \frac{k}{n} \qquad (2.1)$$

where k is the number of parallel input information bits and n is the number of parallel
output encoded bits at one time interval. The constraint length K for a convolutional code
is defined as
K = m+1 (2.2)
where m is the maximum number of stages (memory size) in any shift register. The shift
registers store the state information of the convolutional encoder, and the constraint length
relates to the number of bits upon which the output depends. For the convolutional encoder
shown in Figure 2.1, the code rate r = 2/3, the maximum memory size m = 3, and the
constraint length K = 4.

A convolutional code can become very complicated with various code rates and
constraint lengths. As a result, a simple convolutional code will be used to describe the
code properties as shown in Figure 2.2.

[Figure 2.2 here: the input x(1) feeds a two-stage shift register; output c(1) taps all three
positions (g1 = 111) and output c(2) taps the first and last positions (g2 = 101).]

Figure 2.2: Convolutional encoder with k=1, n=2, r=1/2, m=2, and K=3.

2.2 Encoder Representations


The encoder can be represented in several different but equivalent ways. They are
1. Generator Representation
2. Tree Diagram Representation
3. State Diagram Representation
4. Trellis Diagram Representation


2.2.1 Generator Representation


Generator representation shows the hardware connection of the shift register taps
to the modulo-2 adders. A generator vector represents the position of the taps for an
output. A “1” represents a connection and a “0” represents no connection. For example,
the two generator vectors for the encoder in Figure 2.2 are g1 = [111] and g2 = [101]
where the subscripts 1 and 2 denote the corresponding output terminals.
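As a concrete companion to the generator vectors, here is a minimal Python sketch of a
feedforward encoder driven by tap vectors (my own illustration, not from [Wic95]; the function
name and interface are assumptions):

    def conv_encode(bits, gens=((1, 1, 1), (1, 0, 1)), m=2):
        # gens holds the tap vectors (g1 = [111], g2 = [101] for Figure 2.2);
        # a "1" marks a tap from [x(n), x(n-1), ..., x(n-m)] into that
        # output's modulo-2 adder.
        state = [0] * m                       # shift register, reset to zeros
        encoded = []
        for b in bits:
            window = [b] + state              # present bit plus register
            encoded.append(tuple(sum(g * w for g, w in zip(gen, window)) % 2
                                 for gen in gens))
            state = [b] + state[:-1]          # shift the register
        return encoded

    # x = {1011} gives c = {11, 10, 00, 01}, as in Section 2.2.2 below:
    print(conv_encode([1, 0, 1, 1]))          # [(1, 1), (1, 0), (0, 0), (0, 1)]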

2.2.2 Tree Diagram Representation


The tree diagram representation shows all possible information and encoded
sequences for the convolutional encoder. Figure 2.3 shows the tree diagram for the
encoder in Figure 2.2 for four input bit intervals.

[Figure 2.3 here: the code tree from t = 0 to t = 4; each branch carries its two output
encoded bits.]

Figure 2.3: Tree diagram representation of the encoder in Figure 2.2 for four input
bit intervals.

In the tree diagram, a solid line represents input information bit 0 and a dashed line
represents input information bit 1. The corresponding output encoded bits are shown on
the branches of the tree. An input information sequence defines a specific path through
the tree diagram from left to right. For example, the input information sequence


x={1011} produces the output encoded sequence c={11, 10, 00, 01}. Each input
information bit corresponds to branching either upward (for input information bit 0) or
downward (for input information bit 1) at a tree node.

2.2.3 State Diagram Representation


The state diagram shows the state information of a convolutional encoder. The
state information of a convolutional encoder is stored in the shift registers. Figure 2.4
shows the state diagram of the encoder in Figure 2.2.

[Figure 2.4 here: states 00, 10, 01, 11; from 00: 0/00 (self-loop) and 1/11 to 10; from 10:
0/10 to 01 and 1/01 to 11; from 01: 0/11 to 00 and 1/00 to 10; from 11: 0/01 to 01 and
1/10 (self-loop).]
Figure 2.4: State diagram representation of the encoder in Figure 2.2.

In the state diagram, the state information of the encoder is shown in the circles. Each
new input information bit causes a transition from one state to another. The path
information between the states, denoted as x/c, represents input information bit x and
output encoded bits c. It is customary to begin convolutional encoding from the all zero
state. For example, the input information sequence x={1011} (begin from the all zero
state) leads to the state transition sequence s={10, 01, 10, 11} and produces the output
encoded sequence c={11, 10, 00, 01}. Figure 2.5 shows the path taken through the state
diagram for the given example.


[Figure 2.5 here: the state diagram of Figure 2.4 with the path 00 → 10 → 01 → 10 → 11 for
input {1011} highlighted.]
Figure 2.5: The state transitions (path) for input information sequence {1011}.

2.2.4 Trellis Diagram Representation


The trellis diagram is basically a redrawing of the state diagram. It shows all
possible state transitions at each time step. Frequently, a legend accompanies the trellis
diagram to show the state transitions and the corresponding input and output bit mappings
(x/c). This compact representation is very helpful for decoding convolutional codes as
discussed later. Figure 2.6 shows the trellis diagram for the encoder in Figure 2.2.


[Figure 2.6 here: a legend listing the transitions 0/00 and 1/11 from state 00, 1/00 and 0/11
from 01, 0/10 and 1/01 from 10, 1/10 and 0/01 from 11, beside the trellis drawn for time
steps 0 through 4.]

Figure 2.6: Trellis diagram representation of the encoder in Figure 2.2 for four
input bit intervals.

Figure 2.7 shows the trellis path for the state transitions in Figure 2.5.

[Figure 2.7 here: the trellis with the path through states 00, 10, 01, 10, 11 highlighted and
its branch labels 1/11, 0/10, 1/00, 1/01 shown.]

Figure 2.7: Trellis path for the state transitions in Figure 2.5.


2.3 Catastrophic Convolutional Code


A catastrophic convolutional code causes a large number of bit errors when only a
small number of channel bit errors is received. This type of code needs to be avoided and
can be identified from the state diagram: a state diagram having a loop in which a nonzero
information sequence corresponds to an all-zero output sequence identifies a catastrophic
convolutional code. Figure 2.8 shows two examples of such codes.

[Figure 2.8 here: two state diagrams, one with a self-loop at Si labeled 1/00 and one with a
cycle Si → Sj → Sk → Si whose branches are all labeled 1/00.]
Figure 2.8: Examples of catastrophic convolutional code.

2.4 Hard-Decision and Soft-Decision Decoding


Hard-decision and soft-decision decoding refer to the type of quantization used on
the received bits. Hard-decision decoding uses 1-bit quantization on the received channel
values. Soft-decision decoding uses multi-bit quantization on the received channel
values. For the ideal soft-decision decoding (infinite-bit quantization), the received
channel values are directly used in the channel decoder. Figure 2.9 shows hard- and soft-
decision decoding.


[Figure 2.9 here: the encoded stream c is BPSK modulated (c=0 sends -1, c=1 sends +1) and
passes through a noisy channel; the demodulator output is used directly for soft-decision
decoding, or sliced (rin <= 0 gives rout = 0, rin > 0 gives rout = 1) for hard-decision
decoding before the convolutional decoder.]
Figure 2.9: Hard- and Soft-decision decoding [Woe94].

2.5 Hard-Decision Viterbi Algorithm


For a convolutional code, the input sequence x is “convoluted” to the encoded
sequence c. Sequence c is transmitted across a noisy channel and the received sequence r
is obtained. The Viterbi algorithm computes a maximum likelihood (ML) estimate on the
estimated code sequence y from the received sequence r such that it maximizes the
probability p(r|y) that sequence r is received conditioned on the estimated code sequence
y. Sequence y must be one of the allowable code sequences and cannot be any arbitrary
sequence. Figure 2.10 shows the described system structure.

[Figure 2.10 here: x → Convolutional Encoder → c → Channel (noise added) → r → Viterbi
Decoder → y.]
Figure 2.10: Convolutional code system.

For a rate r convolutional code, the encoder inputs k bits in parallel and outputs n
bits in parallel at each time step. The input sequence is denoted as

$$x = (x_0^{(1)}, x_0^{(2)}, \ldots, x_0^{(k)}, x_1^{(1)}, \ldots, x_1^{(k)}, \ldots, x_{L+m-1}^{(1)}, \ldots, x_{L+m-1}^{(k)}) \qquad (2.3)$$

and the coded sequence is denoted as

$$c = (c_0^{(1)}, c_0^{(2)}, \ldots, c_0^{(n)}, c_1^{(1)}, \ldots, c_1^{(n)}, \ldots, c_{L+m-1}^{(1)}, \ldots, c_{L+m-1}^{(n)}) \qquad (2.4)$$

where L denotes the length of the input information sequence and m denotes the maximum
length of the shift registers. An additional m zero bits are required at the tail of the
information sequence to take the convolutional encoder back to the all-zero state; the encoder
must start and end in the all-zero state. The subscript denotes the time index while the
superscript denotes the bit within a particular input k-bit or output n-bit block. The received
and estimated sequences r and y are described similarly as

$$r = (r_0^{(1)}, r_0^{(2)}, \ldots, r_0^{(n)}, r_1^{(1)}, \ldots, r_1^{(n)}, \ldots, r_{L+m-1}^{(1)}, \ldots, r_{L+m-1}^{(n)}) \qquad (2.5)$$

and

$$y = (y_0^{(1)}, y_0^{(2)}, \ldots, y_0^{(n)}, y_1^{(1)}, \ldots, y_1^{(n)}, \ldots, y_{L+m-1}^{(1)}, \ldots, y_{L+m-1}^{(n)}). \qquad (2.6)$$

For ML decoding, the Viterbi algorithm selects y to maximize p(r|y). The channel is assumed
to be memoryless, and thus the noise process affecting a received bit is independent of the
noise process affecting all of the other received bits [Wic95]. From probability theory, the
probability of joint, independent events is the product of the probabilities of the individual
events. Thus,

$$p(r|y) = \prod_{i=0}^{L+m-1} \left[ p(r_i^{(1)}|y_i^{(1)})\, p(r_i^{(2)}|y_i^{(2)}) \cdots p(r_i^{(n)}|y_i^{(n)}) \right] \text{ [Wic95]} \qquad (2.7)$$

$$= \prod_{i=0}^{L+m-1} \left[ \prod_{j=1}^{n} p(r_i^{(j)}|y_i^{(j)}) \right] \text{ [Wic95]} \qquad (2.8)$$

This equation is called the likelihood function of y given that r is received [Vit71]. The
estimate that maximizes p(r|y) also maximizes log p(r|y) because the logarithm is a
monotonically increasing function. Thus, a log likelihood function can be defined as

$$\log p(r|y) = \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} \log p(r_i^{(j)}|y_i^{(j)}) \right] \text{ [Wic95]} \qquad (2.9)$$

For easier manipulation of the summations over the log function, a bit metric is defined:

$$M(r_i^{(j)}|y_i^{(j)}) = a\left[\log p(r_i^{(j)}|y_i^{(j)}) + b\right] \text{ [Wic95]} \qquad (2.10)$$

where a and b are chosen such that the bit metric is a small positive integer [Wic95].
The values a and b below are given for the binary symmetric channel (BSC), i.e. for
hard-decision decoding. Figure 2.11 shows a BSC.


[Figure 2.11 here: BSC with P(0 received | 0 sent) = P(1 received | 1 sent) = 1-p and
P(1 received | 0 sent) = P(0 received | 1 sent) = p.]
Figure 2.11: The binary symmetric channel model, where p is the crossover
probability.

For the BSC, a and b can be chosen in two distinct ways. In the conventional way, they are
chosen as

$$a = \frac{1}{\log p - \log(1-p)} \text{ [Wic95]} \qquad (2.11)$$

and

$$b = -\log(1-p) \text{ [Wic95]} \qquad (2.12)$$

The resulting bit metric is then

$$M(r_i^{(j)}|y_i^{(j)}) = \frac{\log p(r_i^{(j)}|y_i^{(j)}) - \log(1-p)}{\log p - \log(1-p)} \qquad (2.13)$$

From the BSC model, it is clear that p(r_i^{(j)}|y_i^{(j)}) can only take on the values p and
1-p. Table 2.1 shows the resulting bit metric.

Table 2.1: Conventional Bit Metric Values

M(r_i^{(j)}|y_i^{(j)})        Received bit r_i^{(j)} = 0    Received bit r_i^{(j)} = 1
Decoded bit y_i^{(j)} = 0                 0                             1
Decoded bit y_i^{(j)} = 1                 1                             0

This bit metric shows the cost of receiving and decoding bits. For example, if the decoded bit
y_i^{(j)} = 0 and the received bit r_i^{(j)} = 0, then the cost M(r_i^{(j)}|y_i^{(j)}) = 0.
However, if the decoded bit y_i^{(j)} = 0 and the received bit r_i^{(j)} = 1, then the cost
M(r_i^{(j)}|y_i^{(j)}) = 1. As can be seen, this is related to the Hamming distance and is
known as the Hamming distance metric. Thus, the Viterbi algorithm chooses the code


sequence y through the trellis that has the smallest cost/Hamming distance relative to the
received sequence r.

Alternatively, a and b can be chosen as

$$a = \frac{1}{\log(1-p) - \log p} \text{ [Wic95]} \qquad (2.14)$$

and

$$b = -\log p \text{ [Wic95]} \qquad (2.15)$$

The resulting alternative bit metric is then

$$M(r_i^{(j)}|y_i^{(j)}) = \frac{\log p(r_i^{(j)}|y_i^{(j)}) - \log p}{\log(1-p) - \log p} \qquad (2.16)$$

Table 2.2 shows the resulting alternative bit metric.

Table 2.2: Alternative Bit Metric Values

M(r_i^{(j)}|y_i^{(j)})        Received bit r_i^{(j)} = 0    Received bit r_i^{(j)} = 1
Decoded bit y_i^{(j)} = 0                 1                             0
Decoded bit y_i^{(j)} = 1                 0                             1

For this case, the Viterbi algorithm chooses the code sequence y through the trellis that
has the largest cost relative to the received sequence r (the metric now counts bit agreements
rather than disagreements). Furthermore, for an arbitrary channel (not necessarily a BSC),
the values a and b are found on a trial-and-error basis to obtain an acceptable bit metric.

From the bit metric, a path metric is defined as

$$M(r|y) = \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} M(r_i^{(j)}|y_i^{(j)}) \right] \text{ [Wic95]} \qquad (2.17)$$

and indicates the total cost of estimating the received bit sequence r with the decoded bit
sequence y in the trellis diagram. Furthermore, the k-th branch metric is defined as

$$M(r_k|y_k) = \sum_{j=1}^{n} M(r_k^{(j)}|y_k^{(j)}) \text{ [Wic95]} \qquad (2.18)$$

and the k-th partial path metric is defined as

$$M_k(r|y) = \sum_{i=0}^{k} M(r_i|y_i) \text{ [Wic95]} \qquad (2.19)$$

$$= \sum_{i=0}^{k} \left[ \sum_{j=1}^{n} M(r_i^{(j)}|y_i^{(j)}) \right] \text{ [Wic95]} \qquad (2.20)$$

The k-th branch metric indicates the cost of choosing a branch from the trellis diagram, and
the k-th partial path metric indicates the cost of choosing a partially decoded bit sequence y
up to time index k.

The Viterbi algorithm utilizes the trellis diagram to compute the path metrics. Each state
(node) in the trellis diagram is assigned a value, the partial path metric, accumulated from
state s = 0 at time t = 0 to the particular state s = k at time t ≥ 0. At each state, the
"best" partial path metric is chosen from the paths terminating at that state [Wic95]. The
"best" partial path metric may be either the larger or the smaller metric, depending on
whether a and b are chosen conventionally or alternatively. The selected metric represents the
survivor path, and the remaining metrics represent the nonsurvivor paths. The survivor paths
are stored while the nonsurvivor paths are discarded from the trellis diagram. The Viterbi
algorithm selects the single survivor path left at the end of the process as the ML path.
Trace-back of the ML path on the trellis diagram then provides the ML decoded sequence.

The hard-decision Viterbi algorithm (HDVA) can be implemented as follows [Rap96], [Wic95]:
S_{k,t} is the state in the trellis diagram that corresponds to state S_k at time t. Every
state in the trellis is assigned a value denoted V(S_{k,t}).
1. (a) Initialize time t = 0.
   (b) Initialize V(S_{0,0}) = 0 and all other V(S_{k,t}) = +∞.
2. (a) Set time t = t+1.
   (b) Compute the partial path metrics for all paths going to state S_k at time t.
   First, find the t-th branch metric M(r_t|y_t) = \sum_{j=1}^{n} M(r_t^{(j)}|y_t^{(j)}),
   calculated from the Hamming distance \sum_{j=1}^{n} |r_t^{(j)} - y_t^{(j)}|. Second,
   compute the t-th partial path metric M_t(r|y) = \sum_{i=0}^{t} M(r_i|y_i), calculated as
   V(S_{k,t-1}) + M(r_t|y_t).
3. (a) Set V(S_{k,t}) to the "best" partial path metric going to state S_k at time t.
   Conventionally, the "best" partial path metric is the one with the smallest value.
   (b) If there is a tie for the "best" partial path metric, then any one of the tied
   partial path metrics may be chosen.
4. Store the "best" partial path metric and its associated survivor bit and state paths.
5. If t < L+m-1, return to Step 2.
The result of the Viterbi algorithm is a unique trellis path that corresponds to the ML
codeword.
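The steps above translate almost line-for-line into code. The following is a minimal Python
sketch of HDVA for a rate-1/n feedforward code (my own illustration, not from [Rap96] or
[Wic95]; the names and interface are assumptions):

    from itertools import product

    def viterbi_hard(r, gens=((1, 1, 1), (1, 0, 1)), m=2):
        # r is a sequence of n-tuples of received bits; the code sequence is
        # assumed to be terminated so that the ML path ends in the all-zero
        # state. Ties keep the first branch found, as in the example below.
        INF = float("inf")
        states = list(product((0, 1), repeat=m))      # (x(n-1), ..., x(n-m))
        V = {s: (0 if s == states[0] else INF) for s in states}   # step 1
        paths = {states[0]: []}
        for rt in r:                                  # steps 2-5
            newV = {s: INF for s in states}
            newpaths = {}
            for s, v in V.items():
                if v == INF:
                    continue
                for b in (0, 1):                      # hypothesized input bit
                    window = (b,) + s
                    y = tuple(sum(g * w for g, w in zip(gen, window)) % 2
                              for gen in gens)
                    metric = v + sum(yi != ri for yi, ri in zip(y, rt))
                    s2 = (b,) + s[:-1]                # next state
                    if metric < newV[s2]:             # survivor selection
                        newV[s2] = metric
                        newpaths[s2] = paths[s] + [b]
            V, paths = newV, newpaths
        return paths[states[0]]                       # end in all-zero state

    # The received sequence from the worked example below (one bit error):
    r = [(1, 0), (1, 0), (0, 0), (1, 0), (0, 0), (1, 0), (1, 1)]
    print(viterbi_hard(r))                            # [1, 0, 1, 0, 1, 0, 0]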


A simple HDVA decoding example is shown below. The convolutional encoder used is the one
shown in Figure 2.2. The input sequence is x={1010100}, where the last two bits are used to
return the encoder to the all-zero state. The coded sequence is c={11, 10, 00, 10, 00, 10, 11};
however, the received sequence r={10, 10, 00, 10, 00, 10, 11} has a bit error (in the second
bit of the first pair). Figure 2.12 shows the state transition diagram (trellis legend) of the
example convolutional encoder.

[Figure 2.12 here: legend of the eight transitions from time i to i+1, namely 0/00 and 1/11
from state 00, 1/00 and 0/11 from 01, 0/10 and 1/01 from 10, 0/01 and 1/10 from 11.]

Figure 2.12: The state transition diagram (trellis legend) of the example
convolutional encoder.

The state transition diagram shows the estimated information and coded bits along the
branches (needed for the decoding process). HDVA decoding chooses the ML path through the
trellis as shown in Figure 2.13. The partial path (accumulated) metric chosen for this example
is the smallest Hamming distance, and it is shown in the figure for every node. The bold
partial path metrics correspond to the ML path. Survivor paths are represented by bold solid
lines and competing paths by thin solid lines. For metric "ties," the first branch is always
chosen.


[Figure 2.13 here: the decoding trellis from t = 0 to t = 7 with the accumulated Hamming
metric at every node (ties marked TIE), survivor paths drawn bold, and the ML path running
through states 00, 10, 01, 10, 01, 10, 01, 00 with final metric 1.]

Figure 2.13: HDVA decoding of the example.

From the trellis diagram in Figure 2.13, the estimated code sequence is y={11, 10, 00, 10,
00, 10, 11}, which is exactly the transmitted code sequence c. Utilizing the state transition
diagram in Figure 2.12, the estimated information sequence is x'={1010100}.

2.6 Soft-Decision Viterbi Algorithm


There are two general methods of implementing a soft-decision Viterbi algorithm. The first
method (Method 1) uses a Euclidean distance metric instead of the Hamming distance metric;
the received bits used in the Euclidean distance metric are processed by multi-bit quantization.
The second method (Method 2) uses a correlation metric; the received bits used in this metric
are also processed by multi-bit quantization.

2.6.1 Soft-Decision Viterbi Algorithm (Method 1)


In soft-decision decoding, the receiver does not assign a zero or a one (single-bit
quantization) to each received bit but uses multi-bit or infinite-bit quantized values
[Wic95]. Ideally, the received sequence r is infinite-bit quantized and is used directly in
the soft-decision Viterbi decoder. The soft-decision Viterbi algorithm is similar to its
hard-decision counterpart except that squared Euclidean distance is used in the metric
instead of Hamming distance.


The soft-decision Viterbi algorithm (SDVA1) can be implemented as follows:
S_{k,t} is the state in the trellis diagram that corresponds to state S_k at time t. Every
state in the trellis is assigned a value denoted V(S_{k,t}).
1. (a) Initialize time t = 0.
   (b) Initialize V(S_{0,0}) = 0 and all other V(S_{k,t}) = +∞.
2. (a) Set time t = t+1.
   (b) Compute the partial path metrics for all paths going to state S_k at time t.
   First, find the t-th branch metric M(r_t|y_t) = \sum_{j=1}^{n} M(r_t^{(j)}|y_t^{(j)}),
   calculated from the squared Euclidean distance \sum_{j=1}^{n} (r_t^{(j)} - y_t^{(j)})^2.
   Second, compute the t-th partial path metric M_t(r|y) = \sum_{i=0}^{t} M(r_i|y_i),
   calculated as V(S_{k,t-1}) + M(r_t|y_t).
3. (a) Set V(S_{k,t}) to the "best" partial path metric going to state S_k at time t.
   Conventionally, the "best" partial path metric is the one with the smallest value.
   (b) If there is a tie for the "best" partial path metric, then any one of the tied
   partial path metrics may be chosen.
4. Store the "best" partial path metric and its associated survivor bit and state paths.
5. If t < L+m-1, return to Step 2.

2.6.2 Soft-Decision Viterbi Algorithm (Method 2)


The second soft-decision Viterbi algorithm (SDVA2) is developed below. The likelihood
function is represented by a Gaussian probability density function

$$p(r_i^{(j)}|y_i^{(j)}) = \frac{1}{\sqrt{\pi N_o}}\, e^{-(r_i^{(j)} - y_i^{(j)}\sqrt{E_b})^2 / N_o} \qquad (2.21)$$

where E_b is the received energy per codeword bit and N_o is the one-sided noise spectral
density [Wic95]. The received bit is a Gaussian random variable with mean y_i^{(j)}\sqrt{E_b}
and variance N_o/2. The log likelihood function can be written as [Wic95]

$$\log p(r|y) = \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} \log p(r_i^{(j)}|y_i^{(j)}) \right] \qquad (2.22)$$

$$= \sum_{i=0}^{L+m-1} \sum_{j=1}^{n} \left[ -\frac{(r_i^{(j)} - y_i^{(j)}\sqrt{E_b})^2}{N_o} - \log\sqrt{\pi N_o} \right] \qquad (2.23)$$

$$= -\frac{1}{N_o} \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} (r_i^{(j)} - y_i^{(j)}\sqrt{E_b})^2 \right] - \frac{(L+m)n}{2}\log \pi N_o \qquad (2.24)$$


$$= -\frac{1}{N_o} \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} \left( r_i^{(j)2} - 2 r_i^{(j)} y_i^{(j)}\sqrt{E_b} + y_i^{(j)2} E_b \right) \right] - \frac{(L+m)n}{2}\log \pi N_o \qquad (2.25)$$

where y_i^{(j)2} = 1, so

$$\log p(r|y) = C_1 \sum_{i=0}^{L+m-1} \left[ \sum_{j=1}^{n} r_i^{(j)} y_i^{(j)} \right] + C_2 \qquad (2.26)$$

where C_1 and C_2 collect all terms that are not a function of y, i.e.

$$\log p(r|y) = C_1 (r \cdot y) + C_2 \qquad (2.27)$$

From this, it is seen that the bit metric can be defined as

$$M(r_i^{(j)}|y_i^{(j)}) = r_i^{(j)} y_i^{(j)} \qquad (2.28)$$
The soft-decision Viterbi algorithm (SDVA2) can be implemented as follows:
S_{k,t} is the state in the trellis diagram that corresponds to state S_k at time t. Every
state in the trellis is assigned a value denoted V(S_{k,t}).
1. (a) Initialize time t = 0.
   (b) Initialize V(S_{0,0}) = 0 and all other V(S_{k,t}) = -∞.
2. (a) Set time t = t+1.
   (b) Compute the partial path metrics for all paths going to state S_k at time t.
   First, find the t-th branch metric M(r_t|y_t) = \sum_{j=1}^{n} M(r_t^{(j)}|y_t^{(j)}),
   calculated from the correlation \sum_{j=1}^{n} r_t^{(j)} y_t^{(j)}. Second, compute the
   t-th partial path metric M_t(r|y) = \sum_{i=0}^{t} M(r_i|y_i), calculated as
   V(S_{k,t-1}) + M(r_t|y_t).
3. (a) Set V(S_{k,t}) to the "best" partial path metric going to state S_k at time t. The
   "best" partial path metric is the one with the largest value.
   (b) If there is a tie for the "best" partial path metric, then any one of the tied
   partial path metrics may be chosen.
4. Store the "best" partial path metric and its associated survivor bit and state paths.
5. If t < L+m-1, return to Step 2.
Generally with soft-decision decoding, approximately 2 dB of coding gain over hard-
decision decoding can be obtained in Gaussian channels.

2.7 Performance Analysis of Convolutional Code


The performance of convolutional codes can be quantified through analytical
means or by computer simulation. The analytical approach is based on the transfer
function of the convolutional code which is obtained from the state diagram. The process


of obtaining the transfer function and other related performance measures is described below.

2.7.1 Transfer Function of Convolutional Code


The analysis of convolutional codes is generally difficult to perform because
traditional algebraic and combinatorial techniques cannot be applied. These heuristically
constructed codes can be analyzed through their transfer functions. By utilizing the state
diagram, the transfer function can be obtained. With the transfer function, code
properties such as distance properties and the error rate performance can be easily
calculated. To obtain the transfer function, the following rules are applied:
1. Break the all-zero (initial) state of the state diagram into a start state and an end
state. This will be called the modified state diagram.
2. For every branch of the modified state diagram, assign the symbol D with its
exponent equal to the Hamming weight of the output bits.
3. For every branch of the modified state diagram, assign the symbol J.
4. Assign the symbol N to the branch of the modified state diagram, if the branch
transition is caused by an input bit 1.
For the state diagram in Figure 2.4, the modified state diagram is shown in Figure 2.14.

[Figure 2.14 here: start state Sa = 00, states Sb = 10, Sc = 01, Sd = 11, end state Se = 00;
branch labels 1/11 = NJD^2 (Sa to Sb), 0/10 = JD (Sb to Sc), 1/01 = NJD (Sb to Sd),
1/00 = NJ (Sc to Sb), 0/11 = JD^2 (Sc to Se), 0/01 = JD (Sd to Sc), 1/10 = NJD (Sd to Sd).]

Figure 2.14: The modified state diagram of Figure 2.4 where Sa is the start state and
Se is the end state.

Nodal equations are obtained for all the states except the start state in Figure 2.14. These are

$$S_b = NJD^2 S_a + NJ\, S_c$$
$$S_c = JD\, S_b + JD\, S_d$$
$$S_d = NJD\, S_b + NJD\, S_d$$
$$S_e = JD^2 S_c$$


The transfer function is defined to be

$$T(D, N, J) = \frac{S_{\text{end}}(D, N, J)}{S_{\text{start}}(D, N, J)} \qquad (2.29)$$

and for Figure 2.14,

$$T(D, N, J) = \frac{S_e}{S_a}.$$

By substituting and rearranging,

$$T(D, N, J) = \frac{NJ^3 D^5}{1 - (NJ + NJ^2)D} \quad \text{(closed form)}$$

$$= NJ^3 D^5 + (N^2 J^4 + N^2 J^5) D^6 + (N^3 J^5 + 2N^3 J^6 + N^3 J^7) D^7 + \cdots \quad \text{(expanded polynomial form)}$$
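The substitution can be checked mechanically. Here is a short verification script using SymPy
(my own check, not part of the original text):

    from sympy import symbols, solve, simplify

    D, N, J, Sb, Sc, Sd, Se = symbols("D N J Sb Sc Sd Se")
    Sa = 1                        # normalize the start state: T = Se / Sa
    # The four nodal equations above, each written as expression = 0.
    sol = solve([Sb - (N*J*D**2*Sa + N*J*Sc),
                 Sc - (J*D*Sb + J*D*Sd),
                 Sd - (N*J*D*Sb + N*J*D*Sd),
                 Se - (J*D**2*Sc)],
                [Sb, Sc, Sd, Se], dict=True)[0]
    # Prints a form equivalent to N*J**3*D**5 / (1 - (N*J + N*J**2)*D).
    print(simplify(sol[Se]))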

2.7.1.1 Distance Properties

The free distance between a pair of convolutional codewords is the Hamming distance between
the pair. The minimum free distance, d_free, is the minimum Hamming distance over all pairs
of complete convolutional codewords and is defined as

$$d_{\text{free}} = \min\{ d(y_1, y_2) : y_1 \neq y_2 \} \text{ [Wic95]} \qquad (2.30)$$

$$= \min\{ w(y) : y \neq 0 \} \text{ [Wic95]} \qquad (2.31)$$

where d(·,·) is the Hamming distance between a pair of convolutional codewords and w(·) is
the Hamming weight of a codeword, i.e. its distance from the all-zero codeword. The minimum
free distance corresponds to the ability of the convolutional code to estimate the best decoded
bit sequence: as d_free increases, the performance of the convolutional code also improves.
This characteristic is analogous to the minimum distance of block codes. From the transfer
function, the minimum free distance is identified as the lowest exponent of D; from the
transfer function for Figure 2.14 above, d_free = 5. Also, if N and J are set to 1, the
coefficient of D^i gives the number of paths through the trellis with weight i. More
information about a codeword is obtained from the exponents of N and J. For a codeword, the
exponent of N indicates the number of 1s in the input sequence, and the exponent of J
indicates the length of the path before it merges with the all-zero path for the first time
[Pro95].

2.7.1.2 Error Probabilities

There are two error probabilities associated with convolutional codes, namely the first event
and bit error probabilities. The first event error probability, Pe, is the probability that an
error begins at a particular time. The bit error probability, Pb, is the average number of bit
errors in the decoded sequence. Usually, these error probabilities are bounded using Chernoff
bounds, as derived in [Pro95], [Rhe89], [Wic95].

For hard-decision decoding, the first event error and bit error probabilities are bounded as

$$P_e < T(D, N, J)\big|_{D=\sqrt{4p(1-p)},\; N=1,\; J=1} \qquad (2.32)$$

and

$$P_b < \frac{dT(D, N, J)}{dN}\bigg|_{D=\sqrt{4p(1-p)},\; N=1,\; J=1} \qquad (2.33)$$

where

$$p = Q\!\left(\sqrt{\frac{2 r E_b}{N_o}}\right) \qquad (2.34)$$

and

$$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du \qquad (2.35)$$

For soft-decision decoding, the first event error and bit error probabilities are bounded as

$$P_e < T(D, N, J)\big|_{D=e^{-rE_b/N_o},\; N=1,\; J=1} \qquad (2.36)$$

and

$$P_b < \frac{dT(D, N, J)}{dN}\bigg|_{D=e^{-rE_b/N_o},\; N=1,\; J=1} \qquad (2.37)$$
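As a quick numerical illustration (my arithmetic, using the transfer function found in
Section 2.7.1): setting N = J = 1 gives T(D,1,1) = D^5/(1-2D), so the hard-decision first
event error bound reads

$$P_e < \frac{D^5}{1-2D}\Bigg|_{D=\sqrt{4p(1-p)}};$$

for a crossover probability p = 0.01 this gives D ≈ 0.199 and P_e ≲ 5.2 × 10^{-4}.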

Two other factors also determine the performance of the Viterbi decoder. They
are commonly referred to as the decoding depth and the degree of quantization of the
received signal.

2.7.2 Decoding Depth

The decoding depth is a window in time: the decoder makes a decision on the bits at the
beginning of the window while accepting bits at the end of the window for metric computations.
This scheme gives up optimum ML decoding in exchange for less memory and a smaller decoding
delay. It has been found experimentally that if the decoding depth is 5 times the constraint
length K, then the error introduced by the finite decoding depth is negligible [Pro95].


2.7.3 Degree of Quantization

For soft-decision Viterbi decoding, the degree of quantization of the received signal can
affect the decoder performance. The performance of the Viterbi decoder improves with finer
quantization. It has been found that an eight-level quantizer degrades the performance only
slightly with respect to the infinite-bit-quantized case [Wic95].

2.7.4 Decoding Complexity for Convolutional Codes

For a general convolutional code, the input information sequence contains k·L bits, where k
is the number of parallel information bits at one time interval and L is the number of time
intervals. This results in L+m stages in the trellis diagram. There are exactly 2^{kL}
distinct paths in the trellis diagram, so an exhaustive search for the ML sequence would have
a computational complexity on the order of O(2^{kL}). The Viterbi algorithm reduces this
complexity by performing the ML search one stage at a time in the trellis. At each node
(state) of the trellis there are 2^k calculations, and the number of nodes per stage in the
trellis is 2^m. Therefore, the complexity of the Viterbi algorithm is on the order of
O(2^k · 2^m · (L+m)). This significantly reduces the number of calculations required for ML
decoding because the number of time intervals L is now a linear factor rather than an
exponential factor in the complexity. However, there is still an exponential increase in
complexity if either k or m increases.
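For concreteness (my arithmetic, not from the text): for the k = 1, m = 2 encoder of
Figure 2.2 with L = 100 information bits, the Viterbi algorithm performs on the order of
2^1 · 2^2 · (100+2) ≈ 8 × 10^2 branch computations, whereas an exhaustive search would have
to examine 2^{100} ≈ 1.3 × 10^{30} paths.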

Laboratory 97.478 Fall 2001
Convolution Codes

1.0 Prologue:
Convolutional codes, and why we should complicate our lives with them
People used to send voice waveforms in electrical form over a twisted pair of wires. These
telephone voice signals had a bandwidth of 4 kHz. If the channel polluted the signal with a
bit of noise, the only thing that happened was that the conversation got a bit noisier. As
technology developed, we digitized the voice signals at 8000 samples per second (twice the
highest frequency, to prevent aliasing) and transmitted the bits. If noise corrupted a few
bits, the corresponding sample value(s) would be slightly wrong or very wrong depending on
whether the bad bits were near the most significant bit or the least significant bit. The
conversation sounded noisier, but was still discernible. Someone saying "cat" will not be
thought to have said "dog," and probably would not even be thought to have said "caught."
When people started to send data files rather than voice, corrupted bits became more
important. Even one wrong bit could prevent a program from running properly. Say the noise in
a channel was low enough for the probability of a bad bit to be 1x10^-9, i.e. the chance of a
bit being correct is 0.999999999 (nine 9's). The chance of 1000 bits being all correct is
0.999999 (six 9's), and the chance of 10^6 bits being all correct is 0.999 (three 9's). A
1 megabyte file (8x10^6 bits) has almost a 1% chance of being corrupted. The reliability of
the channel had to be improved.
The probability of error can be reduced by transmitting more bits than needed to represent
the information being sent, and convolving each bit with neighbouring bits so that if one
transmitted bit gets corrupted, enough information is carried by the neighbouring bits to
estimate what the corrupted bit was. This approach of transforming a number of information
bits into a larger number of transmitted bits is called channel coding, and the particular
approach of convolving the bits to distribute the information is referred to as convolution
coding. The ratio of information bits to transmitted bits is the code rate (less than 1), and
the number of information bits over which the convolution takes place is the constraint length.
For example, suppose you channel encoded a message using a convolution code, transmitting
2 bits for every information bit (code rate = 0.5) with a constraint length of 3. Then the
coder would send out 16 bits for every 8 bits of input, and each output pair would depend on
the present and the past 2 input bits (constraint length = 3). The output would come out at
twice the input speed.
Since information about each input bit is spread out over 6 transmitted bits, one can usually
reconstruct the correct input even with several transmission errors.
[FIGURE 1: an input stream (e.g. 00101101) enters the convolution encoder; the output stream
(e.g. 11 10 00 01 01 00) leaves at twice the rate. (a) 3 bits in the input stream generate
2 bits in the output stream. (b) Take the most recent of these input bits plus one new input
bit and generate the next 2 output bits. Thus each input bit affects 6 output bits.]

The need for coding is very important in the use of cellular phones. In this case, the
"channel" is the propagation of radio waves between your cell phone and the base station. Just
by turning your head while talking on the phone, you could suddenly block out a large portion
of the transmitted signal. If you tried to keep your head still, a passing bus could change
the pattern of bouncing radio waves arriving at your phone so that they add destructively,
again giving a poor signal. In both cases, the SNR suddenly drops deeply and the bit error
rate goes up dramatically. So the cellular environment is extremely unreliable. If you didn't
have lots of redundancy in the transmitted bits to boost reliability, chances are that digital
cell phones would not be the success they are today. As an example, the first digital cell
system, Digital Advanced Mobile Phone Service (D-AMPS), used convolution coding of rate 1/2
(i.e. double the information bit rate) and constraint length 6. Current CDMA-based cell phones
use spread spectrum to combat the unreliability of the air interface, but still use
convolution coding of rate 1/2 in the downlink and 1/3 in the uplink (constraint length 9).
What CDMA is, is not part of this lab; you can ask the TA if you are curious.

2.0 Example of Convolution Encoding


[FIGURE 2: rate-1/2 encoder. The input bit stream x enters a shift register holding x(n)
(today's bit), x(n-1) (yesterday's bit), and x(n-2) (the day before yesterday's bit). One XOR
forms z1 from all three bits, another forms z2 from x(n) and x(n-2), and a MUX interleaves z1
and z2 onto a double-speed output z. The constraint length is 3: each output is affected by
the present input bit and the two previous bits stored in the shift register.]

This is a convolution encoder of code rate 1/2: there are two output bits for each input bit.
Here the output bits are transmitted one after another, two per clock cycle.
The output z1 = x(n) ⊕ x(n-1) ⊕ x(n-2), where x(n) is the present input bit, x(n-1) was the
previous (yesterday's) bit, etc.
The output z2 = x(n) ⊕ x(n-2).
The input connections to the XORs, written as the binary vectors [1 1 1] and [1 0 1], are
known as the generating vectors or generating polynomials for the code.


2.1 The Encoder as a Finite-State Machine

The convolution encoder can be described as a Mealy machine whose state is the two bits in
the shift register: state = [x(n-1), x(n-2)].
Let the first input bit to the shift register be x(n) = 1, and let the flip-flops be reset to
zero so x(n-1) = x(n-2) = 0. Then:
State = 00 = S0 = [x(n-1), x(n-2)]
Output z = [z1, z2]:
z1 = x(n) ⊕ x(n-1) ⊕ x(n-2) = 1 ⊕ 0 ⊕ 0 = 1
z2 = x(n) ⊕ x(n-2) = 1 ⊕ 0 = 1
so z = [z1, z2] = 11.
After the clock, state bit x(n-1) = 0 will shift right into x(n-2), the input x(n) = 1 will
shift right into x(n-1), and the next state will be 10 = S1.
[FIGURE 3: Mealy state diagram with states S0 = 00, S1 = 10, S2 = 01, S3 = 11; each
transition is labeled x/z, e.g. x=0/z=00 is a self-loop at S0 and x=1/z=11 goes from S0
to S1.]
2.2 The Trellis Encoding Diagram

To get the trellis diagram, squash the state diagram so S0, S1, S2 and S3 are in a vertical
line. This line represents the possible states at time t = n (now). Make time the horizontal
axis, and put in another line of states to show the possible states at t = n+1.
Then add the transitions to the diagram, making them all go from states at t = n to states at
t = n+1. Thus the self-loop at state S0 in the state graph becomes the horizontal line from
S0 at t = n to S0 at t = n+1 in the trellis diagram.
[FIGURE 4: one trellis section from t = n to t = n+1, each branch labeled in the notation
input/output, x/z: 0/00 and 1/11 leaving S0, 0/10 and 1/01 leaving S1, 0/11 and 1/00 leaving
S2, 0/01 and 1/10 leaving S3.]
The complete trellis diagram extends past t = 1 to as many time steps as are needed.
Suppose the encoder is reset to state S0 = 00, and the input is 1,0,1,1,0,0. By following the
trellis one sees that the output is 11 10 00 01 01 11. Also it passes through states S0, S1,
S2, S1, S3, S2, ending in S0 at t = 6.
FIGURE 5: A more complete trellis diagram, from t=0 to t=6. Every time step repeats the
same transition pattern: S0: 0/00 to S0, 1/11 to S1; S1: 0/10 to S2, 1/01 to S3;
S2: 0/11 to S0, 1/00 to S1; S3: 0/01 to S2, 1/10 to S3.


2.3 Lab and Problem Rules


The Convolution encoder/Viterbi decoder design problem will be done by groups of three
persons.
The number of exercises is (usually) divisible by three so one person in each group can do
every third problem and thus do one-third of the exercises. The three are to be submitted together
with the name of the person doing each part attached to the part.
Five marks will be assigned for each person's questions and will be given to the person in
the group answering the question. Two-person groups should take turns answering the odd questions,
which will be assigned three marks. This applies to temporary two-person groups: groups
where one member goofs off. Another member can get three extra marks by doing his/her questions.
One common mark will be assigned to coordination within the group. Do they use common
symbols? Do they hand the assignments in at the same time, attached together? Do they refer to the
other questions where appropriate? Violation of any one of the above may cost each group member
his/her common mark.
All members of the group are responsible for knowing how to do each exercise.
Related problems will appear on examinations.
The problems and labs have subtle, and also not so subtle, changes from last year. One way
to lose marks quickly is to submit answers taken from last year. The penalty will be zero for the
question(s) involved and a 75% reduction in the mark of the whole group.

2.4 First Exercise: A Convolution Encoder.


1. Problem: Encoding a number
• Take the last 4 digits of your student number (the least significant digits).
• Convert them to a hexadecimal number (Matlab has a function dec2hex).
• Convert the hexadecimal number to binary (12 to 16 bits).
• Use this as data for the encoder below. Feed in the least significant bit first. Also reset the
shift register to 00 before you start.
• Calculate the output bits and states when one encodes these bits using a code rate
1/2, constraint length 3 encoder with generating vectors [111] and [101].
Tabulate how the state and output values change with each clock cycle.

Clock cycle 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

input bit

shift reg (state) 00


Output [z1,z2]
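
As a worked illustration with a made-up (hypothetical) data value, suppose the binary data
were 1011, fed LSB first as 1, 1, 0, 1:

Clock cycle          1    2    3    4
input bit            1    1    0    1
shift reg (state)    00   10   11   01
Output [z1,z2]       11   01   01   00

At each cycle z1 and z2 are computed from the present bit and the state, and then the present
bit shifts into the state.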

2. Problem: Draw the circuit for an encoder which has:
a code rate = 1/2,
constraint length of 4,
generating vectors [1101] and [1111], where the "0" means no connection to x(n-2)
(the shift register now holds x(n), x(n-1), x(n-2), x(n-3)).
3. Problem: Draw the state graph for the above constraint length 4 encoder. Draw the first 3
time steps of the trellis diagram for the above constraint length 4 encoder.


2.5 First Lab: Design a Convolution Encoder in Verilog


The constraint-length 4 encoder circuit is a finite-state machine. It is also a shift register.
One can code it following the standard finite-state machine model with three-bit states, or as a shift
register.
reg [2:0] State; // (shift-reg length) = (constraint length - 1)
always @(posedge clk)
begin
State <= {x, State[2:1]}; // Right shift 1 position; the new bit x enters at State[2]
end
The test bench is not part of the circuit. It supplies the input signals
and may compare output signals with those that are expected.
Test benches are easier to write and use if they are synchronous. This
means they always send out signals slightly after the clock (or at least
never at the same time as the clock). It also means a writing style with
few #n delays.

FIGURE 6: The Test Bench module drives clk, reset and x into the
ConvEncode module and observes z1 and z2.

A poor style, with every signal change separately timed:
initial begin
...
#11 x=1;
#10 x=0;
#10 x=1;
#20 x=0;
#10 x=1;
...

2.5.1 A Synchronous Test Bench


In a synchronous test bench, the test signals are timed by @(posedge clk) statements rather
than each having its own timing. There are only two delays here, one to set the clock period, and
the other to delay the input signal x so its changes are obviously past the clock edge.
Note there are things not included here, like reset.
module SyncTestBench;
reg [1:8] data; //Fill this with the data stream to be encoded.
// Note the first bit is on the left
reg x, clk;
integer I; // Use integers for counter indexes
initial
begin
I=1;
data=8'b1010_1101; // Underscore has no meaning except
// to visually space the bits.
clk=0;
forever #5 clk=~clk;
end
// send in a new value of x every clock cycle
always @(posedge clk)
begin
if (I==9) $finish; // Stop the simulation when one runs out of data.
// The #1 makes x change 1ns after the clock and never on the clock edge.
// The nonblocking symbol “<=” on I ensures that any other clocked module using
// I will grab the same I as this procedure, that is before I is updated to I+1.
x<= #1 data[I];
I<=I+1;


end
endmodule
For the constraint length 3 system, you must have the test bench automatically compare your
answer with the result you obtained from your student number.
1. Write a finite-state machine encoder for the constraint length 3 system.
always @(state or x)
nxtstate = {state[0], x}; // left shift; the new bit enters at the LSB
···
always @(posedge clk or posedge reset)
···
state <= nxtstate;
2. Write a shift-register based encoder for the constraint length 4 system (Sect 2.4, Prob. 2);
generating vectors [1101] and [1111]. A sketch is given after this list.
always @(posedge clk or posedge reset)
···
state <= state << 1; // Left shift 1 position
state[0] <= x; // Overwrite the zero shifted in on the previous line.
3. Write a synchronous test bench so the other two modules can be simulated.
a) Compare the result for the data 1 1 0 1 0 0 1 0 1 1 0 0 0 0
Ans: Constraint length 3 encoded data:
11,01,01,00,10,11,11,10,00,01,01,11,00,00,00,00.
Ans: Constraint length 4 encoded data:
11,00,10,01,00,01,00,11,10,11,10,10,11,00,00,00,00.
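
A minimal sketch of the shift-register style asked for in item 2, using the left-shift convention
above (the module and port names are our assumptions):

module ConvEncode4 (
    input  clk, reset,
    input  x,
    output z1, z2
);
    reg [2:0] state;   // state[0]=x(n-1), state[1]=x(n-2), state[2]=x(n-3)

    always @(posedge clk or posedge reset)
        if (reset) state <= 3'b000;
        else       state <= {state[1:0], x};  // left shift; x enters at state[0]

    assign z1 = x ^ state[0] ^ state[2];              // [1101]: no x(n-2) term
    assign z2 = x ^ state[0] ^ state[1] ^ state[2];   // [1111]
endmodule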

3.0 Convolution Decoder


The next part of the project will be to design a convolution decoder to retrieve the information
bits from the transmitted bits. It should succeed even in the presence of some errors in the
transmitted bits. The method we will use is called a Viterbi decoder.

3.1 Decoding Using the Trellis Diagram


Consider a decoder that receives the transmitted signal 11 01 01 00 10 11 going from t=0 to
t=6. Assume the trellis was reset to state S0 (00) at the start. One goes through the trellis as before,
only for decoding the edge labels are read as (received bits)/(decoded bit), i.e. the encoder's
output/input. So the first input, 11, gives a decoded output of 1 and takes the machine to state S1.
At t=1 in state S1, the next input 01 causes a 1 output and a change to state S3.


FIGURE 7: Trellis Diagram for Decoding (Receiving).
Received input:  11 01 01 00 10 11
Decoded output:  1  1  0  1  0  0
The trellis runs from t=0 to t=6, and the edges now carry labels of the form
(received bits)/(decoded bit), e.g. 00/0 and 11/1 leaving S0.

3.1.1 The Hamming Distance (Metric)

This distance is used to show how far apart two
binary numbers are. Compare the bits in the same positions
in the two numbers. The number of positions that
are different is the Hamming distance (h).
Thus 11 and 01 are distance 1 apart (h=1);
1001001 and 1001010 are distance 2 apart (h=2).

FIGURE 8: The first trellis step for received input 11 11. Instead of input/output
(i.e. 11/1), each edge now shows its required input and, in a box, its Hamming
distance from the received input: leaving S0, the 11 edge has distance 0 and the
00 edge has distance 2.

Applying the Hamming Distance to Decoding

Suppose the first four received bits have an error,
so instead of 11 01 one receives 11 11. On the trellis
in Fig. 8, there are two choices leaving state S0, one for
input 11 (distance 0) and the other for input 00 (distance 2). The number in
the box is the Hamming distance between the received
input and the bits for the transition. It is clear one
should make the transition S0 ⇒ S1.
The next input has an error. Note there are no 11 or 00 paths leaving state S1. Both possible
paths, 10 or 01, are at Hamming distance 1. At this time either transition looks equally likely, but
wait!
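
In Verilog, the branch Hamming distance between two 2-bit symbols can be computed by XOR-ing
them and counting the one bits. A minimal sketch (the function name is our assumption):

function [1:0] hamming2;
    input [1:0] a, b;        // received and expected 2-bit symbols
    reg   [1:0] d;
    begin
        d = a ^ b;               // a 1 in every differing bit position
        hamming2 = d[1] + d[0];  // count the ones: 0, 1 or 2
    end
endfunction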


FIGURE 9: Trellis diagram for the received stream 11 11 01 00 10 11, where the second
symbol contains the error (01 was transmitted, 11 received). Each edge box shows the
branch Hamming distance for that step; solid edges mean the decoded output is 0,
dashed edges mean it is 1.
At t=2, if one starts from S3, then h=0 for the path to state S2. However if one starts from S2,
one has h=1 for either the path to S0 or to S1.
Thus at t=1 the proper path was not obvious; at t=2, the choice is clearer. We will choose a path
through the trellis based on the path Hamming distance or path metric, which is the sum of the
Hamming distances as one steps along a path through the trellis.
Figure 10 shows how the Hamming distances sum as one goes down various paths through
the trellis diagram. At t=5, one path has a total distance of 1 from the input data. The others have
a distance of 3 or 4. Thus the most likely path is S0 ⇒ S1 ⇒ S3 ⇒ S2 ⇒ S1 ⇒ S2, with a path
distance of 1, and the corresponding output data is 11010 (recall solid trellis edges represent a
receiver output of 0, and dashed edges represent an output of 1).

FIGURE 10: The paths through the trellis with their accumulating distances. The sum of the
Hamming distances is shown like 1 + 1 + 0 until this gets too messy; then the running sum is
shown in a hexagon and the distance for the current step in a box.

3.1.2 Metrics
A metric is a measure of something. The more general name for what we called the Hamming
distance is branch metric, and for the path Hamming distance, path metric. One does not
have to use the Hamming distance as a measure. In decoders where the input is an analog signal,
the distance between the actual and expected voltage may be measured, and the sum of the
squares of the errors might be used for the branch metric.


References:
Bernard Sklar, Digital Communications: Fundamentals and Applications.
B.P. Lathi, Modern Digital and Analog Communication Systems, Holt, Rinehart & Winston, 1989.
Prof. Ian Marsland, http://www.sce.carleton.ca/courses/94554/
Click on "Convolutional Codes." You will need Ghostview (gv) to read the PostScript file.

4.0 The Viterbi Decoder


The decoding example shown above has to follow four paths through the trellis, and remember
them for future decisions. For larger decoders, such as the cell phone ones with constraint lengths
(shift-register lengths + 1) of 6 and 9, the number of paths can get quite large (32 and 256).
Viterbi developed an algorithm in 1967 which allows the many paths to be discarded without
tracking them to completion. He noticed that if two paths merge to one state, only the one with
the smaller path Hamming metric, "H," need be remembered. The other one can never be part of
the most likely path.
This means that with the constraint length 3 (shift-register length 2) system in the previous
examples one only has to remember 4 paths. In general a constraint length K system will have
to remember 2^(K-1) paths. In theory, the path length should go for the length of the message in
order to get the true maximum likelihood path. However it turns out that path lengths of 4 to 5 times
the constraint length almost always give the best path.
The next few figures show how the decoder picks the best path, even when there are errors.
FIGURE 11: The first two trellis steps. The received input is 11, then 11 again
(the second symbol was transmitted as 01; it contains the error).
At t=0: starting at state S0, there are two possible paths. The boxes show the one-step
(branch) Hamming metric; the hexagons show "H," the path Hamming metric.
At t=1: there are 4 paths out to t=2. Since the input has an error, no path has a zero
path Hamming metric "H".
Legend: hexagon = path Hamming metric "H"; box = branch Hamming metric "h" for this step;
the two bits on an edge = received input needed to make this transition; solid edge =
output a 0 if this transition is used, dashed edge = output a 1.


Going out to t=3, with received input 11 11 01 so far, the paths temporarily double to 8.
There are two paths to each node; one has a larger path distance. The larger-"H" path can never
be the most likely path, hence we will erase it.

FIGURE 12: The trellis from t=0 to t=3 with all 8 candidate paths and their branch and
path metrics.

FIGURE 13: Two views of the trellis up to t=3. On the left, just the path Hamming
distances are shown so it is easy to see which paths should be erased. On the right,
the unneeded paths are eliminated; this is all the important information up to t=3.

FIGURE 14: Entering t=4. The received input is now 11 11 01 01, the fourth symbol
containing the second error (00 was transmitted, 01 received). Eight paths are created
temporarily. Again at each node, only the lowest path Hamming-distance path can be part
of the most likely path, so four of the eight paths will again be eliminated. The second
error has made four paths, all with equal chances (H=2) of being the most-likely path.
But see what happens next.


FIGURE 15: Entering t=5 (received symbol 10). Eight new paths are created; keep the four
lowest-H ones entering each state. Since two paths have the same H as they enter S1, we
have no way of choosing a better path. Pick one randomly; here we choose the one through
S0 ⇒ S1. There are now two most likely paths (with H=2), down from four at t=4. Note one
path dies out.

FIGURE 16: Entering t=6 (received symbol 11). Note how the correlation has continued to
act: we are now down to one most likely path.

FIGURE 17: The trellis entering t=7, for the received input 11 01→11 01 00→01 10 11 11.

Entering t=7: The one best path is getting fairly clear.
The most likely path ends at S1, with a close second at S0. The paths to S2 and S3 have H=4.


Retrieving the Original Message

FIGURE 18: The surviving paths entering t=8 (received input 11 01→11 01 00→01 10 11 11 10).
The final path metrics are H=4 at S0, 4 at S1, 2 at S2 and 4 at S3, and the survivors share
a common path back to t=1.

Entering t=8: One can be fairly sure the best path at t=8 ends at state S2.
Follow the path back from S2 at t=8:
S0 <- S1 <- S3 <- S2 <- S1 <- S2 <- S0 <- S1 <- S2  (t=0 on the left, t=8 on the right).
Also follow the paths back from S0, S1 and S3.
No matter what state you start in at t=8, all paths come together when you get back to t=1.
From the solid (for "0") and dashed (for "1") lines along the path one can decode
the probable original message as 11010010 (travelling t=0 to t=8).

As illustrated above, the Viterbi decoder can decode a convolution code in the presence of
some errors.
If two branches entering a state have equal "H," then the decoder is unable to tell if one path
is more likely than another. Pick one path at random.

4.1 Exercise 2: Add-Compare-Select Design

The circuit to add H+h, compare H+h on the two paths, and select the smaller path metric, is
called the add-compare-select circuit.

FIGURE 19: A typical step in the trellis decoder, from t=n to t=n+1. Each state Sk
carries its path Hamming metric Hk, and each edge carries one of the branch metrics
h00, h01, h10, h11.

Problem Prolog: Using the algorithm
A typical step in the trellis decoder is shown.
The path Hamming metrics H at each trellis step are H0, H1, H2, and H3.
The Hamming metric for each edge is given a subscript matching the input which makes it 0.
Thus the edge from S0 to S0, and the edge from S2 to S1, both use the symbol h00. If the
input is 00, h00=0; if the input is 10 or 01, h00=1; if the input is 11, h00=2.
Pseudocode
This is Verilog in which the syntax is not critical. For example begin, end and semicolons
may be omitted if the meaning is clear to the reader. In pseudocode the comments are often
more important than the code.


1. Problem
a) Starting at t=8 with an input of 00, as in Figure 20, calculate and fill in the values of hij
and hence the Hk for t=9.
b) Write pseudocode to calculate the branch Hamming metrics for each step.
Let the two input bits In[1:0] be y, x.
Let the Hamming metrics associated with the eight trellis edges for this step be h00, h01, etc.
Calculate these metrics using a case statement:
case ({y, x})
2'b00 : begin h00=0; ... h11=2; end
2'b01 : ...

FIGURE 20: One trellis step from t=8 to t=9 with input 00. The starting metrics are
H0=4, H1=4, H2=2, H3=4; the hij boxes and the Hk at t=9 are left blank to be filled in.
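
One straightforward shape for that case statement, shown only as a hedged sketch (each hij
holds the Hamming distance between the received pair {y,x} and the subscript):

reg [1:0] h00, h01, h10, h11;
always @(y or x)
    case ({y, x})
        2'b00: begin h00=0; h01=1; h10=1; h11=2; end
        2'b01: begin h00=1; h01=0; h10=2; h11=1; end
        2'b10: begin h00=1; h01=2; h10=0; h11=1; end
        2'b11: begin h00=2; h01=1; h10=1; h11=0; end
    endcase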
2. Problem:
a) Starting at t=9, using the Hk from Prob 1, step a), and input 10, calculate and fill in
Figure 21.
b) Use Boolean algebra to calculate the hij as 2-bit binary numbers, i.e. 2=10, 1=01 and 0=00.
Use reg /*wire*/ [1:0] h00, h01, h10, h11;
Example:
h00[1] = y&x; h00[0] = y^x;

FIGURE 21: One trellis step from t=9 to t=10 with input 10. The figure gives H0=5 at
t=10; the remaining hij and Hk entries are left blank to be filled in.
3. Problem
a) Start at t=10, with input 11, and use the Hk from Prob. 2, step a). Let the new Hk at
t=11 be written with a prime, i.e. H0', H1', H2' and H3'. Fill in Figure 22, but put in an
expression, as well as a number, for each Hk'. This has already been done for H0'. Only
the better path is written there.

FIGURE 22: One trellis step from t=10 to t=11 with input 11. The entry for S0 has been
filled in as H0' = H2 + h11 = 2; the rest is left blank.

b) Write pseudocode to update the Hk in going from step t=n to step t=n+1.
Use if statements to calculate the Hk' to be associated with the four states at t=n+1.
Use H0nxt instead of H0' since Verilog cannot handle primes.
if (H0+h00 < H2+h11) begin H0nxt = H0+h00; end else ...
c) The flip-flop procedure.
Write a procedure to clock the flip-flops and replace the old Hs with the new ones.
Combinational logic in parts 2 and 3 calculated the D inputs for the flip-flops. For example:
always @(posedge clk)
H0 <= H0nxt ....
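
For reference, a hedged sketch of the full combinational update for this trellis (the
predecessor/branch pairings follow Figure 19; this is one possible answer shape, not the
required one, and the metric width is our assumption):

reg [3:0] H0nxt, H1nxt, H2nxt, H3nxt;
always @(H0 or H1 or H2 or H3 or h00 or h01 or h10 or h11) begin
    // S0 at t=n+1 is fed by S0 (expecting 00) and S2 (expecting 11).
    H0nxt = (H0 + h00 < H2 + h11) ? H0 + h00 : H2 + h11;
    // S1 is fed by S0 (expecting 11) and S2 (expecting 00).
    H1nxt = (H0 + h11 < H2 + h00) ? H0 + h11 : H2 + h00;
    // S2 is fed by S1 (expecting 10) and S3 (expecting 01).
    H2nxt = (H1 + h10 < H3 + h01) ? H1 + h10 : H3 + h01;
    // S3 is fed by S1 (expecting 01) and S3 (expecting 10).
    H3nxt = (H1 + h01 < H3 + h10) ? H1 + h01 : H3 + h10;
end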


Don’t put combinational logic in a flip-flop procedure, and don’t forget a reset.
4. Problem: When is the output correct?
Experience has shown that all backward paths converge into one if one traces them back 4
or 5 times the constraint length. Using the paths in Figure 18, you will find that if one traces
back far enough it does not matter which path one follows.
a) Take a copy of Figure 18. Start at t=8; start at each state in turn and colour backwards until
you reach t=0 or until you hit previous colouring. At what time (t=?) do the paths all con-
verge?
5. Problem: When is the output correct?
Look at Figure 14, in which the ending time is t=4.
a) Using data available at t=4 could you say, with confidence, what the original data bit was
between t=0 and t=1? Why not?
b) Take a copy of Figure 17. Start at t=7; start at each state in turn and colour backwards
until you reach t=0 or until you hit previous colouring. At what time (t=?) do the paths all
converge?
c) Follow the trellis backwards, and from the information in the trellis find the most probable
original data. Write out the message in the correct order with the earliest (t=0) bit on the left.

(The small figure beside this problem shows the loop: the Test Bench sends the original
data bit to the encoder, which encodes it as 2 bits for 1; the decoder output brings it
back to 1 bit.)
6. Problem: Finding the original data from the state.
a) The states can be placed in two sets, even and odd.
If one is in an even state (S0 or S2), what was the original data in the previous step?
If one is in an odd state (S1 or S3), what was the original data in the previous step?
Write pseudocode to send out the proper output bit based on the state during the trace back.
reg [1:0] state;
if (state is odd) output = ... // Make this more exact.
7. Problem: Use Figure 23 only.
a) If the decoder was in state S2 at t=3, what was the original data (before encoding) between
t=2 and t=3? (The obvious answer is right.)
If it was in state S0 at t=3, what was the original data between t=2 and t=3?
If it was in state S1 at t=3, what was the original data between t=2 and t=3?
If it was in state S3 at t=3, what was the original data between t=2 and t=3?

FIGURE 23: Trellis diagram with no signal knowledge superimposed: the bare grid of states
S0..S3 from t=2 to t=8 with every edge drawn (solid = original data is 0, dashed = it is 1),
and a "?" marking state S2 at t=2.


8. Problem: How to backtrack.


Figure 24 is the same as Figure 18 except the numbers are all removed. It still contains
enough information to trace back from t=8.

Retrieving the Original Message

FIGURE 24: The surviving paths of Figure 18 with the metric numbers removed, except the
final path Hamming metrics at t=8: 4 at S0, 4 at S1, 2 at S2 and 4 at S3.

Figure 23 shows a trellis decoder only. It gives no information about the data. Figure 24
shows paths, but when you trace back to the area between t=2 and t=3 you cannot tell from
the figure what the data was. However, those who did Prob. 6 or Prob. 7 can tell you.
Figure 25 is the same as Figure 23 except some little parallelograms have been drawn associated
with each state in each time step.
Fill a minimum of information in each parallelogram. This information would allow your
lab partner to back-trace knowing that H2 had the minimum path Hamming metric at
t=8. Thus by looking only at Figure 25 and starting at state S2 at time t=8, one should be
able to tell what the original bit was between t=2 and t=3. You may establish some conventions,
like "a 1 in the state S2 box means...". However they must be independent of the data.
a) Using Figure 25, fill in the boxes at t=3 if you have not done so already, so that one can
determine the original data bits between t=2 and t=3, and also between t=1 and t=2.
You should hand in the filled-in Figure 25 and your list of conventions.
FIGURE 25: Trellis diagram on which you will superimpose signal knowledge: the same bare
grid as Figure 23 (states S0..S3, t=2 to t=8), with a small parallelogram beside each state
at each time step.
b) How many bits per step must be stored to allow for backtracking and extraction of the orig-
inal data?


9. Problem: In communications, latency is the term for the time difference between the time
the input signal was received and the time the output signal is sent out. Throughput is the
number of input signals that can be processed per second. The point of this problem is to show
it does not matter how long it takes to decode the data as long as you can keep up with the input.
c) If a decoder had to wait until all paths converged before it had confidence it could send out
a correct output, what would the latency be in clock cycles?
There are two answers for c):
(i) What latency was needed for the data stream as shown in Figures 17 and 18?
(ii) What was the latency, mentioned earlier in these notes, that experience has shown gives
the most likely bit for almost all cases?
d) If the decoder delays the signal by 12 to 15 clock cycles (latency), would anyone care, assuming:
• The signal was a digitized phone conversation?
• The signal was a www page?
• The signal was a digital TV signal?
e) If the decoder could not take in the next input until it had spent 12 or 15 clock cycles
processing the previous data (throughput), would this matter?

5.0 Extracting The Original Data.


Consider Figure 26. The path with the lowest path Hamming metric H starts at state S2 at
t=8 with H=2. Backing up would take the path to S1 at t=7. The edge is a solid line, which seems to
say the original data was 0. Unfortunately we can't be sure of this. Because of the convolution
code, this path's H of 2 could increase in the next few cycles and another path might get the lowest
H.
FIGURE 26: The decoded trellis for the received stream 11 01→11 01 00→01 10 11 11 10,
with the best path at t=8 highlighted. One may not be sure which data bit is best at t=2;
however, if one traces the paths forward from t=2, only two paths survive to t=8, and one
has a much smaller H.

If one goes back to t=2 and travels ahead in time, only paths that start at S3 or S2
make it all the way to t=8. The others die out. Only the path from S3 has H=2 at t=8; thus we are
fairly sure the edge from S3 at t=2 to S2 at t=3 is on the most likely path, and the original data
between these two clock edges was 0 (a solid line is 0, a broken line is 1).


This illustrates why we waited six cycles here before sending out the output. At time t=8, we
can be somewhat confident that the “0” data at t=2 is the most likely. In general one would wait
twice that time to be very sure.

5.1 Trace Back In More Detail


Tracing back is a long process if the full trace is done every data cycle. The back trace can
be done only if the clock runs several times faster than the data rate. To trace back 15 cycles to
find each output bit means that the input data rate must be no more than clock/16: one input cycle,
followed by fifteen trace-back cycles. It turns out one can increase the data rate up as high as
clk/2, but that will come later.
Figure 27 shows the trellis after decoding a 11,10,00,01,01,00,00 input stream.
Figure 28 shows how storing one bit, which shows whether to take the upper or the lower
path during backtrace, will allow one to reconstruct the trellis.
Figure 29 shows all the surviving paths. If one traces any path back from t=7, one will reach
S1 at t=2. Since all the back traces converge, one has confidence that the value of the data originally
generated between t=1 and t=2 was one (dashed lines represent a one). This example converged
quickly; other examples may take longer.
Also note from Figure 28, or from Problem: 6 or Problem: 7, that odd states have only dashed
lines (ones) entering them, and even states have only solid lines (zeros) entering them. This means
that at the end of the traceback, the data was "0" if one is in an even state and "1" if one is in an
odd state.
FIGURE 27: Trellis path for a pre-encoding data stream of 1011011..., received as
11,10,00,01,01,00,00. The branch and path metrics are marked at each step; the surviving
paths are shown with heavy lines and the paths that die out with light lines. The results
for t=7 were left as an exercise.


FIGURE 28: The same trellis with a data bit stored at each trellis state (shown as an up
or down arrow) to show which path to take during a backwards trace.

FIGURE 29: The survivor paths with the excess straw removed: only the stored up/down
arrows at each state remain.

5.1.1 The Trellis Butterfly.

For all rate 1/2 trellises, one can find a small picture
which describes the trellis completely. The picture looks
something like a butterfly.
To make the numbers work, one must call the end of
the shift register where the data enters the least significant
bit. Since one is used to having the least significant bit on
the right, we will flip the shift register of Figure 2 around
without changing the circuit.

FIGURE 30: Trellis made of butterflies. With the state numbered J = {J[1], J[0]} and the
input x shifted in at the right, states J and J+2^(k-2) at time t both feed states 2J
(when x=0) and 2J+1 (when x=1) at time t+1.

Then going from J to 2J represents a left shift, and shifting in an x of 0. Going from J to
2J+1 represents a left shift including shifting x=1 into the flip-flops.


FIGURE 31: Back-tracing with butterflies. For a constraint length k=3, 2^(k-2) = 2. The
figure shows how to travel backwards through the trellis (t=4 to t=7) using the bits stored
during each time step to determine whether to take the upward or the downward path; here we
start at state J=0. Knowing that one is in state J allows the two candidate predecessor
states, J/2 and J/2 + 2^(k-2) (integer division), to be calculated on the fly.
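
A hedged sketch of one backtrace step for k=3 (the register and signal names are our
assumptions; stored_bit is the bit saved at the current state for this time step):

reg [1:0] J;                     // current state during the backtrace
wire decoded_bit = J[0];         // the LSB is the data bit that produced this state
always @(posedge traceClk)
    J <= stored_bit ? (J >> 1) + 2'd2   // came from the lower predecessor, J/2 + 2^(k-2)
                    : (J >> 1);         // came from the upper predecessor, J/2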

5.1.2 Timing for the simple decoder


The simplest decoder will write 1 bit, and then back-trace 15 bits to be sure it has found the
correct path. Then it will back-trace 1 more bit, which it will use as output. One will need
control signals as shown.

[Waveform: Data; Clk cycles 1, 5, 10, 15, 20; WriteMem; ReadMem; OutputData.]

This will be slow, because the throughput will be 1/17 of the input symbol rate (a symbol
here is two bits).
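
As a hedged sketch of that 17-cycle control sequence (one write, fifteen trace-back reads,
one output; the counter is our assumption, the signal names come from the waveform):

reg [4:0] cnt;   // counts 0..16, one symbol per 17 clocks
always @(posedge clk or posedge reset)
    if (reset) cnt <= 5'd0;
    else       cnt <= (cnt == 5'd16) ? 5'd0 : cnt + 5'd1;

wire WriteMem   = (cnt == 5'd0);                    // store the new trellis column
wire ReadMem    = (cnt >= 5'd1) && (cnt <= 5'd15);  // 15 trace-back reads
wire OutputData = (cnt == 5'd16);                   // emit one decoded bit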


5.2 Summary of the Design


FIGURE 32: Summary block diagram. The Test Bench (signal source, comparison of dataOut with
the original signal, i.e. loopback) drives dataIn, reset and clk. The Encoder top module
produces the serial stream ExtEncodeOut (serial_in); this passes through an Error generator
to become serial_in_err, the ExtDecodeIn input of the Decoder top module. Inside the decoder,
a serial-to-parallel block produces convSig[1:0], which feeds:
Add-Compare-Select Module: input clk, reset, convSig[1:0]; output came_from[3:0]
(and maybe H0, H1, H2, H3);
Survivor Memory Module: input clk, reset, came_from[3:0]; output dataOut.

Figure 32 shows one way to do the Verilog design. The top modules only collect signals and
pass them on to the lower-level modules. When an ASIC is built, the arguments for the top module
are the pins of the ASIC. If this circuitry was all to be built on one chip, one would put on a
top-level wrapper module, where the shaded box is, to define the pins.
The error generator is used for testing. To test the circuit, connect the encoder to the decoder
through the error generator, and see if the dataOut equals the dataIn. Under normal (not test)
operation, the encoder would need an output lead to the outside world and the decoder would need
an input lead from outside.
To keep the data rate high, the complete two-bit symbol comes in serially in one clock cycle.
This is immediately converted to two parallel bits, both lasting one clock cycle.

5.3 Second Lab: Start of Convolution Decoder in Verilog


Now we will consider the overall project. You should be able to design each block in the
block diagram except the Survivor Memory block, which will not be done until Exercise/Lab 3. For
the initial circuit, you do not need to do a trace back. Just send out, as correct, the data from
the state with the lowest H. Of course this will not have any error correction.
You should consider these concepts:
a) Normally the encoder and decoder are widely separated so they cannot run from a common
clock. The decoder will have a clock recovery module. This is beyond 97.478 and we will
use a common clock for both.
b) Your design will be a rate=1/2, constraint length=3, [111],[101] decoder.
c) To save work you will want to parameterize your design. One can do this automatically in
Verilog for some parameters. For others it is too much trouble.
• First, design a distinctive comment style like
/*|Para|* comment on parameters *|Para|*/
which indicates code where there are parameters nearby which will need attention. In your
comments cross-reference all the modules that are affected by the parameters.


• Second, there are two ways to pass parameters in Verilog. Which one works for synthesis?
d) Considerable emphasis will be given to testing. One simple test is a loopback test, where the
output is sent back to the input and the two compared. Other things you can do to your circuit
to aid testing will be considered later.
e) The Error Generator block is necessary if you want to simulate the error-correction properties.
First it will be done as a test bench, so it is only useful during simulation. Later you
might consider making it part of the loopback test so it can be used for testing in the field.
What to do for the lab
1. Draw a block diagram somewhat like that of Figure 32.¹ However, make it bigger and show
the arguments passed to all modules. If a module will be over a page of code, try to divide it.
2. Write Verilog for the Decoder Top Module and Encoder Top Module (already done?)
3. Design a serial-to-parallel module. The serial_in signal changes at twice the clock rate. Let
the serial_in bits be labelled s, e, r, i, a, l ...
Decide how these bits will come out of a transparent-high latch. See the latch_sig below.
You will probably use a transparent latch and two flip-flops in the module.

FIGURE 33: Waveforms for serial to parallel: clk; serial_in (or serial_in_err) carrying
the bits s, e, r, i, a, l; latch_sig holding s; ConvSig[1] carrying s then r; ConvSig[0]
carrying e then i.

Write the Verilog code for the serial-to-parallel module.
How do you write a latch in Verilog? Hints: see Figure 37.
4. Write the Add-Compare-Select module. (See Exercise 2.)
5. Modify the Test Bench to handle the decoder and the encoder. This should include a loopback
test which compares the dataIn (x on the right) with dataOut.
You will have a delay (latency) between dataIn and dataOut. At the start, dataOut will be the
bit corresponding to the present lowest H (path Hamming metric), so the latency will be only
that of the serial-to-parallel converter. Later you will want to add a latency of 4 to 5 times
the constraint length.

FIGURE 34:
// send in a new x every clock cycle
always @(posedge clk)
begin
if (I==9) $finish;
x <= #1 data[I];
if (I>latncy) y <= #1 data[I-latncy];
I <= I+1;
end
assign err = (y != dataOut);

1. We like originality in block diagrams as long as you can give a reason for changes.


6. Add to the Add_Compare_Select module logic to generate a four-bit signal called came_from.
These signals indicate whether the trellis lines leading back from the next states to the
present-time-step states came from a higher state or a lower state.
Thus it would show whether the next state S0 came from the present S0 (up) or S2 (down).
Figure 35 shows these bits and their meaning. These four came_from signals will be sent to
the survivor-path memory module, which will be written later.

FIGURE 35: One trellis step (input 11) with the chosen predecessor marked for each next
state: came_from[0]=down, came_from[1]=up, came_from[2]=up, came_from[3]=down.
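
A hedged sketch of how came_from might be generated alongside the metric update (encoding
up/down as 0/1 is our choice; Figure 35 fixes only the meaning, not the coding):

reg [3:0] came_from;
always @(H0 or H1 or H2 or H3 or h00 or h01 or h10 or h11) begin
    // 1 = the survivor came from the lower of the two predecessor states
    came_from[0] = (H2 + h11) < (H0 + h00);  // next S0: up from S0, down from S2
    came_from[1] = (H2 + h00) < (H0 + h11);  // next S1: up from S0, down from S2
    came_from[2] = (H3 + h01) < (H1 + h10);  // next S2: up from S1, down from S3
    came_from[3] = (H3 + h10) < (H1 + h01);  // next S3: up from S1, down from S3
end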
7. Consider the Error Generator. For the moment you may treat it like a test bench, so you can
use nonsynthesizable constructs like $random. This gives a new random integer every time it is
called. Try to make it so there is an error every 16 time steps (every 32 serial_in bits) on
average. A random 5-bit number will be 01110 (or any single value) one time out of 32 on average.

FIGURE 36:
reg [4:0] randy;
always @(clk) begin // Run at twice the clock rate
    randy <= $random; // randy gets the 5 lsbs of $random
    if (randy == 5'b01110)
        serial_in_err <= ~serial_in; // invert the bit: insert an error
    else
        serial_in_err <= serial_in;
end

8. Write a pseudorandom generator, as was used in 97.350, to replace $random in question 7.
There is a lot about pseudorandom generators in the notes. If you use such a generator, you
must make its period much longer than 32 or your errors will be periodic. Random errors
that come every 31 bits are not a good test. There was a question about this circuit on the
Winter 2001 final, which is available on the web.
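
A hedged sketch of one such generator, a maximal-length LFSR (the tap choice x^15 + x^14 + 1
is ours; its period of 2^15 - 1 = 32767 is much longer than 32):

reg [14:0] lfsr;
always @(posedge clk or posedge reset)
    if (reset) lfsr <= 15'h7FFF;                          // any nonzero seed
    else       lfsr <= {lfsr[13:0], lfsr[14] ^ lfsr[13]}; // shift in the feedback bit

The low 5 bits, lfsr[4:0], could stand in for randy in question 7.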
9. Take any module except the test bench and run it through the Synopsys Design Analyser.
Try to get a copy of the synthesized circuit for your report.

5.4 Some Hints


FIGURE 37. How to code a latch
Latches have something happen on both edges; do not use posedge clock.
Transparent latches must follow the data; you need more than @(clock).
Put the reset for the latch in the procedure for the latch, not with some other latch or flip-flop.
Unfortunately Synopsys will not synthesize a proper reset on a latch without inserting a
//Synopsys directive comment in the latch code:
//Synopsys async_set_reset "rst"
always @(clk or rst or x)
...
Latches cannot be put in the same procedure as flip-flops.
Do not put logic in the same procedure as the latch.
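
A hedged sketch of a transparent-high latch in this style (the signal names are ours):

//Synopsys async_set_reset "rst"
always @(clk or rst or d)
    if (rst)      q <= 1'b0;   // asynchronous reset
    else if (clk) q <= d;      // transparent while clk is high; holds when clk is low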
