
Topic 2

n-Sequences, Independent
Events and
Error-Correcting Codes

Digital communication is fundamental to many things that we take for granted:
CDs, mobile phones and communicating with the Mars rovers, for instance.
Error-correcting codes enable errors, such as those caused by fingerprints on
CDs, to be corrected unobtrusively. We begin the introduction to error-correcting
codes in this topic.

2.1 n-Sequences
A sequence is an ordered set and we will use n-sequence to refer to a sequence
with n elements in it. Formally we write an n-sequence as (x1 , x2 , x3 , . . . , xn ) but
often we will omit the parentheses and the commas and just write x1 x2 x3 . . . xn .
EXAMPLE 2.1.1.
The sequence (1,4,5,3,1) has n = 5 entries; the first entry is 1, the second entry
is 4, the third entry is 5, the fourth entry is 3 and the fifth entry is 1.
In the previous example notice that repeated entries in a sequence are
recorded, rather than being omitted as they are in a set. Thus the position
of an element in a sequence is very important and the sequences (1,4,5,3,1) and
(1,5,4,3,1) are different.
We will be using sequences to represent the messages that we want to send,
perhaps from a CD to the speakers, perhaps from Mars to the receivers on
Earth. We will have an alphabet of two symbols, 0 and 1, which we will think
of as the elements of Z2 . We will use these symbols to write messages, just as
we use the 26 letters of the alphabet to write messages in English.
EXAMPLE 2.1.2.
Suppose that we can have messages of length 3. Then there are 8 messages
(sequences of length 3 or 3-sequences) that are possible over Z2 . These are
(remember that we do not write the parentheses or the commas):
000, 001, 010, 011, 100, 101, 110, 111.
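Enumerations like this are easy to reproduce by computer; the following Python sketch (the helper name `messages` is ours) lists all n-sequences over a given alphabet in the same order as the example.

```python
from itertools import product

def messages(alphabet, n):
    """All n-sequences over the given alphabet, written without parentheses or commas."""
    return ["".join(str(s) for s in seq) for seq in product(alphabet, repeat=n)]

print(messages([0, 1], 3))   # the 8 binary 3-sequences, in the order listed above
```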

2.1.1 Exercises
1. Give the four messages possible with binary 2-sequences and the 16 mes-
sages possible with binary 4-sequences.
2. Suppose that we use the elements of Z3 to be the alphabet. Give all
possible messages of lengths 2 and 3.

2.2 Independent Events and Repetition Codes


Suppose that the 8 messages given in Example 2.1.2 represent 8 colours and
that we send one of the 8 messages. We see that if even one digit is changed
then a different colour is received from the one that was sent. And perhaps even
more importantly, we have no way of knowing that the message that was sent
was not the message that was received. This is because we have no redundancy
in the message.
Suppose that we send one message. Suppose that each digit is correctly
received with probability p. Suppose that errors in transmission occur indepen-
dently from digit to digit. Then the probability that the message is correctly
received is p^3. So if p = 0.9 then the probability that the message is correctly
received is 0.9^3 = 0.729; if p = 0.99 then the probability that the message is
correctly received is 0.99^3 = 0.970299.
Suppose that we want to transmit an image that has 20 × 20 pixels in it.
Each pixel represents one message with 3 digits. Thus the total message length
is 20 × 20 × 3 = 1200 and the probability that the whole message is correctly
received is p^1200. For p = 0.99 this is 0.000005784; in other words it is very
unlikely to happen. (Most pictures are likely to be at least 200 × 400
pixels and have 2^12 = 4096 colours (the Voyager probe, for instance) so the
problem is actually much worse than we have indicated here.)
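These figures follow directly from independence: the chance that all n digits arrive correctly is p^n. A minimal Python sketch (the function name is ours):

```python
def p_all_correct(p, n):
    """Probability that an n-digit message arrives with no errors,
    assuming each digit is correct with probability p, independently."""
    return p ** n

print(round(p_all_correct(0.9, 3), 6))    # 0.729
print(round(p_all_correct(0.99, 3), 6))   # 0.970299
print(p_all_correct(0.99, 1200))          # about 5.78e-06: the 20 x 20 picture
```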
In English it is rarely true that omitting one letter, or even several, makes
a sentence unintelligible. For example, “Hw r y tdy?” is rapidly interpreted as
“How are you today?”. Thus we can see that there is redundancy in English and
this makes it possible for us to correctly guess at missing parts of the message.
An easy way to introduce redundancy is to send the message several times
and then decode by noting which symbol appears most often in each place. This
is called majority decoding.
EXAMPLE 2.2.1.
Suppose that we transmit each symbol three times and use majority decoding.
If the symbol we want to transmit is “0” then we actually send “000”. If we
receive 000, 001, 010 or 100 then we correctly decode to “0”. The probability
that we decode to “0” is p^3 + p^2 q + pqp + qp^2 = p^3 + 3p^2 q (where q = 1 − p). If
p = 0.9 this is 0.972 and if p = 0.99 this is 0.999702.
Returning to the pixels with colours represented by messages of length 3, we
see that if we send the message three times then the probability that any digit of
each pixel is correctly received is 0.972 and the probability that the pixel itself
is correctly received is 0.972^3 = 0.91833, up from 0.729 without retransmission.

© Debbie Street, 2011
If the probability that each digit is correctly received is p = 0.99 then by
sending each digit three times we increase the probability that each digit is
correctly received to 0.999702. Thus the probability that each pixel is correctly
received is 0.999702^3 = 0.999106, up from 0.970299. Thinking about the whole
picture of 400 pixels, the probability that this is correctly received is
0.999106^400 = 0.699241, up from 0.000005784. So redundancy has greatly
improved our chances of correctly receiving the picture.
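The calculation for the triple-repetition scheme can be sketched in the same way (again, the function name is ours):

```python
def p_digit_majority(p):
    """Per-digit success when each digit is sent three times and
    decoded by majority vote: p^3 + 3 p^2 (1 - p)."""
    q = 1 - p
    return p**3 + 3 * p**2 * q

d = p_digit_majority(0.99)    # per digit: ~0.999702
pixel = d**3                  # a pixel is a 3-digit message: ~0.999106
picture = pixel**400          # 20 x 20 = 400 pixels: ~0.699
print(round(d, 6), round(pixel, 6), round(picture, 3))
```

The last figure agrees with the text's 0.699241 to three decimal places; the small difference arises because the text rounds the pixel probability to six decimal places before raising it to the 400th power.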
A code in which each message is transmitted more than once is called a repe-
tition code. We use codeword for the string of digits that is actually transmitted.
We will use m to denote the length of the message and n to denote the length
of the codeword. We will speak of an (n, m) code.

2.2.1 Exercises
1. Consider the 4 messages possible using binary 2-sequences. Suppose that
we construct codewords of length 6 by repeating the 2-sequences three
times. Let p be the probability that any digit is correctly received.
(a) Give the 4 possible codewords.
(b) Give all the received strings of length 6 that will be decoded to the
message 00.
(c) What is the probability that the message 00 will be correctly re-
ceived?
(d) Is this probability the same for each of the other 3 messages?

2. Again consider the 4 messages possible using binary 2-sequences. Suppose
that instead of repeating the message a1 a2 we adjoin one further
symbol, c3 = a1 + a2 mod 2. Thus we have created a (3,2) code.
(a) Give the 4 possible codewords.
(b) Check that in all the codewords Σ_i c_i = 0 mod 2.
(c) When we receive the string c1 c2 c3 we calculate Σ_i c_i mod 2. If the
result is 1 then we know an error has occurred. Why?
(d) Consider the message 00. Give the received strings in which 0 errors
have been made, 1 error has been made, 2 errors have been made, 3
errors have been made.
(e) Which of these errors will we be able to detect?
(f) For those errors that can not be detected, to what message(s) will
they be decoded?
(g) What is the probability of an undetected error being made in trans-
mission?
(h) How does the probability of undetected errors compare with that of
the (6,2) code discussed in Question 1?
3. Consider the 8 messages possible using binary 3-sequences. Suppose that
we construct codewords of length 6 by adjoining 3 parity check digits.
So the message (m1 , m2 , m3 ) becomes the codeword (m1 , m2 , m3 , m1 +
m2 , m1 + m3 , m2 + m3 ), where addition is carried out mod 2.

(a) Write down the 8 codewords corresponding to the 8 messages.
(b) For each codeword, write down all the 6-sequences which differ by
one digit from that codeword. How many distinct 6-sequences have
you listed?
(c) How many binary 6-sequences are there altogether? List the binary
6-sequences which are not a codeword and which do not differ from
a codeword in just one digit. For each of these 6-sequences and for
each codeword, list the number of digits that are different.
(d) Hence show that if the probability that each digit is correctly received
is 0.9, then the probability that the message is correctly received is
0.8857.

2.3 Hamming Codes


In this section we will look at one member of the family of Hamming codes.
Hamming codes are important because the codes can be encoded and decoded
very quickly which means that conversations on mobile phones, for example, are
not subject to long delay times.
We suppose that the message consists of 4 binary digits. We assume that
each message gets transformed to a codeword of 7 binary digits. We do this
encoding by post-multiplying by a generator matrix G. G is chosen so that a
single digit change, in any position of the codeword, can be corrected.
EXAMPLE 2.3.1.
Let

G = [ 1 0 0 0 1 1 0 ]
    [ 0 1 0 0 1 0 1 ]
    [ 0 0 1 0 0 1 1 ]
    [ 0 0 0 1 1 1 1 ].
Then the message m becomes the codeword c = mG (mod 2). In particular the
message 1011 is transmitted as the codeword 1011G = 1011010.
Note that the first 4 columns of G form the identity matrix of order 4, I4 , so
the first 4 entries in c are just the elements of m. If we think of G as G = [I4 P ]
then c = [m mP ].
EXAMPLE 2.3.2.
Let

H = [ 1 1 0 1 1 0 0 ]
    [ 1 0 1 1 0 1 0 ]
    [ 0 1 1 1 0 0 1 ].

We can see that H = [P' I3]. Consider what we get when we post-multiply
the codeword c by H', the transpose of H. Since H' consists of P stacked on
top of I3, we have

cH' = [m mP] [ P  ] = mP + mP = 0 mod 2.
             [ I3 ]
This is why decoding is so fast in the Hamming code: we just need to
multiply the received word by the matrix H'. If the answer is 0 mod 2 then the
message is the first 4 entries in c. The matrix H is called the parity check
matrix.

What happens if a single error is made in transmission? Suppose that we
receive c + ei , where ei has a “1” in position i and is 0 in all other positions.
Then (c + ei)H' = cH' + ei H' = ei H' (mod 2) and this is just row i of H'. So
if there is only one error we will know which position it is in, since the 7 rows of
H' are all distinct.
EXAMPLE 2.3.3.
Suppose that we receive v = 1100001. Then vH' = 010, which is the 6th row
of H', and so the error was in the 6th digit: the transmitted codeword must have
been 1100011, corresponding to the message 1100.
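The whole encode-and-correct cycle fits in a few lines; here is a pure-Python sketch using the G of Example 2.3.1 and the H of Example 2.3.2 (the function names are ours, and `decode` assumes at most one error was made).

```python
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]

def encode(m):
    """Codeword c = mG (mod 2)."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

def syndrome(v):
    """vH' (mod 2), one bit per row of H."""
    return [sum(v[j] * H[k][j] for j in range(7)) % 2 for k in range(3)]

def decode(v):
    """Correct at most one error, then read off the first 4 digits."""
    s = syndrome(v)
    if any(s):
        rows = [[H[k][j] for k in range(3)] for j in range(7)]  # rows of H'
        v = v.copy()
        v[rows.index(s)] ^= 1   # the syndrome equals row i of H': flip digit i
    return v[:4]

print(encode([1, 0, 1, 1]))            # the codeword 1011010 of Example 2.3.1
print(decode([1, 1, 0, 0, 0, 0, 1]))   # Example 2.3.3: decodes to the message 1100
```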

2.3.1 Exercises
1. Suppose that you received 0100111 and 0101010, both sent using the [7,4]
Hamming code. What were the corresponding messages?
2. Let

G = [ 1 0 0 1 1 0 ]
    [ 0 1 0 0 1 1 ]
    [ 0 0 1 1 1 1 ].
Use G as the generating matrix for a (6,3) code.
(a) Give the codewords for each of the 8 messages.
(b) Find the corresponding parity check matrix.
(c) Show that this code can correct all errors which change only one
digit.
(d) Show that it can detect all errors which change 2 digits.
(e) Show that it can not detect errors that are the difference between
two codewords; that is, errors of the form ci − cj .
3. Find the generator matrix and the parity check matrix for the (6,2) triple
repetition binary code.
4. Find the generator matrix for the code of Question 3 of Exercises 2.2.1.

2.4 Hadamard Codes


A Hadamard matrix, Hh, is a (−1, 1) matrix of order h such that Hh Hh' = h Ih.
EXAMPLE 2.4.1.
   
H2 = [ 1  1 ]
     [ 1 -1 ]

H4 = [ 1  1  1  1 ]        H4 = [  1  1  1  1 ]
     [ 1  1 -1 -1 ]             [ -1 -1  1  1 ]
     [ 1 -1  1 -1 ]             [ -1  1 -1  1 ]
     [ 1 -1 -1  1 ]             [  1 -1 -1  1 ]
Hadamard matrices can only exist for orders 1 and 2 and for orders that are a
multiple of 4. Whether or not Hadamard matrices exist for all orders that are
a multiple of 4 is not known.
One easy way to generate a Hadamard matrix is to take the Kronecker
product of two Hadamard matrices. Recall that if we have two matrices A =

(aij ) and B = (bij ) then the Kronecker product of A and B, written A ⊗ B, has
entries aij B.
EXAMPLE 2.4.2.
 
H2 ⊗ H2 = [ 1  1  1  1 ]
          [ 1 -1  1 -1 ]
          [ 1  1 -1 -1 ]
          [ 1 -1 -1  1 ].
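The Kronecker-product construction is easy to check by computer; a Python sketch (the helper name `kron` is ours):

```python
def kron(A, B):
    """Kronecker product of A and B: block (i, j) of the result is a_ij * B."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

H2 = [[1, 1], [1, -1]]
H4 = kron(H2, H2)
for row in H4:
    print(row)           # the matrix of Example 2.4.2

# Verify the Hadamard property: H4 H4' = 4 I4.
print(all(sum(u[i] * v[i] for i in range(4)) == (4 if u is v else 0)
          for u in H4 for v in H4))   # True
```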

A Hadamard code uses the rows of Hh and −Hh as the codewords (and so
uses the alphabet −1 and 1).
EXAMPLE 2.4.3.
Let h = 4. Then, if we use the first H4 given in Example 2.4.1, the 8 codewords
in the Hadamard code are (each line gives a row of H4 and its negative)

 1  1  1  1    -1 -1 -1 -1
 1  1 -1 -1    -1 -1  1  1
 1 -1  1 -1    -1  1 -1  1
 1 -1 -1  1    -1  1  1 -1

Let v = (1, 1, 1, 1). Then vH4' = (4, 0, 0, 0). Now consider v = (1, 1, 1, −1). Then
vH4' = (2, 2, 2, −2).
It is easy to check that for every codeword v in the Hadamard code the entries
of vHh' are h, −h or 0. But more can be said; see the exercises.
The Mariner spacecraft used the Hadamard code derived from H32. The
colours were represented by 6 bits, which then became codewords of length 32.
Thus up to 7 errors could be corrected and up to 15 errors could be detected.
(Why?)
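The counts in Example 2.4.3 reflect how decoding works: compute vH4' and look at the size of the entries. A Python sketch for the first H4 of Example 2.4.1 (the helper name is ours):

```python
H4 = [[1, 1, 1, 1],
      [1, 1, -1, -1],
      [1, -1, 1, -1],
      [1, -1, -1, 1]]

def vH(v):
    """The vector vH4': inner products of v with the rows of H4."""
    return [sum(a * b for a, b in zip(v, row)) for row in H4]

print(vH([1, 1, 1, 1]))    # [4, 0, 0, 0]: the entry +4 picks out row 1 of H4
print(vH([1, 1, 1, -1]))   # [2, 2, 2, -2]: no entry of size 4, so errors are detected
```

For the Mariner code the same idea applies with H32: if at most 7 of the 32 entries are flipped, the inner product with the row actually sent still has magnitude at least 32 − 2 × 7 = 18, while the inner product with any row orthogonal to it has magnitude at most 14, so the entry of largest magnitude still identifies the codeword sent.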

2.4.1 Exercises
1. Consider the Hadamard code in Example 2.4.3.
(a) Evaluate vH40 for all 16 possible v.
(b) List the v that are in the code. What are the corresponding values
of vH40 ?
(c) List the v that differ in one position from a codeword. What are the
corresponding values of vH40 ?
(d) List the v that differ in two positions from a codeword. What are
the corresponding values of vH40 ?
(e) List the v that differ in 3 positions from a codeword. What are the
corresponding values of vH40 ?
(f) Comment.
2. Construct a Hadamard code of length 8. Now repeat the previous exercise.
(Don’t forget that Mathematica can be used to do this sort of thing easily.)

2.5 References and Comments
Trappe and Washington (2006) provide a brief introduction to error-correcting
codes. A longer and very readable account may be found in Hankerson, Hoffman,
Leonard, Lindner, Phelps, Rodger, and Wall (2000).

Topic 3

Introduction to Linear
Codes

We saw in the previous topic that Hamming codes are fast to decode since matrix
multiplication was all that was required, as opposed to searching through a list
of possible received strings to obtain the closest codeword and corresponding
message. Hamming codes are an example of linear codes and we will discuss
linear codes further here. We will also introduce the idea of cyclic linear codes.
In the following topic we will extend these ideas further to develop the codes
that are used to record music on CDs.

3.1 Linear Codes


A code C is said to be linear if the sum of any two codewords is also a codeword.
Formally, if v, w ∈ C then v + w ∈ C. Since we are considering codewords to be
binary n-sequences, we will calculate the sum of two codewords component-wise
mod 2.

EXAMPLE 3.1.1.
Let C = {0000, 0011, 1100, 1111}. C is a linear code since 0011+1100=1111,
1100+1111=0011 and 0011+1111=1100, and sums involving 0000, or of a
codeword with itself, are trivially in C.
EXAMPLE 3.1.2.
Let C = {0000, 0011, 1100, 0101, 1111}. C is not a linear code since 0011+0101=0110
and this is not in C.
Thus a linear code is said to be closed under addition since the sum of any
two codewords of a linear code is also a codeword. The set of all 2^n binary
n-sequences, which we will represent by Z_2^n, is closed under addition, of course.
Any binary n-sequence can be written as a sum of some of the n binary n-sequences
ei, where ei has a 0 in all positions except position i, where it has a 1. Since we
can not express any of the ei as a linear combination of the other ej we say that
the ei are linearly independent. Since every n-sequence in Z_2^n can be written as
a linear combination of the ei and the ei are linearly independent, we say that
e1, e2, . . . , en form a basis for Z_2^n.

EXAMPLE 3.1.3.
Let n = 3. Then Z_2^3 = {000, 001, 010, 011, 100, 101, 110, 111} and e1 = 100, e2 =
010 and e3 = 001. Then xyz ∈ Z_2^3 can be written as xyz = xe1 + ye2 + ze3.
For a linear code C, we have that C ⊂ Z_2^n and that C is closed under
addition. That means that we can find a basis for C. We can then write the
basis vectors as the rows of a matrix, which we will call G, and all the codewords
can be written as linear combinations of these rows. So G is called the generator
matrix of the linear code C. The Hamming code of the previous topic is one
example of a linear code; so is the Hadamard code, if we change the −1s to 0s.
Recall that we are interested in codes because we want to be able to send
messages that are likely to be received correctly. If two codewords are “too
close together”, whatever that means exactly, then it is likely that we will have
trouble telling which one was actually received.
We define the Hamming distance between two words u and v, written d(u, v),
to be the number of places where u and v differ.
EXAMPLE 3.1.4.
Let u = 10101010 and let v = 00001111. Then d(u, v) = 4.
Clearly d(u, v) is the minimum number of errors that are needed to change
u to v. The Hamming distance, d, is an example of a metric. This means that
it satisfies the following three statements.
1. d(u, v) ≥ 0.
2. d(u, v) = d(v, u).
3. d(u, v) ≤ d(u, w) + d(w, v).
The third statement is perhaps best thought of by saying that if u and v
differ in some place then either u and w differ there or w and v differ there, or
both u and w and w and v differ there.
We define the minimum distance of a code C by

d(C) = min{d(u, v) | u, v ∈ C, u ≠ v}.
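Both quantities are easy to compute directly; a brute-force Python sketch (the function names are ours):

```python
def d(u, v):
    """Hamming distance: the number of places where u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def d_min(code):
    """Minimum distance d(C), by checking every pair of distinct codewords."""
    return min(d(u, v) for u in code for v in code if u != v)

print(d("10101010", "00001111"))                 # 4, as in Example 3.1.4
print(d_min(["0000", "0011", "1100", "1111"]))   # 2, for the code of Example 3.1.1
```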

THEOREM 3.1.1.

1. A code can detect up to s errors if d(C) ≥ s + 1.


2. A code C can correct up to t errors if d(C) ≥ 2t + 1.
Proof. 1. Suppose d(C) ≥ s + 1. Suppose that c is sent and that s or fewer
errors are made. Then the received n-sequence is not a codeword, since at least
s + 1 changes are needed to turn c into any other codeword, and so the fact
that errors have occurred is detected.
2. Suppose that d(C) ≥ 2t + 1. Suppose that c is sent and that r is received.
If d(c, r) ≤ t then c is the codeword closest to r. This follows since, for any
other u ∈ C,

2t + 1 ≤ d(C) ≤ d(c, u) ≤ d(c, r) + d(r, u) ≤ t + d(r, u)

and so d(r, u) ≥ t + 1. Thus r correctly decodes to c.

We will now talk about an (n, m, d) code, where n is the length of the
codewords, m is the length of the message and d is the minimum distance of
the code.
We define the Hamming weight of u to be the number of non-zero entries in
u; that is, d(u, 0) = wt(u).
LEMMA 3.1.1.
Let C be a linear code. Then d(C) equals the minimum Hamming weight of any
of the non-zero codewords.
Proof. Since 0 ∈ C and wt(u) = d(u, 0), for any non-zero codeword u we have
wt(u) ≥ d(C). Conversely, d(u, v) = wt(u − v) and u − v ∈ C whenever u, v ∈ C.
So choose u, v ∈ C such that d(u, v) = d(C). Then u − v is a non-zero codeword
of weight wt(u − v) = d(u, v) = d(C), and the result follows.
We will assume that the first m binary digits are the information digits and
that the remaining n − m binary digits are the check symbols. Then we can
write G in the standard form with G = [Im P]. As in the case of the Hamming
code, we get a parity check matrix, H, by calculating H = [P' In−m]. We can
detect errors by calculating vH' for the received n-sequence v. If vH' = 0 then
it is unlikely that an error has been made.
EXAMPLE 3.1.5.
Consider the linear code in Example 3.1.1. The code is C = {0000, 0011, 1100, 1111}.
We see that 0011 and 1100 form a basis for this code. So we have
G = [ 0 0 1 1 ]
    [ 1 1 0 0 ]

which is not in standard form. To get G into that form we need to interchange
some of the columns in G. If we interchange columns 1 and 3 we get

Gs = [ 1 0 0 1 ]
     [ 0 1 1 0 ].

The weights of the non-zero codewords are 2, 2 and 4, so d(C) = 2.
EXAMPLE 3.1.6.
Consider the generator matrix

G = [ 1 0 1 0 1 1 ]
    [ 0 1 1 1 1 0 ]
    [ 0 0 0 1 1 1 ].

Clearly G is not in standard form. Since the first two columns are in the required
form, we start by swapping columns to move a 1 into position (3,3). We swap
columns 3 and 4 in G to get

G1 = [ 1 0 0 1 1 1 ]
     [ 0 1 1 1 1 0 ]
     [ 0 0 1 0 1 1 ].

The first two entries in the third row are 0 but the third entry in row 2 is a 1
and not a 0. We can replace row 2 by the sum of rows 2 and 3 to get the required
generator matrix

Gs = [ 1 0 0 1 1 1 ]
     [ 0 1 0 1 0 1 ]
     [ 0 0 1 0 1 1 ].

Suppose that we had a message m. Initially we would have sent

mG = m1 (101011) + m2 (011110) + m3 (000111)
   = (m1, m2, m1 + m2, m2 + m3, m1 + m2 + m3, m1 + m3)

and using Gs we would send

mGs = (m1, m2, m3, m1 + m2, m1 + m3, m1 + m2 + m3).

We can recover the message from either of these codewords of course.


We can use the parity check matrix to determine the distance of the code C
with generator matrix G, as the next result shows.
THEOREM 3.1.2.
Let H be the parity check matrix for the code C. Then d(C) = d if and only
if any set of d − 1 rows of H' is linearly independent and at least one set of d
rows of H' is linearly dependent.
Proof. We know that cH' = 0 for every codeword c; that is, the rows of H'
indexed by the non-zero positions of c sum to 0. So if d(C) = d then a codeword
of weight d gives a set of d linearly dependent rows of H', and if some set of
d − 1 rows of H' were linearly dependent there would be a non-zero codeword
of weight at most d − 1, contradicting d(C) = d.

3.1.1 Exercises
1. Find the generator matrix and parity check matrix of the (5,3) binary
code whose messages are the eight binary triples with encoding

xyz ↦ (x, y, x + y, z, y + z).

Now find a mapping so that G is in standard form.


2. Consider the generator matrix
 
G = [ 1 1 0 1 0 0 1 ]
    [ 0 0 1 0 1 1 1 ]
    [ 0 1 0 1 0 1 0 ]
    [ 1 1 1 1 1 1 1 ].

(a) Find a generator matrix in standard form.


(b) What is the corresponding parity check matrix?
(c) Encode the messages 1100, 1010 and 0110.
3. Give the parity check matrix, Hs , corresponding to Gs in Example 3.1.6.
(a) What is the distance of this code?
(b) Find sets of columns of Hs that sum to 0.
(c) Show that the minimum weight of a non-zero codeword equals the
minimum number of columns of Hs which sum to 0.

4. Consider the matrix

S = [ 0 1 1 0 ]
    [ 1 0 1 1 ]
    [ 1 1 0 1 ]
    [ 1 1 1 0 ].
Show that S can not be the generator matrix of a binary linear code.
5. Show that in a binary linear code either all of the codewords have even
weight, or half of the codewords have even weight and half have odd weight.
6. Let S = {100, 101, 111}. Give the linear code that arises from using the
elements of S as the rows of a generator matrix. (Another way of saying
this is to say that we are using the elements of S as a basis for the linear
code.)
7. Let S = {1010, 0101, 1111}. Give the linear code that arises from using
the elements of S as the rows of a generator matrix.
8. Let S = {011, 101, 110, 111}. Show that the elements in S are not linearly
independent. Give a subset of S which is linearly independent and give
the corresponding linear code. Can you give a second set of linearly inde-
pendent basis vectors for this code?

3.2 Cosets
Suppose that C is an (n, m, d) linear code and let u ∈ Z_2^n. Then the coset of C
generated by u is given by

C + u = {v + u|v ∈ C}.

Note that the coset of C generated by any u ∈ C is C.


EXAMPLE 3.2.1.
Let C = {000, 111}. Then the three other cosets are C + 001 = C + 110 =
{001, 110}, C +010 = C +101 = {010, 101} and C +100 = C +011 = {100, 011}.

This example illustrates a general result: If w ∈ C + u then C + w = C + u.
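The partition in Example 3.2.1 is easy to verify by brute force; the following Python sketch (helper names ours) groups all of Z_2^3 into the cosets of C = {000, 111}.

```python
from itertools import product

def add(u, v):
    """Componentwise sum mod 2 of two binary words."""
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

def coset(C, u):
    """The coset C + u, as a sorted tuple."""
    return tuple(sorted(add(v, u) for v in C))

def all_cosets(C, n):
    """The distinct cosets of C in Z_2^n."""
    return {coset(C, "".join(bits)) for bits in product("01", repeat=n)}

C = ["000", "111"]
for cs in sorted(all_cosets(C, 3)):
    print(cs)   # four cosets, as in Example 3.2.1
```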


The next result summarises a number of useful facts about cosets, most of
which are immediately apparent and so the technical proofs are left as exercises.
THEOREM 3.2.1.
Let C be a linear code of length n. Let u, v ∈ Z_2^n.
1. If u ∈ C + v, then C + u = C + v.
2. u ∈ C + u.
3. If u + v ∈ C then C + u = C + v.
4. If u + v ∉ C then C + u ≠ C + v.
5. Every element of Z_2^n is in exactly one coset. So any two cosets are either
equal or disjoint.

6. All cosets have the same number of elements as C.
7. If C has dimension k then all cosets have 2^k elements in them and there
are 2^(n−k) distinct cosets.
8. C is a coset.
EXAMPLE 3.2.2.
Consider the code C generated by the matrix

G = [ 1 0 1 0 1 ]
    [ 0 1 0 1 1 ].
Then there are four codewords in C, 00000, 10101, 01011 and 11110, arising
from the messages 00, 10, 01 and 11 in order. We see that the minimum weight
of a non-zero codeword is 3 so d(C) = 3. There are 8 cosets corresponding to
C; these are given as the columns in Table 3.1. Observe that each 5-sequence
of weight 1 appears in a unique coset and that there are sequences of weight 2
that appear in the same coset (for instance, 11000 and 00110).

Table 3.1: Eight Cosets of a (5,2,3) Linear Code (each column is a coset; the
first column is the code C itself)

00000 00001 00010 00100 01000 10000 00110 01100
10101 10100 10111 10001 11101 00101 10011 11001
01011 01010 01001 01111 00011 11011 01101 00111
11110 11111 11100 11010 10110 01110 11000 10010

The cosets are very helpful for the purposes of decoding. We know that
cH' = 0 if c ∈ C. We also know that any two n-sequences v and w in the same
coset have vH' = wH', since v + w ∈ C. Thus the error pattern and the
received word are in the same coset of C. Since we expect as few errors as
possible in an error pattern, when we receive v we find the coset that contains
v, find a word of least weight in that coset, u say, and assume that v + u was
the codeword sent.
We call the word of least weight in a coset the coset leader. In some cases
the coset leaders are unambiguously determined (the first 6 cosets in Table 3.1)
and in some there are two or more words of equal weight. In this case one is
chosen at random to be the coset leader.
We use the coset leaders to construct a fast decoding strategy. For each
coset leader u we calculate the syndrome uH'. When we receive v we evaluate
vH' and find the matching syndrome. The corresponding coset leader u is the
most likely error pattern and we assume that v + u was the codeword sent.
EXAMPLE 3.2.3.
Consider the code given in Example 3.2.2 and the corresponding cosets given in
Table 3.1. The parity check matrix H is given by

H = [ 1 0 1 0 0 ]
    [ 0 1 0 1 0 ]
    [ 1 1 0 0 1 ].

The coset leaders and the corresponding syndromes are given in Table 3.2.
Suppose that we receive the word 10011. Then we evaluate
10011 H' = 110.
Thus we know that the coset leader for that syndrome is 00110. Hence we
assume that the transmitted codeword is
10011 + 00110 = 10101
with corresponding message 10. If we had chosen as the coset leader 11000 then
we would have assumed that the transmitted codeword was
10011 + 11000 = 01011
with corresponding message 01. Thus the choice of coset leader has a direct
bearing on the message received. Note that this is not the case for the cosets
with coset leaders of weight 1. This reflects the fact that the distance of the
code is 3, and so all errors of weight 1 can be corrected while errors of weight 2
can only be detected.

Table 3.2: The Eight Syndromes of a (5,2,3) Linear Code

Coset Leader   Syndrome
00000            000
00001            001
00010            010
00100            100
01000            011
10000            101
00110            110
01100            111
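Syndrome decoding with this table takes only a few lines of Python (names ours), using the H of Example 3.2.3 and the coset leaders of Table 3.2; it reproduces the decoding of 10011 under the choice of leader 00110.

```python
H = [[1, 0, 1, 0, 0],
     [0, 1, 0, 1, 0],
     [1, 1, 0, 0, 1]]   # parity check matrix of Example 3.2.3

def syndrome(v):
    """vH' (mod 2), one component per row of H."""
    return tuple(sum(int(v[j]) * H[k][j] for j in range(5)) % 2 for k in range(3))

# Coset leaders of Table 3.2, keyed by their syndromes.
leaders = ["00000", "00001", "00010", "00100",
           "01000", "10000", "00110", "01100"]
table = {syndrome(u): u for u in leaders}

def decode(v):
    """Add the coset leader (the most likely error pattern) to the received word."""
    u = table[syndrome(v)]
    return "".join(str(int(a) ^ int(b)) for a, b in zip(v, u))

print(decode("10011"))   # 10101, the codeword with message 10
```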

In the previous example and discussion we have been making use of the
following result.
THEOREM 3.2.2.
Let C be a linear code of length n with parity check matrix H. Let u and v be
elements of Z_2^n.
1. uH' = 0 if and only if u ∈ C.
2. uH' = vH' if and only if u and v are in the same coset of C.
3. If the error pattern in a received word is u then uH' is the sum of the
rows of H' that correspond to the positions of the errors.

3.2.1 Exercises
1. List the cosets of the code C = {0000, 1001, 0110, 1111}. Give the syn-
dromes corresponding to each of the cosets. Are the coset leaders uniquely
determined?

2. List the cosets of the code C = {00000, 10010, 01101, 11111}. Give the
syndromes corresponding to each of the cosets. Are the coset leaders
uniquely determined?
3. Consider the code with generator matrix
 
G = [ 1 1 1 0 0 0 ]
    [ 0 0 1 1 1 0 ]
    [ 1 0 0 0 1 1 ].

(a) List the cosets for the code.


(b) Are the coset leaders uniquely determined?
(c) Give the syndromes corresponding to each of the cosets.
(d) To which messages would the received words 100001; 001111; and
011011 be decoded?

3.3 Cyclic Linear Codes


A cyclic linear code is a linear code in which, for each c = (c1, c2, . . . , cn) ∈ C, each
cyclic permutation of c is also in C. Thus (c2, . . . , cn, c1) ∈ C, (c3, . . . , cn, c1, c2) ∈
C and so on until (cn, c1, c2, . . . , cn−1) ∈ C. We will write π for the cyclic shift
one place to the right:

π(c1, c2, . . . , cn) = (cn, c1, . . . , cn−1).

EXAMPLE 3.3.1.
Let C = {000, 011, 101, 110}. Let c = 011. Then π(c) = 101 and π^2(c) = 110.
Since the sum of any two codewords is a codeword we see that C is a cyclic
linear code.
LEMMA 3.3.1.
π is a linear transformation; that is, π(u + v) = π(u) + π(v).
So for a linear code we only need to check that the cyclic shifts of the basis
vectors are in C to know that C is cyclic.
EXAMPLE 3.3.2.
Find the smallest linear cyclic code of length 6 (n = 6) containing c = 100100.
We start by looking at π(100100) = 010010, π^2(100100) = 001001 and π^3(100100) =
100100. We then use c, π(c) and π^2(c) as the basis vectors for the
code. Thus we get the 8 codewords 000000, 100100, 010010, 110110, 001001,
101101, 011011 and 111111.
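The construction in the example, take all cyclic shifts and close the result under addition, can be sketched in Python (the helper names are ours):

```python
def shifts(c):
    """All cyclic shifts of the word c."""
    return {c[i:] + c[:i] for i in range(len(c))}

def span(words):
    """Close a set of binary words under componentwise mod-2 addition."""
    def add(u, v):
        return "".join(str(int(a) ^ int(b)) for a, b in zip(u, v))
    code = {"0" * len(next(iter(words)))}
    for w in words:
        code |= {add(w, u) for u in code}
    return code

# The smallest linear cyclic code containing 100100 (Example 3.3.2).
C = span(shifts("100100"))
print(len(C))        # 8
print(sorted(C))
```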

3.3.1 Exercises
1. Find the smallest linear cyclic code containing 010101; 010010; 0101100.
(Use your common sense here. First work out how many codewords there
are. It may make more sense to give the basis vectors and say how many
words are in the code.)
2. Let C be the linear code generated by S. Show that C is a cyclic linear
code when S = {010, 011, 111} and when S = {1010, 0101, 1100}.

3.4 References and Comments
Hankerson, Hoffman, Leonard, Lindner, Phelps, Rodger, and Wall (2000) have
a detailed introductory level account of linear codes and their applications.

Topic 4

Reed-Solomon Codes and
Recording Music on CDs

So far we have only been considering messages that are made up of strings of 0s
and 1s. In this section we want to increase the number of symbols available for
each position of the message but we still want to retain the useful properties of
Z2 . To do this we consider polynomials over Z2 . We use these polynomials
to define larger fields (think of extending the real numbers to the complex
numbers). We use this larger field as a larger alphabet for the Reed-Solomon
family of codes, which are used to encode information on CDs, DVDs and in
Quick-Response (QR) codes.

4.1 Introduction to Finite Fields of Order 2^n


We begin by looking at four symbols. Suppose that we work modulo 4 so that
1+1=2, 1+2=3, 1+3=0, 2+2=0, 2+3=1 and so on. When we think about
multiplication, however, we observe that 2 × 2 = 0 mod 4 even though neither
of the factors was 0. Recall that we say that 2 is a divisor of 0. To look at the
problem another way, think about finding a multiplicative inverse for 2. We see
that 2 × 1 = 2, 2 × 2 = 0 and 2 × 3 = 2. Thus 2 does not have a multiplicative
inverse. We want to get a set of 4 elements in which multiplicative inverses
exist.
We do this by using an appropriate polynomial which we evaluate over the
integers modulo 2, Z2. To get such a polynomial, consider the quadratics over
Z2, namely, x^2, x^2 + x = x(x + 1), x^2 + 1 = (x + 1)^2 and x^2 + x + 1. The
first three quadratics factor over Z2, and so have roots in Z2.
Consider the quadratic x^2 + x + 1. We see that 1^2 + 1 + 1 = 1 and 0^2 + 0 + 1 = 1
in Z2, and hence the quadratic polynomial x^2 + x + 1 does not factor over Z2.
It is said to be irreducible over Z2 . We now try to embed Z2 in a larger field
in which x2 + x + 1 will factor. (This idea is familiar from the construction of
the complex numbers; to get the complex numbers we adjoin i, with i2 = −1,
to the reals.)
Suppose we let α be a solution of x^2 + x + 1 = 0. So α^2 + α + 1 = 0 and hence
α^2 = α + 1. Since we are working modulo 2, we have (α + 1)^2 + (α + 1) + 1 =
(α^2 + 1) + (α + 1) + 1 = α^2 + α + 1 = 0. Consequently, α + 1 is the other solution

of our equation. Thus we get the addition and multiplication tables shown in
Table 4.1.

+ 0 1 α α+1 × 0 1 α α+1
0 0 1 α α+1 0 0 0 0 0
1 1 0 α+1 α 1 0 1 α α+1
α α α+1 0 1 α 0 α α+1 1
α+1 α+1 α 1 0 α+1 0 α+1 1 α

Table 4.1: The finite field with 4 elements

We are, in fact, taking the ring of polynomials over Z2 and working with them
modulo x^2 + x + 1 to give the field of order 4. We will write GF[4] for the field
of order 4 (where GF stands for “Galois field” after the French mathematician
Évariste Galois (1811-1832)). The integers modulo n form a field if and only if
n is prime. In the same way, if we start from Zp for some prime p, and consider
the ring of polynomials Zp[x] over Zp, modulo a polynomial f(x), this forms a
field if and only if f(x) is irreducible over Zp.
We will write the entries in GF[4] as ordered pairs: (11) = α + 1 and (10) = 1,
for instance. The only other thing that we need to know about finite fields is
that the multiplicative group of a finite field is cyclic. Thus, if we take all
the elements in the field other than 0, then each element can be expressed as
a power of a primitive element of the field. In the case of GF[4] the primitive
element can be taken as α since α² = α + 1 and α³ = 1. For GF[5], we can use
2 as the primitive element since 2² = 4, 2³ = 3 and 2⁴ = 1, or we can use 3 as
the primitive element (3² = 4, 3³ = 2 and 3⁴ = 1) but we cannot use 4 (since
4² = 1).
We will only consider extensions of GF [2] = Z2 in these notes.
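The arithmetic of Table 4.1 can be checked mechanically by treating each element of GF[4] as a polynomial in α over Z2 and reducing products modulo α² + α + 1. The following is a small sketch; the integer encoding (0, 1, 2, 3 for 0, 1, α, α + 1, bit i holding the coefficient of α^i) is our own convention, not anything from these notes:

```python
# Elements of GF[4] encoded as 2-bit integers b1*2 + b0,
# standing for the polynomial b1*alpha + b0 over Z2.
def add(a, b):
    return a ^ b              # coefficient-wise addition mod 2

def mul(a, b):
    # carry-less multiplication, then reduction mod alpha^2 + alpha + 1
    p = 0
    for i in range(2):
        if (b >> i) & 1:
            p ^= a << i
    if p & 0b100:             # an alpha^2 term appeared: alpha^2 = alpha + 1
        p ^= 0b111
    return p

names = {0: "0", 1: "1", 2: "a", 3: "a+1"}   # 'a' stands for alpha
for x in range(4):
    print([names[mul(x, y)] for y in range(4)])
```

The printed rows reproduce the multiplication table of Table 4.1, e.g. α · α = α + 1 and α · (α + 1) = 1.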

4.1.1 Exercises
1. Show that x³ + x² + x + 1 is not irreducible over Z2.
2. (a) Show that x³ + x² + 1 is irreducible over Z2.
(b) Use x³ + x² + 1 over Z2 to generate GF[8].
3. Find all the irreducible cubics over Z2. Hence find all the possible representations of GF[8].
4. Find an irreducible quartic (polynomial of degree 4) over Z2. Hence find a representation of GF[16].

4.2 Reed-Solomon Codes


Reed-Solomon codes are cyclic linear codes that use an alphabet from GF[2^k].
They are used extensively in practice; we will discuss how they are used in the
encoding of audio CDs shortly.
We will let GF[2^k][x] be the set of all polynomials with coefficients from
GF[2^k]. We will define a cyclic linear code by using a polynomial with known
roots to give the entries in the first row of the generating matrix. Properties of
the code can be deduced from properties of the chosen polynomial. The next
example illustrates this idea.

© Debbie Street, 2011
EXAMPLE 4.2.1.
Consider the field GF[2³] generated by using the irreducible cubic 1 + x + x³.
We will use β for the primitive element of the field. Then we have

    GF[8] = {0, 1, β, β², β³ = 1 + β, β⁴ = β + β², β⁵ = 1 + β + β², β⁶ = 1 + β²}.

Consider the polynomial g(x) = (β + x)(β² + x) = β³ + β⁴x + x². Use this to
define the following generating matrix for a cyclic linear code, C, of length 7.

        [ β³  β⁴   1   0   0   0   0 ]
        [  0  β³  β⁴   1   0   0   0 ]
    G = [  0   0  β³  β⁴   1   0   0 ]
        [  0   0   0  β³  β⁴   1   0 ]
        [  0   0   0   0  β³  β⁴   1 ]

Messages in C are of length 5 and have an alphabet chosen from GF[8]. Thus
there are 8⁵ codewords in C. As usual, the message m corresponds to codeword
mG. For example, m = (1, β, β⁵, 0, 0) corresponds to the codeword 1 × row 1 of
G + β × row 2 of G + β⁵ × row 3 of G, which is (β³, 0, β², β⁴, β⁵, 0, 0). (From
the list of the elements of the field given above we know that β² + β = β⁴, for
example.)
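The codeword computation of this example can be replayed in a few lines. Here field elements are encoded as 3-bit integers (bit i = coefficient of β^i, our own convention) and multiplication reduces modulo 1 + x + x³:

```python
def gf8_mul(a, b):
    # multiply two GF[8] elements (3-bit ints), reducing mod x^3 + x + 1
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011        # x^3 = x + 1
    return p

B = [1]                        # B[i] is beta^i as a 3-bit integer
for _ in range(6):
    B.append(gf8_mul(B[-1], 0b010))
# B == [1, 2, 4, 3, 6, 7, 5], i.e. b^3 = 1+b, b^4 = b+b^2, ...

g = [B[3], B[4], 1]            # coefficients beta^3, beta^4, 1 of g(x)
G = [[0] * 7 for _ in range(5)]
for i in range(5):             # row i carries g shifted i places right
    for j, coeff in enumerate(g):
        G[i][i + j] = coeff

m = [1, B[1], B[5], 0, 0]      # the message (1, beta, beta^5, 0, 0)
c = [0] * 7
for i in range(5):
    for j in range(7):
        c[j] ^= gf8_mul(m[i], G[i][j])
print(c)                       # [3, 0, 4, 6, 7, 0, 0] = (b^3, 0, b^2, b^4, b^5, 0, 0)
```

The printed codeword agrees with the hand computation above.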

In the previous example we have constructed an (n = 7, m = 5) cyclic linear
code. We say this code has generating polynomial g(x). Clearly the distance of
the code is at most 3 (since there are three non-zero entries in each row of G).
One way to determine the distance of the code would be to show that there is
no linear combination of the rows of G with fewer than 3 non-zero entries. We
will not do that at the moment.
Can we determine a generating polynomial so that we know the properties
of the cyclic linear code to which it gives rise? We start by considering the form
of the parity check matrix of the previous example.
EXAMPLE 4.2.2.
We know that β and β² are the roots of g(x). Since β is a root of g(x) this
means that

    g(β) = β³ + β⁴ × β + β² = 0 = [ β³  β⁴  1 ] (1, β, β²)^T.

Also

    x g(x) = β³x + β⁴x² + x³ = 0 + β³x + β⁴x² + x³

and so we have

    β g(β) = 0 = [ 0  β³  β⁴  1 ] (1, β, β², β³)^T.

Continuing in this way we see that

    G (1, β, β², β³, β⁴, β⁵, β⁶)^T = 0.

We can use the root β² in the same way and so we find that the matrix

          [  1     1    ]
          [  β    β²    ]
          [ β²   (β²)²  ]
    H' =  [ β³   (β²)³  ]
          [ β⁴   (β²)⁴  ]
          [ β⁵   (β²)⁵  ]
          [ β⁶   (β²)⁶  ]

satisfies GH' = 0. The two columns of H' are linearly independent and so we
know that the code with generating matrix G has distance 3 (using Theorem
3.1.2).
In the previous example it was easy to see that the two columns of H'
were linearly independent but we could prove it formally by considering the
determinant of the submatrix constructed from any two rows of H' and showing
that any such submatrix has a non-zero determinant. In this example the
general form of such a submatrix is

    [ β^i  (β²)^i ]
    [ β^j  (β²)^j ]

with determinant β^i β^(2j) − β^j β^(2i) = β^(i+2j) + β^(2i+j) = β^(i+j)(β^i + β^j), since we are
working over GF[8]. This is non-zero whenever i ≠ j, because β^i ≠ β^j for
distinct i and j between 0 and 6.
The matrix above is closely related to a Vandermonde matrix. A Vandermonde
matrix of order s has the following form.

    [    1         1         1      ...     1      ]
    [    x1        x2        x3     ...     xs     ]
    [    x1²       x2²       x3²    ...     xs²    ]
    [    ...       ...       ...    ...     ...    ]
    [ x1^(s−1)  x2^(s−1)  x3^(s−1)  ...  xs^(s−1)  ]

An expression for the determinant can be found in general; here we give the
result over GF[2^k] only.

LEMMA 4.2.1. Let x1, x2, ..., xs be non-zero elements of GF[2^k]. Then the
determinant of the Vandermonde matrix of order s is given by ∏_{i<j} (xi + xj).
Proof. We prove the result by induction. For s = 2 the result is clear (recalling
that we are working mod 2). For s = 3 we have

    [ 1    1    1   ]
    [ x1   x2   x3  ]
    [ x1²  x2²  x3² ]

Add x1 times row 2 to row 3, and then x1 times row 1 to row 2, to get

    [ 1       1             1         ]
    [ 0    x2 + x1       x3 + x1      ]
    [ 0    x2(x2 + x1)   x3(x3 + x1)  ]

(here x1 + x1 = 0 and xi² + xi x1 = xi(xi + x1) since we are working mod 2). Thus

        [ 1    1    1   ]
    det [ x1   x2   x3  ]  =  (x2 + x1)(x3 + x1) det [ 1   1  ]
        [ x1²  x2²  x3² ]                            [ x2  x3 ]

Thus the result holds for s = 3. To complete the proof we only need to observe
that in general we add x1 times row i to row i + 1, working from the bottom
row upwards. The details are left as an exercise.
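Lemma 4.2.1 can be spot-checked by brute force for s = 3 over GF[8]. The sketch below (our own encoding again: field elements as 3-bit integers, multiplication mod x³ + x + 1) expands the 3 × 3 determinant by cofactors, remembering that in characteristic 2 every sign is +:

```python
from itertools import combinations

def gf8_mul(a, b):
    # multiply in GF[8] = Z2[x]/(x^3 + x + 1); elements are 3-bit ints
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return p

def det3(x1, x2, x3):
    # cofactor expansion of det [[1,1,1],[x1,x2,x3],[x1^2,x2^2,x3^2]];
    # every sign is + because -1 = 1 in characteristic 2
    def sq(x): return gf8_mul(x, x)
    t1 = gf8_mul(x2, sq(x3)) ^ gf8_mul(x3, sq(x2))
    t2 = gf8_mul(x1, sq(x3)) ^ gf8_mul(x3, sq(x1))
    t3 = gf8_mul(x1, sq(x2)) ^ gf8_mul(x2, sq(x1))
    return t1 ^ t2 ^ t3

# check the lemma for every triple of distinct non-zero elements
for x1, x2, x3 in combinations(range(1, 8), 3):
    prod = gf8_mul(gf8_mul(x1 ^ x2, x1 ^ x3), x2 ^ x3)
    assert det3(x1, x2, x3) == prod
print("lemma verified for s = 3 over GF[8]")
```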

THEOREM 4.2.1.
Let g(x) = (β^(ℓ+1) + x)(β^(ℓ+2) + x) ... (β^(ℓ+d−1) + x) be the generator polynomial of
a linear cyclic code C over GF[2^k] of length n = 2^k − 1, where β is the primitive
element of the field and ℓ is some integer. Then the corresponding generator
matrix is given by

        [       g(x)        ]
        [      x g(x)       ]
    G = [      x² g(x)      ]
        [       ...         ]
        [ x^(2^k − d − 1) g(x) ]

and d(C) ≥ d.
Proof. The roots of g(x) are given by β^(ℓ+1), β^(ℓ+2), ..., β^(ℓ+d−1). Construct the
matrix

         [       1               1         ...        1          ]
         [    β^(ℓ+1)         β^(ℓ+2)      ...    β^(ℓ+d−1)      ]
    H' = [  (β^(ℓ+1))²      (β^(ℓ+2))²     ...  (β^(ℓ+d−1))²     ]
         [      ...             ...        ...       ...         ]
         [ (β^(ℓ+1))^(n−1) (β^(ℓ+2))^(n−1) ... (β^(ℓ+d−1))^(n−1) ]

This matrix satisfies GH' = 0 and every submatrix of order d − 1 has a non-zero
determinant. So d(C) ≥ d as required, and H' is the parity check matrix of
C.
The codes of this theorem are Reed-Solomon codes. They have n = 2^k − 1,
m = 2^k − d and distance d. So the Reed-Solomon code of Example 4.2.1 has
n = 2³ − 1 = 7, m = 2³ − 3 = 5 and distance d = 3.
While we can use the alphabet over GF[2^k], in practice we are using a binary
channel to transmit the codewords and so we use a binary representation of the
codewords. In this representation, each element of GF[2^k] is replaced by its
equivalent binary k-tuple. The next example illustrates this idea.
EXAMPLE 4.2.3.
Let g(x) = α² + x over the field GF[4] (constructed using 1 + x + x², so we have
been using α for the primitive element). Then n = 2² − 1 = 3, d = 1 + 1 = 2
and m = 4 − 2 = 2. Thus the generator matrix G is given by

    G = [ α²   1   0 ]
        [  0  α²   1 ]

There are 16 possible messages (the ordered pairs of elements of GF[4]). To get
the corresponding codewords we evaluate mG over GF[4]. The 16 messages,
the corresponding codewords and the binary representations of these codewords
are given below.

    m     mG       c         m     mG       c
    00    000      000000    α0    1α0      100100
    01    0α²1     001110    α1    111      101010
    0α    01α      001001    αα    1α²α     101101
    0α²   0αα²     000111    αα²   10α²     100011
    10    α²10     111000    α²0   αα²0     011100
    11    α²α1     110110    α²1   α01      010010
    1α    α²0α     110001    α²α   ααα      010101
    1α²   α²α²α²   111111    α²α²  α1α²     011011

We have seen that the Reed-Solomon codes all have length 2^k − 1 for some
k. Sometimes we want codes of a different length from this. A shortened Reed-
Solomon code of length n − t is obtained by taking all codewords with 0 in the
final t positions (of the representation over GF[2^k], of course, given the value of
n) and deleting those positions.
EXAMPLE 4.2.4.
Consider the Reed-Solomon code of Example 4.2.3. Suppose that we wanted to
have a code of length n = 2. Then we want to shorten the code by 1. So we
take the four codewords that have 0 in the final position, specifically 000, 1α0,
αα²0 and α²10. Deleting the final position gives the code

    C = {00, 1α, αα², α²1}.
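Both the codeword table of Example 4.2.3 and this shortened code can be generated mechanically. In the sketch below (our own conventions: 0, 1, α, α² encoded as the integers 0, 1, 2, 3, multiplication reduced mod x² + x + 1, and the binary representation printed with the notes' convention 1 → 10, α → 01, α² → 11):

```python
def gf4_mul(a, b):
    # multiply in GF[4] = Z2[x]/(x^2 + x + 1); elements are 2-bit ints
    p = 0
    for i in range(2):
        if (b >> i) & 1:
            p ^= a << i
    if p & 0b100:
        p ^= 0b111             # x^2 = x + 1
    return p

A, A2 = 2, 3                   # alpha and alpha^2 (alpha^2 = alpha + 1)
G = [[A2, 1, 0],
     [0, A2, 1]]

def encode(m1, m2):
    return tuple(gf4_mul(m1, G[0][j]) ^ gf4_mul(m2, G[1][j]) for j in range(3))

def to_bits(word):
    # 0 -> 00, 1 -> 10, alpha -> 01, alpha^2 -> 11, as in the table above
    return ''.join(f"{x & 1}{(x >> 1) & 1}" for x in word)

codewords = [encode(m1, m2) for m1 in range(4) for m2 in range(4)]
for cw in codewords:
    print(to_bits(cw))

# shorten by one position: keep codewords ending in 0, drop that position
shortened = {cw[:2] for cw in codewords if cw[2] == 0}
print(sorted(shortened))       # [(0, 0), (1, 2), (2, 3), (3, 1)] = {00, 1a, a a^2, a^2 1}
```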

4.2.1 Exercises
1. Complete the inductive proof of the determinant of the Vandermonde
matrix.
2. Let C be a Reed-Solomon code with g(x) = (1 + x)(α + x) over the field
GF[4].
(a) Find n, d and m for this code.
(b) Construct the generator matrix G for this code.
(c) Find all the codewords over GF [4] and give the corresponding binary
representations.
(d) Give the shortened code of length 2.

3. Let k = 3, ℓ = 1, d = 3 in Theorem 4.2.1. Assuming that GF[8] is given by
the irreducible polynomial 1 + x + x³, give the generator matrix G for this
Reed-Solomon code. Give the binary representations of the codewords for
the messages (β, 0, 1, β, β²) and (0, 0, β⁵, β⁶, 1).

4.3 Decoding the Reed-Solomon Code


In a binary code it is sufficient to know the location of an error to be able to
correct it. In any code over a larger alphabet, such as the Reed-Solomon codes,
we need to know both the location of any errors and also the size of the errors.
We will discuss ways of establishing both the location and the size of any errors
in this section.
But first we will see one way to decode the message if no errors have been
made in transmission.
EXAMPLE 4.3.1.
Consider the Reed-Solomon code of Example 4.2.1. Recall that the generating
matrix was given by

        [ β³  β⁴   1   0   0   0   0 ]
        [  0  β³  β⁴   1   0   0   0 ]
    G = [  0   0  β³  β⁴   1   0   0 ]
        [  0   0   0  β³  β⁴   1   0 ]
        [  0   0   0   0  β³  β⁴   1 ]

Consider the message m = (m1, m2, m3, m4, m5). It becomes the codeword
c = mG and we see that the last entry of c is m5. The second-last entry of c is
c6 = m4 + β⁴m5, and so we can find m4. Proceeding in this way we can find m,
given that c has been received without errors.
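This back-substitution is easy to automate. Assuming the same 3-bit integer encoding of GF[8] used earlier (our convention, reduction mod x³ + x + 1), a sketch that peels the message off the codeword of Example 4.2.1, last entry first:

```python
def gf8_mul(a, b):
    # multiply in GF[8] = Z2[x]/(x^3 + x + 1); elements are 3-bit ints
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return p

B3, B4 = 0b011, 0b110          # beta^3 and beta^4 as 3-bit integers

def decode_no_errors(c):
    # 0-based: c[i+2] = m[i] + beta^4 m[i+1] + beta^3 m[i+2] (char 2),
    # so m[i] = c[i+2] ^ beta^4 m[i+1] ^ beta^3 m[i+2], solved from the top down
    m = [0] * 7                # m[5], m[6] stay 0 (padding beyond the message)
    for i in range(4, -1, -1):
        m[i] = c[i + 2] ^ gf8_mul(B4, m[i + 1]) ^ gf8_mul(B3, m[i + 2])
    return m[:5]

c = [0b011, 0, 0b100, 0b110, 0b111, 0, 0]   # (b^3, 0, b^2, b^4, b^5, 0, 0)
print(decode_no_errors(c))                  # [1, 2, 7, 0, 0] = (1, b, b^5, 0, 0)
```

This recovers the message m = (1, β, β⁵, 0, 0) of Example 4.2.1.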
The next example illustrates the ideas of both location and magnitude of
the errors for the Reed-Solomon codes.
EXAMPLE 4.3.2.
Consider the Reed-Solomon code of Example 4.2.3. Suppose that we sent the
codeword c = (α², α, 1) and received the word v = (α², α, α²). Then the error
is e = (0, 0, α² + 1) = (0, 0, α). The error appears in position α² and is of size
α.
We know that the code of the previous example had distance 2 and so we
would not expect to be able to correct any errors (see Theorem 3.1.1). In the
next example we consider the smallest Reed-Solomon code of distance 3 and see
how a decoding strategy might be developed.
EXAMPLE 4.3.3.
Consider the Reed-Solomon code with k = 2, ℓ = 2, and d = 3. Since k = 2
we need to use GF[4] and we will use the field with irreducible polynomial
1 + x + x². As before we will let the primitive element be α. Then g(x) =
(α^(2+1) + x)(α^(2+2) + x) = (1 + x)(α + x) = α + (1 + α)x + x². Thus the generating
matrix is given by G = [α  1+α  1]. The message 0 corresponds to codeword 000,
the message 1 to codeword (α, 1+α, 1), the message α to codeword (α², α+α², α)
and the message α² to the codeword (1, α²+1, α²).

We know that the roots of g(x) are 1 and α. To decode any received sequence
it is helpful to construct the parity check matrix. As in the proof of Theorem
4.2.1, we know that

         [ 1   1  ]
    H' = [ 1   α  ]
         [ 1   α² ]

Suppose that we receive v = (α, 1+α, 1+α). We evaluate vH' = (α, 1).
Since this is not 0 we assume that 1 error has occurred (since d = 3 and so we
can correct 1 error). Assume that the size of the error is s1 and that it is in
position p1. Thus we know that e = (s1, 0, 0) (so the error is in position 1) or
(0, s1, 0) (so the error is in position α) or (0, 0, s1) (so the error is in position
α²).
Evaluating eH' gives (s1, s1 × p1). Equating this to vH' = (α, 1) we see that
s1 = α and p1 = α². Thus we assume that the error is given by (0, 0, α), which
would mean that the corrected received sequence would be

    (α, 1+α, 1+α) + (0, 0, α) = (α, 1+α, 1)

which is indeed a codeword.


So now we have some idea of how a decoding strategy is going to work. We
will assume that there are at most t errors, where d ≥ 2t + 1, and that these
errors are located in positions p1, p2, ..., pt and have sizes s1, s2, ..., st. We will
evaluate vH' and equate these values to the values that we expect from the
error vector. Then we will solve for the si and the pi.
We will work through another example before giving a general description
of the decoding algorithm.
EXAMPLE 4.3.4.
Once again we will use the field GF[2³] generated from the irreducible cubic
1 + x + x³. We will use β for the primitive element of the field. Then we have

    GF[8] = {0, 1, β, β², β³ = β + 1, β⁴ = β² + β, β⁵ = β² + β + 1, β⁶ = β² + 1}.

Let the generating polynomial for the code be

    g(x) = (1 + x)(β + x)(β² + x)(β³ + x).

This gives rise to a Reed-Solomon code with n = 7, m = 3 and d = 5. This
means that t = 2. We know that the parity check matrix is given by

         [ 1   1    1    1  ]
         [ 1   β    β²   β³ ]
         [ 1   β²   β⁴   β⁶ ]
    H' = [ 1   β³   β⁶   β² ]
         [ 1   β⁴   β    β⁵ ]
         [ 1   β⁵   β³   β  ]
         [ 1   β⁶   β⁵   β⁴ ]

Suppose that we receive v = (β⁶, β, β⁵, β², 1, 0, β²). We calculate vH' =
(1, β³, β³, 1). As this is not 0 we assume that two errors have been made. We
further assume that the first error is in position p1 and is of size s1 and that
the second error is in position p2 and is of size s2 . That is, e has two non-
zero entries, in positions p1 and p2 , and these entries are equal to s1 and s2
respectively. So we want to evaluate these four unknowns.
We do this by evaluating eH'. This gives

    (s1 + s2,  s1 p1 + s2 p2,  s1 p1² + s2 p2²,  s1 p1³ + s2 p2³).

Thus we get the four equations

    1  = s1 + s2
    β³ = s1 p1 + s2 p2
    β³ = s1 p1² + s2 p2²
    1  = s1 p1³ + s2 p2³.

On the face of it these equations are going to be very difficult to solve. But
we notice that these equations are linear in the si, so if we could solve for the
pi then we could substitute and solve for the si.
To this end let's construct a polynomial which has p1 and p2 as its zeros. So
we let θp(x) = (p1 + x)(p2 + x). Now we expand θp and get

    p1 p2 + (p1 + p2)x + x² = θp(x) = θ0 + θ1 x + x².

Now we will multiply both sides of this equation by s1 to get

    s1 θp(x) = s1 θ0 + s1 θ1 x + s1 x².

Substituting p1 for x gives

    0 = s1 θ0 + s1 θ1 p1 + s1 p1².

Now multiply both sides of the original equation by s2 to get

    s2 θp(x) = s2 θ0 + s2 θ1 x + s2 x².

Substituting p2 for x gives

    0 = s2 θ0 + s2 θ1 p2 + s2 p2².

Adding we get

    0 = (s1 + s2)θ0 + (s1 p1 + s2 p2)θ1 + s1 p1² + s2 p2².

We realise, however, that we know some of these values and so we end up with

    0 = θ0 + β³θ1 + β³.

Proceeding in the same way, multiplying this time by s1 p1 and s2 p2 and
substituting and adding, we get

    0 = β³θ0 + β³θ1 + 1.

So now we have two equations in the two unknowns θ0 and θ1. Solving we
have θ0 = 1 and θ1 = β⁵. This means that we know that θp(x) = 1 + β⁵x + x².
To find the values of p1 and p2 we substitute each of the field elements in turn
into θp. We get

    x     θp(x)       x     θp(x)
    1     β⁵          β⁴    1 + β + β²
    β     0           β⁵    1
    β²    β⁴          β⁶    0
    β³    β + β²

From the table we see that p1 = β and p2 = β⁶.
Substituting these values into the entries in eH' we get 1 = s1 + s2 and
β³ = βs1 + β⁶s2. Solving we get s2 = β² and s1 = β⁶. So the most likely error
pattern is e = (0, β⁶, 0, 0, 0, 0, β²). This leads to the codeword

    v + e = (β⁶, β, β⁵, β², 1, 0, β²) + (0, β⁶, 0, 0, 0, 0, β²) = (β⁶, β⁵, β⁵, β², 1, 0, 0).

Now we generalise what we have just done in the previous two examples.
1. Take a received sequence v.
2. Evaluate vH'.
3. If vH' = 0, assume that no errors have been made. Decode the received
sequence.
4. Otherwise assume that at most t errors have been made, where d ≥ 2t + 1.
Further assume that the errors are located in positions p1, p2, ..., pt with
sizes s1, s2, ..., st. Construct a polynomial θp(x) with roots p1, p2, ..., pt.
5. Expand θp(x). The coefficient of x^i in this polynomial is θi.
6. Multiply both sides of θp(x) by si × pi^j, substitute x = pi and sum over i.
Do this for each j = ℓ + 1 to j = ℓ + t.
7. Solve the resulting equations for the θi and hence the pi.
8. Now solve the equations that arise from vH' for the si.
9. Calculate the most likely error vector, e.
10. Decode v + e.
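For the code of Example 4.3.4 (t = 2, roots β⁰, β, β², β³) the whole procedure fits in a short script. The sketch below reuses our 3-bit integer encoding of GF[8]; it covers the no-error and two-error cases worked above (a production decoder would also handle single errors, where the 2 × 2 system degenerates). The linear solve for θ0, θ1 is done by Cramer's rule, and inverses are found by brute-force search since the field has only seven non-zero elements:

```python
def gf_mul(a, b):
    # multiply in GF[8] = Z2[x]/(x^3 + x + 1); elements are 3-bit ints
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return p

def gf_inv(a):
    return next(x for x in range(1, 8) if gf_mul(a, x) == 1)

B = [1]
for _ in range(6):
    B.append(gf_mul(B[-1], 2))        # powers of beta: [1, 2, 4, 3, 6, 7, 5]

def decode(v):
    # syndromes S_j = sum_i v[i] * (beta^j)^i, one for each root beta^0..beta^3
    S = [0, 0, 0, 0]
    for j in range(4):
        for i, vi in enumerate(v):
            S[j] ^= gf_mul(vi, B[(i * j) % 7])
    if not any(S):
        return list(v)                # no errors detected
    # theta_p(x) = theta0 + theta1 x + x^2; the two linear equations are
    #   S0 theta0 + S1 theta1 = S2   and   S1 theta0 + S2 theta1 = S3
    D = gf_mul(S[0], S[2]) ^ gf_mul(S[1], S[1])
    th0 = gf_mul(gf_mul(S[2], S[2]) ^ gf_mul(S[1], S[3]), gf_inv(D))
    th1 = gf_mul(gf_mul(S[0], S[3]) ^ gf_mul(S[1], S[2]), gf_inv(D))
    # error locations: positions i with theta_p(beta^i) = 0
    locs = [i for i in range(7)
            if th0 ^ gf_mul(th1, B[i]) ^ gf_mul(B[i], B[i]) == 0]
    p1, p2 = B[locs[0]], B[locs[1]]
    # sizes from s1 + s2 = S0 and p1 s1 + p2 s2 = S1
    s2 = gf_mul(S[1] ^ gf_mul(p1, S[0]), gf_inv(p1 ^ p2))
    s1 = S[0] ^ s2
    e = [0] * 7
    e[locs[0]], e[locs[1]] = s1, s2
    return [vi ^ ei for vi, ei in zip(v, e)]

v = [5, 2, 7, 4, 1, 0, 4]     # (b^6, b, b^5, b^2, 1, 0, b^2) from Example 4.3.4
print(decode(v))              # [5, 7, 7, 4, 1, 0, 0] = (b^6, b^5, b^5, b^2, 1, 0, 0)
```

Running it on the received word of Example 4.3.4 reproduces the corrected codeword found by hand.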

4.3.1 Exercises
1. Consider the Reed-Solomon code with generator matrix G = [α 1 + α 1].
Give all the sequences at distance 1 from each codeword.
2. Consider the code of Example 4.3.4.
(a) Encode the message m = (1, β, β³).
(b) Now assume that no errors have been made in transmission. Decode
the codeword.
(c) Suppose that you received the sequence (β⁵, β, β³, β, β⁵, β⁴, β⁵). To
which codeword, and hence message, is it most likely to correspond?
(d) Suppose that you received the sequence (β, β⁶, β, β, β⁵, β⁴, β⁴). To
which codeword, and hence message, is it most likely to correspond?

4.4 Burst Errors and Interleaving
To date we have been assuming that the errors that arise when sending an
encoded message arise independently. But sometimes this is not the case -
think about a scratch on the surface of a CD for example. A set of errors that
occur together are termed a burst of errors. We will say that the burst length of
an error is the number of digits from the first 1 in e to the last 1 in e, inclusive.
EXAMPLE 4.4.1.
Let n = 8. The 8-sequence 01001100 has a burst length of 5.
Since we are now dealing with burst errors, our assumption that the most
likely error is the one of least weight no longer makes sense. Instead we now
take as coset leaders the words with the bursts of least length.
One method that can be used to improve the burst error correcting properties
of a code is to interleave several adjacent codewords before transmitting them.
Suppose that a code is to be interleaved to depth s before transmission. This
means that s codewords are selected, the first digits from each of the codewords
are transmitted, then the second digits from each of the codewords are trans-
mitted and so on until the final digits of the s codewords are transmitted. We
see that a burst error is more likely to damage all the digits in the same place
in each of the s codewords rather than damage every digit in one codeword.
EXAMPLE 4.4.2.
Suppose that we consider the 6 codewords c1 = 1000110, c2 = 0100101, c3 =
0010011, c4 = 0001111, c5 = 1100011 and c6 = 1010101 from the (7,4) Hamming
code. Suppose that we interleave the codewords to depth s = 3. So we would
interleave c1 , c2 and c3 and transmit these and then interleave c4 , c5 and c6 and
transmit these. Thus we would transmit 100 010 001 000 110 101 011 (spaces
inserted for convenience) and then we would transmit 011 010 001 100 101 110
111.
We can think about interleaving by thinking about writing the codewords
as the rows of a matrix. Then we transmit the columns of the matrix in turn.
The first three codewords of the previous example are shown in Table 4.2.

Table 4.2: Example of Depth 3 Interleaving

1 0 0 0 1 1 0
0 1 0 0 1 0 1
0 0 1 0 0 1 1
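Transmitting the columns of this matrix in turn is one line of Python (zip performs the transposition); the codewords are those of Example 4.4.2:

```python
def interleave(codewords):
    # rows of the matrix are the codewords; transmit its columns in turn
    return ''.join(''.join(col) for col in zip(*codewords))

c1, c2, c3 = "1000110", "0100101", "0010011"
print(interleave([c1, c2, c3]))   # 100010001000110101011
```

The output matches the first transmitted string of Example 4.4.2 (spaces removed).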

How does interleaving improve the burst error correcting properties? Suppose
that C is ℓ burst error correcting and that we interleave C to depth s. Then a
burst of errors of length at most sℓ during transmission can affect at most ℓ
digits in a codeword. So as long as there is only one burst error pattern affecting
the codeword it can be corrected. We can summarise these observations as
follows.
THEOREM 4.4.1.
Suppose that C is an ℓ burst error correcting code and suppose that C is
interleaved to depth s. Then all bursts of length at most sℓ can be corrected,
assuming that each codeword is affected by at most one burst of errors.
Take a code that is 1 error correcting and interleave it to depth 3. Then the
interleaved code corrects all bursts of length 3.

4.4.1 Exercises
1. Consider the linear code with generating matrix
 
        [ 1 1 0 1 0 0 ]
    G = [ 0 0 1 0 1 0 ]
        [ 1 0 0 1 1 1 ]

Encode the messages 101, 011, 111, 010, 100 and 001. Now find the string
of digits transmitted when the code is interleaved to depth s = 1; s = 2;
s = 3.

4.5 Recording Music on CDs


CDs provide a familiar application of error-correcting codes.
CDs store information on long spirals about 500nm (nanometers) wide. The
spiral starts at the centre of the CD and spirals outwards, with 1600 nm sepa-
rating one part of the spiral from the next. The information is recorded in the
form of pits and lands. A pit is about 100nm deep and must be at least 850nm
long and no more than 3800nm long. The pits and lands are detected by a laser
beam. To stay on track there must be changes in the height of the track at least
every 3800nm. Each change in height, up or down, is recorded as a 1 and no
change in height is registered as a 0.
To record sound, the sound needs to be sampled at intervals and each sample
must be converted to the nearest available level from a set of possible levels. In
practice, the sampling happens 44,100 times per second and there are 65,536 =
2¹⁶ possible levels to match the sound to. For stereo sound two measurements
are taken at each sampling point, one from the right channel and one from the
left channel.
The level of the sound is recorded as a binary string of length 16, viewed
as two strings of length 8. If we call 1/44100 of a second a tick then at each
tick two binary vectors of length 16 are recorded and these are each viewed as
two strings of length 8, giving m_{4t}, m_{4t+1}, m_{4t+2}, m_{4t+3} from tick t. We call a
binary string of length 8 a byte.
To get a message the 4 bytes of 6 consecutive ticks are grouped together.
Thus we have messages made up of 24 bytes (that is, from an alphabet with 2⁸
binary 8-strings). These messages are encoded using an (n = 28, m = 24, d = 5)
shortened Reed-Solomon code.
To protect against burst errors these codewords are 4-frame delay inter-
leaved, with 0s inserted in all otherwise empty positions. The columns of this
array are now treated as messages of length 28, again with an alphabet of bytes,
and a second shortened Reed-Solomon code with (n = 32,m = 28,d = 5) is used
to get the codewords. One further symbol is adjoined to these codewords for
control and display purposes so that at this stage each codeword has length 33.

The laser tracking device can not stay on track if there are no height changes
nor can it cope if there are lots of changes close together. Thus between any
two 1s (changes in height) there must be at least 2 0s and no more than 10 0s.
Obviously the 2⁸ binary 8-sequences do not all have this property, but it turns out
that there are 267 binary 14-sequences that do have this property. 256 of these
are chosen and mapped to the 256 alphabet symbols. This is termed eight to
fourteen modulation (EFM).
The following table shows part of the mapping used in EFM.

Table 4.3: Sample of EFM Mapping

8-sequence 14-sequence
01100101 00000000100010
01100110 01000000100100
01100111 00100100100010
01101000 01001001000010
01101001 10000001000010
01101010 10010001000010
01101011 10001001000010
01101100 01000001000010
01101101 00000001000010
01101110 00010001000010
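The run-length condition (at least two and at most ten 0s between consecutive 1s) is easy to test mechanically. A small sketch that checks the ten sample 14-sequences of Table 4.3:

```python
def gaps_ok(seq, lo=2, hi=10):
    # check that every run of 0s *between* two 1s has length in [lo, hi]
    ones = [i for i, b in enumerate(seq) if b == "1"]
    return all(lo <= j - i - 1 <= hi for i, j in zip(ones, ones[1:]))

samples = [
    "00000000100010", "01000000100100", "00100100100010",
    "01001001000010", "10000001000010", "10010001000010",
    "10001001000010", "01000001000010", "00000001000010",
    "00010001000010",
]
print(all(gaps_ok(s) for s in samples))   # True
```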

We also want to avoid having too many 0s between these strings of length 14,
and so we insert a further symbol equal to "1 + final bit of left-hand string +
final bit of right-hand string" between the strings that make up one codeword.
The final encoding is to adjoin a binary 27-sequence with information to
ensure the playback is synchronised. These 27-sequences also have no 1s too
close together or too far apart.
Thus by the end of the encoding we have taken 6 ticks, represented by
6 × 4 × 8 = 192 binary digits, and converted them to 588 binary digits.
To decode from the 588 binary digits: start by reversing the final few steps
of the encoding process. So remove the 27 bits which include the synchronising
information, remove the merging 14-string bits and reverse the look-up proce-
dure to get the strings of length 8. Hopefully these are indeed codewords in C2
from which all single errors can be corrected. If more than one error is detected
then all bytes in that word are flagged. Next the interleaving is removed and
C1 , which has distance 5, is used to correct up to 4 erasures, where flagged bytes
are treated as erasures.
How good is this decoding? Decoding in C2 goes wrong if the received word
is within distance 1 of the wrong codeword. There are (2⁸)²⁸ = 2²²⁴ codewords
in C2 and one of them is right. There are 32(2⁸ − 1) words of length 32 at
distance 1 from each codeword. To be undetected, an error pattern must take
one codeword in C2 to another codeword in C2 or it must take a codeword in C2
to a word that is distance 1 from an incorrect codeword in C2. There are (2⁸)³²
binary error patterns possible and of these (2²²⁴ − 1)(1 + 32(2⁸ − 1)) result in
the wrong codeword or in a word at distance 1 from the wrong codeword. So the
proportion of error patterns which are undetected is

    (2²²⁴ − 1)(1 + 32(2⁸ − 1)) / (2⁸)³² ≈ 1/2¹⁹.

Thus we see that C2 can cope with small random errors from the manufacturing
process.
What about decoding using C1? Since C1 has distance 5, up to 4 flagged
digits can be corrected. To affect 5 digits in a C1 codeword, at least 17
columns would need to be affected. Thus 15 × 32 + 2 + 2 bytes (at a minimum)
would be affected. Since there are 588 bits representing each column, all bursts of
length at most 15 × 588 + 3 × 17 = 8871 are decoded correctly. A burst of this length
represents about 2.5mm of track length on a CD.
In practice further interleaving is carried out and averaging of reliable sounds
can be used to replace sounds flagged as unreliable in the decoding process.
CD ROMS have even stronger error detection and correction capacity.

4.5.1 Exercises
1. Represent the sine wave by a 1-digit message; by a 2-digit message and by
a 3-digit message. (This is most easily done using Mathematica to draw
the sine wave and then superimposing the appropriate step function.)
2. Suppose that between any two 1s (changes in height) there must be at
least one 0 and no more than four 0s. Find 16 sequences of length 6 with these
properties.

4.6 References and Comments


The material in this topic closely parallels the presentation in Hankerson, Hoff-
man, Leonard, Lindner, Phelps, Rodger, and Wall (2000).
The introduction to finite fields is an edited version of material in Street and
Burgess (2007).
Table 4.3 is from Watkinson, J. (1988). The Art of Digital Audio. Focal
Press, London, first edition, quoted in “Introduction to Sound Recording” by
Geoff Martin available at
http://www.tonmeister.ca/main/textbook/node895.html
A good description on CD encoding with very nice diagrams and with dif-
ferent statements about the merging channel bits may be found at
http://www.laesieworks.com/digicom/Storage CD.html

Topic 5

Number Theory and the RSA Cryptosystem

In this topic we extend our results on modular arithmetic to enable us to do
modular exponentiation efficiently. This is useful because modular exponentiation
is at the heart of one of the first public key encryption systems, the RSA
cryptosystem.

5.1 Number Theory

The RSA algorithm is based on three ideas from elementary number theory:
modular arithmetic, the Euclidean algorithm and Euler's φ function.
The RSA algorithm uses properties of the powers mod n. The idea is for
each person j to find two distinct large primes, p and q say, calculate n = pq
and find a pair of numbers d and e such that de ≡ 1 (mod (p − 1)(q − 1)). Then
the public encryption key is (n, e), so each ciphertext that person j receives
is c = m^e mod n. It turns out that m = c^d mod n, and this is what we will
prove in this section.
The first step is to be able to find pairs of numbers d and e such that
de ≡ 1 (mod (p − 1)(q − 1)) for some p and q.
Recall from the results on affine ciphers that a pair of numbers such that
de ≡ 1 mod 26 were said to be multiplicative inverses mod 26 and that we
showed that d had a multiplicative inverse only if gcd(d, 26) = 1. We said at
the time that this was a general result and this is what we now establish.
Recall that the greatest common divisor of a and b is the largest number that
divides both a and b. We denote this number by gcd(a, b). If the gcd(a, b) = 1
then we say that a and b are relatively prime.
We can find the gcd(a, b) easily if we can calculate the prime factorisations
of a and b. The next example shows how this works.

EXAMPLE 5.1.1.
Suppose that a = 60 and b = 50. Then the prime factorisation of a is 60 =
2² × 3 × 5 and of b is 50 = 2 × 5². So gcd(50, 60) = 2 × 5 = 10.
If it is not easy to find the prime factorisation of a and b, usually because
they are large numbers, then the Euclidean algorithm can be used to find the
gcd. We will start with an example.
EXAMPLE 5.1.2.
Consider 126 and 95. Then

126 = 95 × 1 + 31
95 = 31 × 3 + 2
31 = 2 × 15 + 1
2 = 1×2+0

So the divisor in each line becomes the subject of the next line and the
remainder in one line becomes the divisor in the next. The last non-zero
remainder is the gcd, so gcd(126, 95) = 1 (which is clearly true since 95 = 19 × 5
and 126 = 2 × 3² × 7).
You can also use this algorithm to write the gcd in terms of the original
numbers. So

1 = 31 − 2 × 15
= 31 − (95 − 31 × 3) × 15
= 31 − 95 × 15 + 31 × 45
= 31 × 46 − 95 × 15
= (126 − 95) × 46 − 95 × 15
= 126 × 46 − 95 × 61

Thus we know that 95 × (−61) ≡ 1 (mod 126), so −61 ≡ 65 (mod 126) is the
multiplicative inverse of 95 mod 126, for instance.

Here is a formal description of the Euclidean algorithm. Without loss
of generality assume that a > b. Divide a by b and write

    a = q1 b + r1.

If r1 = 0 then gcd(a, b) = b. If r1 ≠ 0 then write

    b = q2 r1 + r2.

If r2 = 0 then r1 | b and a = q1 b + r1 = q1 q2 r1 + r1, and so gcd(a, b) = r1. If
r2 ≠ 0 then write

    r1 = q3 r2 + r3

and continue until

    r_{k−1} = q_{k+1} r_k

when we conclude that

    gcd(a, b) = r_k.
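The algorithm above, together with the back-substitution used in Example 5.1.2, is the extended Euclidean algorithm. A sketch in Python:

```python
def egcd(a, b):
    # returns (g, s, t) with a*s + b*t == g == gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, s, t = egcd(b, a % b)
    return g, t, s - (a // b) * t

def mod_inverse(a, n):
    # multiplicative inverse of a mod n, when gcd(a, n) = 1
    g, s, _ = egcd(a, n)
    if g != 1:
        raise ValueError("no inverse: gcd(a, n) != 1")
    return s % n

print(egcd(126, 95))          # (1, 46, -61): 126*46 - 95*61 = 1
print(mod_inverse(95, 126))   # 65, as found in Example 5.1.2
```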

THEOREM 5.1.1.
Suppose that gcd(a, n) = 1 and let s and t be integers such that as + nt = 1.
Then as ≡ 1 (mod n), so s is the multiplicative inverse of a mod n.
EXAMPLE 5.1.3.
Find the multiplicative inverse of 21 mod 26. First we observe that gcd(21, 26) =
1 and so we know that 21 has a multiplicative inverse mod 26. Next we use the
Euclidean algorithm to find the inverse. Proceeding as above we have

26 = 21 × 1 + 5
21 = 5×4+1
5 = 5×1

which confirms that the gcd(21, 26) = 1 and gives us the information that we
need to calculate the multiplicative inverse. We do that by expressing 1 as a
linear combination of 26 and 21.

1 = 21 − 5 × 4
= 21 − (26 − 21) × 4
= 21 × 5 − 26 × 4

and so we see that 5 is the multiplicative inverse of 21 mod 26.

Now we consider a very small example of the RSA system.

EXAMPLE 5.1.4.
Let p = 5 and q = 7. Then n = pq = 35 and (p − 1)(q − 1) = (5 − 1)(7 − 1) = 24.
Now we need to find a pair of numbers d and e such that de ≡ 1 (mod 24).
Suppose we let e = 11 (which will have a multiplicative inverse mod 24 since
gcd(11, 24) = 1). We use the Euclidean algorithm to find d.

    24 = 11 × 2 + 2
    11 = 2 × 5 + 1

so we see that

    1 = 11 − 5 × 2 = 11 − 5(24 − 11 × 2) = 11 × 11 − 24 × 5

so d = 11. (In this case the same power is used to encrypt and decrypt and you
can show that this is true for every invertible value mod 24. This is an unusual
situation.)
Now suppose that we want to send the message m = 23. We calculate
m^e mod 35 = 23¹¹ mod 35 = 32 = c. When c is received calculate c^d mod 35 =
32¹¹ mod 35 = 23 = m.
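The whole of this example can be replayed with Python's built-in three-argument pow, which performs modular exponentiation; from Python 3.8 onwards, pow(e, -1, n) also computes modular inverses:

```python
p, q = 5, 7
n = p * q                        # 35
phi = (p - 1) * (q - 1)          # 24
e = 11
d = pow(e, -1, phi)              # inverse of e mod 24 (needs Python 3.8+)
assert d == 11                   # encryption and decryption exponents coincide here

m = 23
c = pow(m, e, n)                 # encrypt: 23^11 mod 35
print(c)                         # 32
print(pow(c, d, n))              # decrypt: 23, the original message
```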
We see that we need to be able to calculate modular exponentiation efficiently
and we need to establish Euler's Theorem (on which RSA is based).
We make use of the fact that if a ≡ b (mod n) then a^h ≡ b^h (mod n). Thus
we can calculate the exponents of numbers mod n without having to do the
exponentiation of more than a few terms. The following example gives the idea.

EXAMPLE 5.1.5.
What is the final digit of 3^56? That is, we want to know the value of 3^56 mod 10.
We observe that 3^2 ≡ 9 mod 10, 3^3 ≡ 9 × 3 ≡ 7 mod 10, 3^4 ≡ 7 × 3 ≡ 1 mod 10
and 3^5 ≡ 1 × 3 ≡ 3 mod 10. Now 56 = 4 × 14 so 3^56 ≡ (3^4)^14 ≡ 1 mod 10.
We need to use the ideas in the previous example since otherwise the computations
would tax the computer's memory. For instance, if a, b and n are 100-digit
numbers then a^b has more than 10^100 digits and the computer's memory
would overflow. By using the ideas in the previous example the computation of
a^b mod n can be achieved in at most 700 steps and no number will have more
than 200 digits.
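The standard way to organise this is binary ("square-and-multiply") exponentiation. A Python sketch (Python's built-in three-argument pow does the same job internally):

```python
def power_mod(base, exp, n):
    """Right-to-left square-and-multiply: compute base^exp mod n
    while keeping every intermediate value below n^2."""
    result = 1
    base %= n
    while exp > 0:
        if exp & 1:               # current binary digit of exp is 1
            result = (result * base) % n
        base = (base * base) % n  # square for the next binary digit
        exp >>= 1
    return result

print(power_mod(3, 56, 10))       # 1: the final digit of 3^56 is 1
```

The loop runs once per bit of the exponent, which is where the "at most 700 steps" estimate for 100-digit numbers comes from.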
The next results give us a tool for finding suitable values of e, d and n.
THEOREM 5.1.2.
Fermat’s Little Theorem Suppose that p is a prime and that gcd(a, p) = 1.
Then
a^(p−1) ≡ 1 mod p.
Proof. Let
S = {1, 2, . . . , p − 1}.
Define ψ : S → S by ψ(x) = ax mod p. For each x ∈ S we have ψ(x) ∈ S, since
if ψ(x) ≡ 0 mod p then p|ax; as gcd(a, p) = 1 this would require p|x,
which is a contradiction since x ∈ S. Now suppose that x, y ∈ S
and ψ(x) = ψ(y). Then ax ≡ ay mod p, from which we deduce that x ≡ y mod
p. So if x and y are distinct elements of S then ψ(x) and ψ(y) are distinct, and
hence ψ(1), ψ(2), . . . , ψ(p − 1) are the elements of S in some order. Thus we have that

1 × 2 × . . . × (p − 1) ≡ ψ(1) × ψ(2) × . . . × ψ(p − 1)


≡ (a)(2a)(3a) . . . ((p − 1)a)
≡ a^(p−1)(1 × 2 × . . . × (p − 1)) mod p.

Since all the entries in S are co-prime to p we can divide by each of them to get

1 ≡ a^(p−1) mod p.

EXAMPLE 5.1.6.
Let p = 7. Then x^6 ≡ 1 mod 7 for 2 ≤ x ≤ 6.
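A direct numerical check of this example (an illustrative Python sketch):

```python
# Fermat's Little Theorem for p = 7: x^(p-1) ≡ 1 mod p for 1 ≤ x ≤ p − 1.
p = 7
residues = [pow(x, p - 1, p) for x in range(1, p)]
print(residues)  # [1, 1, 1, 1, 1, 1]
```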

So clearly we do not want to use a prime value of n in the RSA algorithm.


We now give Euler's theorem, an extension of Fermat's Little Theorem that
works for composite values of n. First we need to define Euler's φ function. For
an integer n, φ(n) is defined to be the number of integers a such that 1 ≤ a ≤ n
and gcd(a, n) = 1. So φ(n) is the number of positive integers up to n that are co-prime
to n. Thus φ(p) = p − 1 for any prime p. If n = p^r then φ(n) = p^r − p^(r−1),
since all the multiples of p that are at most p^r must be excluded.
Here is a more general result.
THEOREM 5.1.3.
If a and b are relatively prime then φ(ab) = φ(a)φ(b).

Proof. Consider the integers between 1 and ab. Partition them into a sets
Si = {i, a + i, . . . , (b − 1)a + i}, 1 ≤ i ≤ a. For each of these sets, either all of the
numbers are relatively prime to a or none of the numbers are relatively prime
to a. Thus there are φ(a) sets that have all entries relatively prime to a.
How many of these entries are also co-prime to b? There are b entries in each
set and no two are congruent mod b: if ka + i ≡ ja + i mod b then b|a(k − j),
and since gcd(a, b) = 1 this forces b|(k − j), which is impossible for distinct k
and j between 0 and b − 1. So each set contains each of the congruence
classes mod b and so contains φ(b) entries co-prime to b. The result follows.
EXAMPLE 5.1.7.
Let a = 3 and b = 4. Then S1 = {1, 4, 7, 10}, S2 = {2, 5, 8, 11} and S3 =
{3, 6, 9, 12}. The sets S1 and S2 contain the integers less than ab = 12 that are
co-prime to 3. Now in S1 the two entries 1 and 7 are co-prime to 4 and in S2
the two entries 5 and 11 are co-prime to 4. So there are 2 × 2 = 4 numbers less
than 12 that are co-prime to 12 and so φ(12) = 4.
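The counting in this example can be checked directly; here is a brute-force sketch of φ in Python (illustrative only, and far too slow for numbers of cryptographic size):

```python
from math import gcd

def phi(n):
    """Euler's phi by direct count of 1 <= a <= n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

print(phi(12))                      # 4, as in the worked example
assert phi(12) == phi(3) * phi(4)   # multiplicativity for coprime 3 and 4
```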
THEOREM 5.1.4.
Euler’s Theorem If gcd(a, n) = 1 then
a^φ(n) ≡ 1 mod n.
Proof. This is similar to the proof of Fermat’s Little Theorem except that here
we partition the integers that are co-prime to n.
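The theorem is easy to check numerically; a quick Python sketch for the assumed illustrative modulus n = 42:

```python
from math import gcd

# Check Euler's Theorem for n = 42: a^phi(n) ≡ 1 mod n whenever gcd(a, n) = 1.
n = 42
phi_n = sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)  # phi(42) = 12
for a in range(1, n):
    if gcd(a, n) == 1:
        assert pow(a, phi_n, n) == 1
print(phi_n)  # 12
```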
In the next section we show how to use these results to get d, e and n for
RSA.

5.1.1 Exercises
1. Use the Euclidean algorithm to find the multiplicative inverse of 21 mod
29.
2. Use the Euclidean algorithm to find the multiplicative inverse of 11 mod
39.
3. What is the final digit of 2^56?
4. What is the value of 5^10 mod 11?
5. Verify that φ(42) = φ(6)φ(7) = φ(2)φ(3)φ(7).
6. What is 11φ(42) mod 42?
7. The aim of this exercise is to prove that the Euclidean algorithm computes
the gcd. We let a, b, q_i and r_i be as in the notes.
(a) Let d be a common divisor of a, b. Show that d|r_1 and use this to
show that d|r_2.
(b) Use induction to show that d|r_i for all i. In particular d|r_k, the last
non-zero remainder.
(c) Use induction to show that r_k|r_i for 1 ≤ i ≤ k.
(d) Use the fact that r_k|r_2 and r_k|r_1 to show that r_k|b and hence r_k|a.
Thus r_k is a common divisor of a and b.
(e) Since d|r_k we have that r_k ≥ d and therefore r_k is the greatest com-
mon divisor.

5.2 RSA Cryptosystem
The RSA cryptosystem was proposed by Rivest, Shamir and Adleman, after
whom it is named, in 1977. It is an example of a public key cryptosystem,
a concept first suggested by Diffie and Hellman in 1976. In 1997 documents
released under Freedom of Information in Britain showed that James Ellis had
discovered public key cryptography in 1970 and a variant of RSA had been
found in 1973 by Clifford Cocks.
The idea behind public key cryptography is very simple. Each person in a
group whose members want to be able to communicate securely chooses
a public key, which they publish. This is a function for encrypting a message;
call it E_j for the j-th person. Person j also has a secret decryption function
D_j.
So if we want to send message m to person j we look up the list and find
E_j. We then send E_j(m). When person j receives E_j(m) they apply D_j and
get D_j(E_j(m)) = m.
As well as discussing the RSA algorithm we will discuss how easy it is to
recover a message when D_j is not known. We will also briefly discuss ways of
producing a secure cryptosystem, one which reduces the amount of information
that can be deduced about m merely from seeing the ciphertext c.

5.2.1 Choosing d, e and n


Choose two distinct large primes, say p and q. Let n = pq. Choose an e such
that gcd(e, φ(n)) = 1. Then find the d such that de ≡ 1 mod φ(n). We release e
and n. Suppose that the message is m where m < n. (If not, break m into blocks
that have this property.) Then compute c = m^e mod n. To recover the message
observe that c^d ≡ m^(ed) ≡ m mod n since ed = 1 + kφ(n) by construction. While
this assumes that gcd(m, n) = 1, there is an exercise that shows that in fact we
don't need this to be true to be able to decrypt the ciphertext.
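The whole procedure can be sketched in Python with toy parameters (p = 61, q = 53 and e = 17 are assumed illustrative values, hopelessly small for real use, where p and q would have about 100 digits each):

```python
from math import gcd

def rsa_keygen(p, q, e):
    """Sketch of RSA key generation with toy primes (real p and q
    would be large primes found with a primality test)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    assert gcd(e, phi) == 1, "e must be invertible mod phi(n)"
    d = pow(e, -1, phi)           # modular inverse (Python 3.8+)
    return (e, n), d              # public key, private exponent

public, d = rsa_keygen(61, 53, 17)
e, n = public
m = 65                            # a message with m < n
c = pow(m, e, n)                  # encrypt
assert pow(c, d, n) == m          # decrypt recovers the message
```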
It is best if p and q are chosen independently of each other and each have on
the order of 100 digits. The lengths should be slightly different (to make it harder
to factor n) and (again for protection from attack) it is better if neither p − 1
nor q − 1 has only "small" prime factors.
To make sure that d exists it is often easiest to let e be a moderately large
prime. Small values of e can leave the system open to attack.
Remember that everyone knows e and n. If we could factor n then we could
recover d. Here is an equivalent problem.
If we know n and φ(n) then we can find p and q. Observe that n − φ(n) + 1 =
pq − (p − 1)(q − 1) + 1 = p + q. So we know pq and p + q. Think about the
quadratic polynomial (x − p)(x − q). It has roots p and q but it can also be
written as x^2 − (n − φ(n) + 1)x + n, which can be solved using the quadratic
formula and so yields p and q.
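This observation translates directly into code (a sketch; math.isqrt keeps the square root exact for large integers):

```python
from math import isqrt

def factor_from_phi(n, phi_n):
    """Recover p and q from n = pq and phi(n): since
    p + q = n - phi(n) + 1, p and q are the roots of
    x^2 - (p + q)x + n."""
    s = n - phi_n + 1             # p + q
    disc = s * s - 4 * n          # (p - q)^2, the discriminant
    root = isqrt(disc)
    assert root * root == disc    # must be a perfect square
    return (s - root) // 2, (s + root) // 2

print(factor_from_phi(35, 24))    # (5, 7)
```

So revealing φ(n) is exactly as damaging as revealing the factorisation of n.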

5.2.2 Where is RSA used?


RSA is not really fast enough to use to send lots of data, but it is a good way
of sending a key to be used for DES (Data Encryption Standard), which is then
used to send the large amounts of data.

It can also be used to send a signature, say, as part of a longer message to
allow for verification. First Alice encrypts her signature using her private key
d. Then she encrypts the whole message, including her encrypted signature,
using Bob’s public key. When Bob gets the message he decrypts it using his
private key. He then decrypts the signature using Alice's public key; if it makes
sense then the whole message must have come from Alice (or Eve has been very
tricky).
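A toy sketch of the signing step (assumed illustrative parameters; real systems sign a hash of the message and use padding):

```python
# Alice signs with her private exponent; anyone can verify with her
# public key. Toy parameters p = 61, q = 53, e = 17 are assumed.
p_a, q_a = 61, 53
n_a = p_a * q_a
e_a = 17
d_a = pow(e_a, -1, (p_a - 1) * (q_a - 1))   # Alice's private exponent

signature = pow(42, d_a, n_a)               # Alice "signs" the value 42
assert pow(signature, e_a, n_a) == 42       # verified with her public key
```

Signing is decryption's twin: raising to d and then to e returns the original value, and only Alice knows d.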

5.2.3 Attacks on RSA


This section is really a quick overview of some ideas that you should be aware of
before trying to implement RSA to use in the “real world”. See Boneh (1999),
which can be found at crypto.stanford.edu/~dabo/papers/RSA-survey.pdf, for
more details.
Suppose that n has g digits and let d be the decryption exponent. If an
attacker has the last g/4 digits of d then they can efficiently find d in time that
is linear in e log_2(e). So if e is small and we have a large part of d then it is
quick to find the rest of d. But if e is large then this result is still no better
than a case-by-case search.
It is tempting to choose small values of d so that messages can be decrypted
quickly. However if d < n^(1/4)/3 then d can be calculated in time polynomial in
log(n).
Transmitting short plaintext can be a problem. For example suppose that a
56-bit key is to be transmitted for use in DES. This key is the message m and is
about 10^17. This is encrypted to give c ≡ m^e mod n, which is likely to have about
200 digits. So Eve calculates c·x^(−e) mod n for all 1 ≤ x ≤ 10^9 and y^e mod n for
all 1 ≤ y ≤ 10^9. If she gets a match then c·x^(−e) ≡ y^e and so c ≡ (xy)^e mod n
and so m ≡ xy mod n. This attack only works when m is the product of two
integers x and y both less than 10^9, but in those cases it does work.
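Here is a toy sketch of this meet-in-the-middle attack with assumed small parameters (n = 3233, e = 17, and a bound of 64 in place of 10^9):

```python
# If the plaintext m factors as x*y with both factors below a bound B,
# Eve can recover m from c = m^e mod n without knowing d.
n, e = 61 * 53, 17
m = 12 * 35                       # a "short" message that factors
c = pow(m, e, n)                  # what Eve intercepts

B = 64
table = {pow(y, e, n): y for y in range(1, B)}   # all y^e mod n
recovered = None
for x in range(1, B):
    # c ≡ (xy)^e means c * x^(-e) ≡ y^e (mod n)
    candidate = (c * pow(pow(x, e, n), -1, n)) % n
    if candidate in table:
        recovered = (x * table[candidate]) % n
        break
print(recovered)  # 420, i.e. the original m
```

The table costs B encryptions and the search another B, so roughly 2·10^9 operations break a naive 10^18-sized message space.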
This problem is overcome by padding m with random digits at both the
beginning and end to make a larger number initially. A more sophisticated
approach is given by optimal asymmetric encryption padding; see Trappe and
Washington (2006) for more information.
A timing attack works by observing how long it takes Bob to decrypt various
ciphertexts. Similar attacks can be based on looking at the power consumed
by the decryption process. Again look at Trappe and Washington (2006) for
further details.
Other attacks are based on factorising n and it is instructive to see how
quickly computing power has improved.
When RSA was released in 1977 an RSA challenge (now called the RSA-129
challenge) was created.
Given

n = 1143816257578888676692357799761466120102182
9672124236256256184293570693524573389783059
7123563958705058989075147599290026879543541,

e = 9007 and

c = 9686961375462206147714092225435588290575999
1124574319874695120930816298225145708356931
476622883989628013391990551829945157815154

find the corresponding plaintext.


In 1977 it was estimated this would take of the order of 4 × 10^16 years. It
was finally solved in 1994 by Atkins, Graff, Lenstra and Leyland.
The plaintext is “THE MAGIC WORDS ARE SQUEAMISH OSSIFRAGE”.
To quote from Atkins et al:
To find the factorization of RSA-129, we used the double large
prime variation of the multiple polynomial quadratic sieve factor-
ing method. The sieving step took approximately 5000 mips years,
and was carried out in 8 months by about 600 volunteers from more
than 20 countries, on all continents except Antarctica. Combining
the partial relations produced a sparse matrix of 569466 rows and
524338 columns. This matrix was reduced to a dense matrix of
188614 rows and 188160 columns using structured Gaussian elimi-
nation. Ordinary Gaussian elimination on this matrix, consisting of
35489610240 bits (4.13 gigabyte), took 45 hours on a 16K MasPar
MP-1 massively parallel computer. The first three dependencies all
turned out to be ‘unlucky’ and produced the trivial factor RSA-129.
The fourth dependency produced the above factorization.

A history of the RSA factor challenge and the numbers involved at each
stage may be found at http://www.rsa.com/rsalabs/node.asp?id=2093.

5.2.4 Exercises
1. (Can be done by hand) Start with p = 5 and q = 11. Compute a set of
private and public keys. Encipher 436. Now decipher it.
2. To get an idea how hard it is to “crack” the RSA algorithm, use your key
from Question 1 to encipher a short word (say 2 to 5 letters long, using
a=1, b=2, ..., z=26) and give it to your partner. Now try to decipher your
partner’s message knowing only e and n. (For an example of this size it
should be possible to find all possible d and check manually.)
3. Check your calculations in Questions 1 and 2 using Mathematica.
4. Work through the Mathematica code given, choosing a couple of large
primes and encoding your own message.
5. The ciphertext “75” was obtained using RSA with n = 437 and e = 3.
You know that the plaintext was either “8” or “9”. Determine which it is
without factorising n.
6. Suppose that you encipher messages by calculating c = m^3 mod 101. How
do you decipher?

7. Let p be a large prime. Suppose that you encipher by calculating c = m^e
mod p for some suitably chosen exponent e. How do you find a corre-
sponding decipherment exponent d?
8. Let n be the product of two large primes. Alice wants to send a message
m to Bob where gcd(m, n) = 1. Alice and Bob choose integers a and b
that are each relatively prime to φ(n). Alice computes c = m^a mod n and
sends c to Bob. Bob computes d = c^b mod n and sends d to Alice. Since
Alice knows a she finds a_1 such that a·a_1 ≡ 1 mod φ(n). She computes
f = d^(a_1) mod n and sends f to Bob. What does Bob have to do to obtain
m (and show that this works)? (The prime factors of n do not need to be
kept confidential in this scheme.)
9. Let n = pq be the product of two distinct primes.

(a) Let m be a multiple of φ(n). Show that if gcd(a, n) = 1 then a^m ≡ 1
mod p and mod q.
(b) Suppose that m is as above and let a be arbitrary (so gcd(a, n) ≠ 1
is possible). Show that a^(m+1) ≡ a mod p and mod q.
(c) Let e and d be the encipherment and decipherment exponents for
RSA with modulus n. Show that a^(ed) ≡ a mod n for all a. (This shows
that we do not need to assume gcd(a, n) = 1 to use RSA.)
(d) If p and q are large, why is it likely that gcd(a, n) = 1 for a randomly
chosen a?

5.3 References and Comments


Trappe and Washington (2006) have a nice account of the RSA system,
including both the number theory related to it and a discussion of some
of the number-theoretic attacks that are possible. Boneh (1999), which
can be found at crypto.stanford.edu/~dabo/papers/RSA-survey.pdf, has
more details about attacking the RSA cryptosystem and a criticism of
"textbook" RSA.

