
Principles of Digital Communications: A Top-Down Approach

Bixio Rimoldi
School of Computer and Communication Sciences
École Polytechnique Fédérale de Lausanne (EPFL)
Switzerland

© 2000 Bixio Rimoldi

Version January 7, 2013


Contents

Preface  vii

1  Introduction and Objectives  1
   1.1  The Big Picture through the OSI Layering Model  1
   1.2  The Bit as the Universal Information Currency  5
   1.3  Problem Formulation and Preview  8
   1.4  Digital vs Analog Communication  13
   1.5  A Few Anecdotes  14

2  Receiver Design for Discrete-Time Observations  17
   2.1  Introduction  17
   2.2  Hypothesis Testing  20
        2.2.1  Binary Hypothesis Testing  22
        2.2.2  m-ary Hypothesis Testing  24
   2.3  The Q Function  25
   2.4  Receiver Design for Discrete-Time AWGN Channels  26
        2.4.1  Binary Decision for Scalar Observations  27
        2.4.2  Binary Decision for n-Tuple Observations  29
        2.4.3  m-ary Decision for n-Tuple Observations  32
   2.5  Irrelevance and Sufficient Statistic  34
   2.6  Error Probability  37
        2.6.1  Union Bound  37
        2.6.2  Union Bhattacharyya Bound  41
   2.7  Summary  43
   2.A  Facts About Matrices  46
   2.B  Densities after One-To-One Differentiable Transformations  50
   2.C  Gaussian Random Vectors  53
   2.D  A Fact About Triangles  57
   2.E  Spaces: Vector; Inner Product; Signal  57
        2.E.1  Vector Space  57
        2.E.2  Inner Product Space  57
        2.E.3  Signal Space  64
   2.F  Exercises  66

3  Receiver Design for the Waveform AWGN Channel  91
   3.1  Introduction  91
   3.2  Gaussian Processes and White Gaussian Noise  93
        3.2.1  Dirac-Delta-Based Definition of White Gaussian Noise  93
        3.2.2  Observation-Based Definition of White Gaussian Noise  94
   3.3  Observables and Sufficient Statistic  97
   3.4  Transmitter and Receiver Architecture  100
        3.4.1  Alternative Receiver Structures  103
   3.5  Continuous-Time Channels Revisited  108
   3.6  Summary and Outlook  110
   3.A  Exercises  112

4  Signal Design Trade-Offs  121
   4.1  Introduction  121
   4.2  Design Parameters  121
   4.3  Bandwidth Definitions  123
   4.4  Isometric Transformations Applied to the Codebook  124
   4.5  The Energy-Minimizing Translation  126
   4.6  Isometric Transformations Applied to the Waveform Set  128
   4.7  Time Bandwidth Product versus Dimensionality  128
   4.8  Building Intuition about Scalability: n versus k  132
        4.8.1  Keeping n Fixed as k Grows  132
        4.8.2  Growing n Linearly with k  134
        4.8.3  Growing n Exponentially With k  136
        4.8.4  Bit-By-Bit Versus Block-Orthogonal  139
   4.9  Conclusion and Outlook  140
   4.10 Exercises  142

5  Nyquist Signalling  153
   5.1  Introduction  153
   5.2  The Ideal Lowpass Case  154
   5.3  Power Spectral Density  158
   5.4  Nyquist Criterion for Orthonormal Bases  161
   5.5  Symbol Synchronization  165
        5.5.1  Maximum Likelihood Approach  166
        5.5.2  Delay Locked Loop Approach  167
   5.6  Summary  169
   5.A  L1, L2, and Lebesgue Integral: A Primer  171
   5.B  Fourier Transform: a Review  174
   5.C  Fourier Series: a Review  177
   5.D  Proof of the Sampling Theorem  179
   5.E  Square-Root Raised-Cosine Pulse  180
   5.F  The Picket Fence Miracle  182
   5.G  Exercises  185

6  Convolutional Coding and Viterbi Decoding  195
   6.1  Introduction  195
   6.2  The Encoder  196
   6.3  The Decoder  199
   6.4  Bit Error Probability  203
        6.4.1  Counting Detours  204
        6.4.2  Upper Bound to Pb  206
   6.5  Concluding Remarks  210
   6.A  Formal Definition of the Viterbi Algorithm  212
   6.B  Exercises  214

7  Passband Communication via Up/Down Conversion  225
   7.1  Introduction  225
   7.2  The Baseband-Equivalent of a Passband Signal  228
   7.3  Analog Amplitude Modulations: DSB, AM, SSB, QAM  233
   7.4  Receiver for Passband Communication over the AWGN Channel  236
   7.5  Baseband-Equivalent Channel Model  240
   7.6  Parameter Estimation  244
   7.7  Noncoherent Detection  249
   7.8  Conclusion  251
   7.A  Relationship Between Real and Complex-Valued Operations  252
   7.B  Complex-Valued Random Vectors  254
        7.B.1  General Statements  254
        7.B.2  The Gaussian Case  255
        7.B.3  The Circularly Symmetric Gaussian Case  256
   7.C  Exercises  263

Notation and Symbols  275

Preface
This text is intended for a one-semester course on the foundations of digital communication. It assumes that the reader has basic knowledge of linear algebra, probability theory, and signal processing, and has the mathematical maturity that is expected of a third-year engineering student. The text has evolved out of lecture notes that I have written for EPFL students. The first pass of my notes greatly profited from three excellent sources, namely the book Principles of Communication Engineering by Wozencraft and Jacobs [1], the lecture notes written by ETH Prof. J. Massey for his course Applied Digital Information Theory, and the lecture notes written by Profs. R. Gallager and A. Lapidoth for their MIT course Introduction to Digital Communication. Through the years the notes have evolved and, although the influence of these sources might still be recognizable, the text now has its own personality in terms of content, style, and organization.

The content is what I can cover in a one-semester course at EPFL. The focus is the transmission problem. By staying focused on the transmission problem (rather than also covering the source digitization and compression problems), I have just the right content and amount of material for the goals that I deem most important, specifically: (1) cover to a reasonable depth the most central topic of digital communication; (2) have enough material to do justice to the beautiful and exciting area of digital communication; and (3) provide evidence that linear algebra, probability theory, calculus, and Fourier analysis are in the curriculum of our students for good reasons. Regarding this last point, the area of digital communication is an ideal showcase for the power of mathematics in solving engineering problems. Of course the problems of digitizing and compressing a source are also important, but covering the former requires a digression into signal processing to acquire the necessary technical background, and the results are less surprising than those related to the transmission problem (which can be tackled right away). The latter is covered in all information theory courses, and rightfully so. A more detailed account of the content is given below, where I discuss the text organization.

In terms of style, I have paid due attention to proofs. The value of a rigorous proof goes beyond the scientific need of proving that a statement is indeed true. From a proof we can gain much insight. Once we see the proof of a theorem, we should be able to tell why the conditions (if any) imposed in the statement are necessary and what can happen if they are violated. Proofs are also important because the statements we find in theorems and the like are often not in the exact form needed for a particular application. Therefore, we might have to adapt the statement and the proof as needed. However, this text is written for people with the mathematical background of an engineer, which means that I cannot assume familiarity with Lebesgue integration. Lebesgue integration is needed to introduce the space of L2 functions, which in turn is needed for a rigorous statement of the sampling theorem and of Nyquist's criterion (Chapter 5). The compromise I make is to introduce these terms informally in Appendix 5.A.

I think that an instructor should not miss the opportunity to share useful tricks. One of my favorites is the trick I learned from Prof. Donald Snyder (Washington University) on how to label the Fourier transform of a rectangle. (Most students remember that it is a sinc but tend to forget how to determine its height and width. See Appendix 5.B.)

I do not use the Dirac delta function except for alternative derivations and illustrative examples. The Dirac delta function is widely used in communication books to introduce white noise and to prove an imprecise formulation of the sampling theorem. The Dirac delta function is a generalized function and we do not have the background to use it rigorously. Furthermore, students find themselves on shaky ground when something goes wrong in a derivation that contains the Dirac delta function. Fortunately, we can avoid Dirac deltas. In introducing white noise (Section 3.2), we avoid the use of the Dirac delta function by modeling not the white noise itself but the effect that white noise has on measurements. In Appendix 5.C we prove the sampling theorem via Fourier series.

The remainder of this preface is about organization. There are various ways to organize the discussion around the diagram of Figure 1.3. The approach I have chosen is top-down with successive refinements. It is top-down in the sense that we begin by considering the communication problem seen by the encoder/decoder pair of Figure 1.3 and move down in the diagram as we go, each time considering a more realistic channel model. It contains successive refinements in the sense that we pass a second time over certain blocks: the focus of the first pass is on where to do what, whereas in the second pass we concentrate on how to do it in a cost- and computationally effective manner. The refinements will concern the top two layers of Figure 1.3.

In Chapter 2 we acquaint ourselves with the receiver design problem for channels that have a discrete output alphabet. In doing so, we hide all but the most essential aspect of a channel, specifically that the input and the output are related stochastically. Starting this way takes us very quickly to the heart of digital communication: the decision rule implemented by a decoder that minimizes the error probability. The decision problem is an excellent place to begin, as the problem is new to students, it has a clean-cut formulation, the solution is elegant and intuitive, and the topic is central to digital communication. After a rather general start, the communication problem is specialized for the discrete-time AWGN (additive white Gaussian noise) channel, which plays a key role in subsequent chapters. In Chapter 2 we also learn how to determine (or upper-bound) the probability of error, and we develop the notion of sufficient statistic, needed in the following chapter. The appendices provide a review of relevant background material on matrices, on how to obtain the probability density function of a variable defined in terms of another, on Gaussian random vectors, and on inner product spaces. The chapter contains a rather large collection of homework problems.

In Chapter 3 we make an important transition concerning the channel used to communicate, specifically from the rather abstract discrete-time channel to the realistic continuous-time AWGN channel. The objective remains the same, i.e., to develop the receiver structure that minimizes the error probability. The theory of inner product spaces, as well as the notion of sufficient statistic developed in the previous chapter, gives us the tools needed to make the transition elegantly and swiftly. We discover that the decomposition of the transmitter and the receiver, as done in the top two layers of Figure 1.3, is general and natural for the continuous-time AWGN channel.

Up until Chapter 4 we assume that the transmitter has been given to us. In Chapter 4 we prepare the ground for the signal-design problem. We introduce the design parameters that we care about, namely transmission rate, delay, bandwidth, average transmitted energy, and error probability, and we discuss how they relate to one another. We introduce the notion of isometry to change the signal constellation without affecting the error probability: it can be applied to the encoder to minimize the average energy without affecting the other system parameters such as transmission rate, delay, bandwidth, and error probability; alternatively, it can be applied to the waveform former to vary the signals' time/frequency features. The chapter ends with three case studies aimed at developing intuition. In each case, we fix a signaling family parameterized by the number of bits conveyed by a signal and determine the probability of error as the number of bits grows to infinity. For one family, the dimensionality of the signal space stays fixed and the conclusion is that the error probability goes to 1 as the number of bits increases. For another family, we let the signal space dimensionality grow exponentially and we will see that, in so doing, we can make the error probability exponentially small. Both of these cases are instructive, but each has drawbacks that make it unworkable. From the case studies, the reasonable choice seems to be the middle-ground solution that consists in letting the dimensionality grow linearly with the number of bits.

In Chapter 5 we make our first refinement, which consists in zooming into the waveform former and the n-tuple former of Figure 1.3. We pursue a design strategy based on the lessons learned in the previous chapter. We are now in the realm of signaling based on Nyquist's criterion. We also learn how to compute the signal's power spectral density. In this chapter we also take the opportunity to underline the fact that the sampling theorem, the Nyquist criterion, the Fourier transform, and the Fourier series are applications of the same idea: signals of an appropriately defined space can be synthesized via linear combinations of a set of vectors that form an orthogonal basis.

The second refinement takes place in Chapter 6, where we zoom in on the encoder/decoder pair. The idea is to expose the reader to a widely used way of encoding and decoding. Because there are several coding techniques, sufficiently many to justify a dedicated one- or two-semester course, we approach the subject by means of a case study based on convolutional coding. The minimum error probability decoder incorporates the Viterbi algorithm.

The content of this chapter was selected to serve as an introduction to coding and to expose the reader to elegant and powerful tools, such as the previously mentioned Viterbi algorithm and the tools to assess the resulting bit-error probability, notably detour flow graphs and generating functions. Chapter 7 introduces the layer that deals with passband AWGN channels.

A final note to the instructor who might consider taking a bottom-up approach with respect to Figure 1.3: specifically, one could start with the passband AWGN channel model and, as the first step in the development, reduce it to the baseband model by means of the up/down converter. In this case the natural second step is to reduce the baseband channel to the discrete-time channel and only then address the communication problem across the discrete-time channel. I find such an approach to be pedagogically less appealing, as it puts the communication problem last rather than first. As formulated by Claude Shannon, the father of modern digital communication, "The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point." This is indeed the problem that we address in Chapter 2. It should be said that beginning with the communication problem across the discrete-time channel, as we do, requires that students accept the discrete-time channel as a channel model worthy of consideration. This is an abstract channel model and the student will not immediately see its connection to reality. I motivate this choice in two ways: (i) by asking the students to trust that the theory we develop for this abstract channel will turn out to be exactly what we need for more realistic channel models, and (ii) by reminding them of the (too often overlooked) problem-solving technique that consists in addressing a difficult problem by first considering simplified toy versions of it.

Chapter 1 Introduction and Objectives


Apart from this introductory chapter, this book focuses on the system-level engineering aspects of digital point-to-point communication. In a way, digital point-to-point communication is the building block we use to construct complex communication systems, including the Internet, cellular networks, satellite communication, etc. The purpose of this chapter is to provide contextual information. Specifically, we do the following:

(i) Place digital point-to-point communication into the bigger picture. We do so in Section 1.1, where we discuss the Open System Interconnect (OSI) layering model;

(ii) Provide the background that justifies how we often model the information to be transmitted, namely as a sequence of independent and uniformly distributed bits (Section 1.2);

(iii) Give a preview of the rest of the book (Section 1.3);

(iv) Clarify the difference between analog and digital communication (Section 1.4);

(v) Conclude the chapter with a few amusing and instructive anecdotes related to the history of communication (Section 1.5).

The reader eager to get started can skip this chapter without losing anything essential for understanding the rest of the text.

1.1 The Big Picture through the OSI Layering Model

When we communicate using electronic devices, we produce streams of bits that typically go through various networks and are processed by devices from a variety of manufacturers. The system is very complex and there are many things that can go wrong. Given this complexity, it is amazing that we can communicate as easily and reliably as we do.

[Figure 1.1: OSI Layering Model. A sending-process protocol stack and a receiving-process protocol stack, each with the layers Application, Presentation, Session, Transport, Network, Data Link, and Physical, joined at the bottom by the physical medium. Each layer prepends its header (AH, PH, SH, TH, NH, DH) to the data handed down from the layer above, and the data link layer also appends a trailer (DT), so the frame on the physical medium reads DH NH TH SH PH AH Data DT.]

This could hardly be possible without layering and standardization. The Open System Interconnect (OSI) layering model of Figure 1.1 describes a standard data-flow architecture. Although it did not become a commercial success, it inspired other protocols, notably the TCP/IP protocol suite used for the Internet. In this section we use the OSI layering model to convey the basic idea of how modern communication networks deal with the key challenges, notably routing, flow control, reliability, privacy, and authenticity. For the sake of concreteness, let us take e-mailing as a sample activity.


Computers use bytes (8 bits) or multiples thereof to represent letters. So the message of an e-mail is represented by a stream of bytes that we call a data segment. Received e-mails usually sit on a remote server. When we launch a program to read e-mail, hereafter referred to as the client, it checks with the server to see if there are new e-mails. It depends on the client's settings whether a new e-mail is automatically downloaded in full or only a snippet is downloaded until the rest is explicitly requested. The client tells the server what to do. For this to work, the server and the client not only need to be able to communicate the content of the mail message, but they also need to talk to one another for the sake of coordination. This requires a protocol. If we use a dedicated program to do e-mail (as opposed to using a web browser), the common protocols used for retrieving e-mail are IMAP (Internet Message Access Protocol) and POP (Post Office Protocol), whereas for sending e-mail it is common to use SMTP (Simple Mail Transfer Protocol).

The idea of a protocol is not specific to e-mail. Every application that uses the Internet needs a protocol to interact with a peer application. The OSI model reserves the application layer for programs (also called processes) that implement application-related protocols. In terms of data traffic, the protocol places a so-called application header (AH) in front of the data packet. The top arrow in the figure indicates that the two application layers talk to one another as if they had a direct link.

Typically, there is no direct physical link between the two application layers. Instead, the communication between application layers goes through a shared network, which creates a number of challenges. To begin with, there is no guarantee of privacy for anything that goes through a shared network. Furthermore, networks carry data from many users and can get congested. Hence, if possible, the data should be compressed to reduce the traffic. Finally, there is no guarantee that the sending and the receiving computers represent letters the same way. Hence the application header and the data need to be communicated using a universal language. Translation to/from a universal language, compression, and encryption are done by the presentation layer. The presentation layer also needs a protocol to talk to the peer presentation layer at the destination. The protocol is implemented by means of the presentation header (PH).

For the presentation layers to talk to one another, we need to make sure that the two hosting computers are connected. Establishing, maintaining, and ending communication between physical devices is the job of the session layer. The session layer also manages access rights. Like the other layers, the session layer uses a protocol to interact with the peer session layer. The protocol is implemented by means of the session header (SH).

The layers we have discussed so far would suffice if all the machines of interest were connected by a direct and reliable link. In reality, links are not always reliable. Making sure that from an end-to-end point of view the link appears reliable is one of the tasks of the transport layer. By means of parity-check bits, the transport layer verifies that the communication is error-free and, if not, it requests retransmission. The transport layer has a number of other functions, not all of which are necessarily required in any given network. The transport layer can break long data packets into shorter ones and reassemble them, or it can multiplex several sessions between the same two machines into a single one. The transport layer uses the transport header (TH) to communicate with the peer layer.


Now assume that packets have to go through intermediate nodes. The network layer provides routing and flow-control services. Flow control refers to the need to queue up packets at a node if the network is congested or if the receiving end cannot absorb data sufficiently fast. Unlike the layers above, which operate on an end-to-end basis, the network layer and the layers below have a process also at intermediate nodes. The protocol of the network layer is implemented in the network header (NH). The network header contains the destination address.

The next layer is the data link control (DLC) layer. Unlike the other layers, the DLC puts a header at the beginning and a trailer at the end of each packet. Some of the overhead bits are parity-check bits meant to determine if errors have occurred in the link between nodes. If the DLC detects errors, it might ask to retransmit or drop the packet altogether. If it drops the packet, it is up to the transport layer, which operates on an end-to-end basis, to request retransmission. The other important function of the DLC is to create a synchronous bit stream for the next layer, which works synchronously. So the DLC not only has to output bits at the rate determined by the next layer, it also has to fill in with dummy bits when there is nothing to be transmitted. The header and the trailer inserted by the DLC make it possible for the peer processor to identify and remove dummy bits.

The physical layer, the subject of this text, is the bottom layer of the OSI stack. The physical layer creates a more-or-less reliable bit pipe out of the physical channel between two nodes. It does so by means of a transmitter/receiver pair, called a modem,1 on each side of the physical channel. For best performance, the sender on one side and the receiver on the other side of the physical channel need to work synchronously. To make this possible, the service provided by the DLC layer is the ability to send data at a constant rate. We will learn that the physical layer designer can trade reliability for complexity and delay.

In summary, the OSI model has the following characteristics. Although the actual data transmission is vertical, each layer is programmed as if the transmission were horizontal. For a process, whatever is not part of its own header is considered actual data. In particular, a process makes no distinction between the headers of the higher layers and the actual data segment. For instance, the presentation layer translates, compresses, and encrypts whatever it receives from the application layer, attaches the PH, and sends the result to its peer presentation layer. The peer in turn reads and removes the PH and decrypts, decompresses, and translates the packet, which is then passed to the application layer. What the application layer receives is identical to what the peer application layer has sent, up to a possible language translation. The DLC inserts a trailer in addition to a header. All layers, except the transport and the DLC layers, assume that the communication to the peer layer is error-free. If it can, the DLC layer provides reliability between successive nodes. Even if the reliability between successive nodes is guaranteed, nodes might drop packets due to queueing overflow. The transport layer, which operates at the end-to-end level, will detect missing packets and request retransmission.
1 Modem is the result of contracting modulator and demodulator. In analog modulation, such as Frequency Modulation (FM) and Amplitude Modulation (AM), the modulator is the heart of the transmitter. It determines how the information signal affects the transmitted signal. In AM it is the carrier's amplitude, and in FM the carrier's frequency, that is modulated by the information signal. The operation is undone at the receiver by the demodulator. Although in digital communication it is no longer appropriate to talk about a modulator and a demodulator, the term modem has remained in use.
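To make the vertical data flow concrete, here is a minimal sketch of header encapsulation, with hypothetical layer names and a string in place of real bit fields (none of this is prescribed by the OSI standard): each layer wraps whatever it receives from the layer above, and the peer stack unwraps in reverse order.

```python
# Hypothetical sketch (not from the text): nested header encapsulation,
# illustrating why each layer can pretend it talks "horizontally" to its peer.

LAYERS = ["AH", "PH", "SH", "TH", "NH"]  # application ... network headers

def encapsulate(data: str) -> str:
    """Wrap the payload with one header per layer, top of the stack first."""
    packet = data
    for header in LAYERS:
        packet = f"{header}|{packet}"
    # The data link control layer adds a header and a trailer.
    return f"DH|{packet}|DT"

def decapsulate(packet: str) -> str:
    """Peel the headers off in reverse order; each layer sees only its own."""
    assert packet.startswith("DH|") and packet.endswith("|DT")
    packet = packet[len("DH|"):-len("|DT")]
    for header in reversed(LAYERS):
        assert packet.startswith(header + "|")
        packet = packet[len(header) + 1:]
    return packet

frame = encapsulate("mail body")
print(frame)                # DH|NH|TH|SH|PH|AH|mail body|DT
print(decapsulate(frame))   # mail body
```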


It should be clear that a layering approach drastically simplifies the tasks of designing and deploying communication infrastructure. For instance, a programmer can test the application-layer protocol with both applications running on the same computer, thus bypassing all networking problems. Likewise, a physical-layer specialist can test a modem on point-to-point links, also disregarding networking issues. Each of the tasks of compressing, providing reliability, privacy, authenticity, routing, flow control, and physical-layer communication requires specific knowledge. Thanks to the layering approach, each task can be accomplished by people specialized in their respective domain. Similarly, equipment from different manufacturers works together, as long as it respects the protocols.

The OSI model is a generic standard that does not prescribe a specific protocol. The Internet uses the TCP/IP protocol, which is more or less compatible with the OSI architecture but uses 5 instead of 7 layers. The reduction is essentially obtained by combining the OSI application, presentation, and session layers into a single layer called the application layer. Below the application layer is the TCP layer, which provides end-to-end services and corresponds to the OSI transport layer. Below the TCP layer is the IP layer, which deals with routing. The DLC and the physical layers complete the stack.

1.2 The Bit as the Universal Information Currency

We will often assume that the message to be communicated is the realization of a sequence of independent and identically distributed binary symbols. The purpose of this section is to justify this assumption. In so doing, we will see why the bit has the status of a universal information currency.

Source coding is about the representation of a signal by a string of symbols from a finite (often binary) alphabet. Some applications require the ability to faithfully reconstruct the original from the representation. In some cases, only an approximate reconstruction is required. In general, a more accurate reconstruction requires a longer representation. The goal of source coding is to provide the shortest possible representation for a desired reconstruction accuracy.

We could dedicate an entire course to source coding, but the main idea can be summarized rather quickly. A source output is a realization of a stochastic process. Modeling the source as a stochastic process makes it possible to rigorously study some fundamental questions such as the following. Let . . . , B_0, B_1, B_2, . . . be the (discrete-time and discrete-alphabet) stochastic process representing the source and R(t) be the (continuous-time) stochastic process representing the channel output observed by a receiver. Is it possible for the receiver to reconstruct the realization of . . . , B_0, B_1, B_2, . . . from the realization of R(t)?


If we model the source signal as a deterministic function, say as the output of a pseudorandom generator with a fixed initial state, then we can always reproduce the same function at the receiver and reconstruct the source signal. Hence, in this case the answer to the question posed in the preceding paragraph is always yes, even without observing the channel output R(t). Clearly there is something wrong with this setup, namely that the source output is deterministic. If we define the source output as a random process, we can no longer cheat about the way we reconstruct the source output. We now describe three kinds of sources.

Discrete Sources: A discrete source is modeled by a discrete-time random process that takes values in some finite alphabet. A computer file is represented as a sequence of bytes, each of which can take on one of 256 possible values. So when we consider a file as being the source signal, the source can be modeled as a discrete-time random process taking values in the finite alphabet {0, 1, . . . , 255}. Alternatively, we can consider the file as a sequence of bits, in which case the stochastic process takes values in {0, 1}. For another example, consider the sequence of pixel values produced by a digital camera. The color of a pixel is obtained by mixing various intensities of red, green, and blue. Each of the three intensities is represented by a certain number of bits. One way to exchange images is to exchange one pixel at a time, according to some predetermined way of serializing the pixels' intensities. Also in this case we can model the source as a discrete-time process.

A discrete-time sequence taking values in a finite alphabet can always be converted into a binary sequence. The resulting average length depends on how we do the conversion and on the source statistic. The statistic matters because, to obtain the minimum average length, the length of the binary sequence assigned to a source sequence must depend on the probability of that sequence. In principle we could run through all possible ways of making the conversion; for each way we could determine the average number of bits per source symbol; and if we carried this program out to the end, we would find the minimum average length. Surprisingly, we can bypass this tedious process and find the result by means of a simple formula that determines the so-called source entropy. Typically the entropy is denoted by the letter H and has bits as units. If the entropy of a discrete source is H bits, then it is possible to encode blocks of source symbols into strings of bits in such a way that it takes H bits per source symbol on average. Conversely, with a smaller average number of bits per source symbol, it is not possible to have a one-to-one map from source symbols to bits.

Example 1. For a discrete memoryless source that produces symbols taking values in an m-letter alphabet, the entropy formula is
H = -\sum_{i=1}^{m} p_i \log_2 p_i,

where p_i, i = 1, . . . , m, is the probability that the source outputs the i-th alphabet letter. For instance, if m = 3 and the probabilities are p_1 = 0.5, p_2 = p_3 = 0.25, then H = 1.5. In this case, on average we can encode 2 ternary source symbols into 3 binary symbols (but not fewer). One way to achieve this is to map the most likely source letter into 1 and the other two letters into 01 and 00, respectively.
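As a quick check of Example 1, the following sketch (hypothetical code, using only the numbers from the example) computes the entropy of the three-letter source and the average length of the proposed binary code; the two agree at 1.5 bits per symbol.

```python
# Sketch of Example 1 (hypothetical helper, not from the text): the entropy
# of the source and the average length of the proposed code coincide.
from math import log2

p = {"a": 0.5, "b": 0.25, "c": 0.25}          # ternary source statistic
H = -sum(pi * log2(pi) for pi in p.values())  # entropy: 1.5 bits per symbol

code = {"a": "1", "b": "01", "c": "00"}        # the mapping from the example
avg_len = sum(p[s] * len(code[s]) for s in p)  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5

print(H, avg_len)  # 1.5 1.5 -- on average, 3 bits for every 2 source symbols
```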


Any book on information theory will prove the stated relationship between the entropy of a memoryless source and the minimum average number of bits needed to represent a source symbol. A standard reference is [2]. From the above result, it is not hard to see that the only way for a binary source to have entropy 1 [bit per symbol] is if it produces symbols that are independent and uniformly distributed. Such a source is called a Binary Symmetric Source (BSS). We conclude that a binary source is either a BSS or its output can be compressed. Compression is a well-understood technique, widely used in modern systems. Hence communication devices are typically designed by assuming that the source is a BSS.

Discrete-Time Continuous-Alphabet Sources: These are modeled by a discrete-time random process that takes values in some continuous alphabet. If we measure the temperature of a room at regular intervals of time, we obtain a sequence of real-valued numbers. We would model it as the realization of a discrete-time continuous-alphabet random process. To store or to transmit the realization of such a source, we first round the number up or down to the nearest element of some fixed discrete set of numbers. This is called quantization and the result is the quantized process, with the discrete set as its alphabet. Quantization is irreversible, but by choosing a sufficiently dense alphabet, we can make the difference between the original and the quantized process as small as desired. As described in the previous paragraph, the quantized sequence can be converted into a binary sequence that has the same statistic as the output of a BSS.

Continuous-Time Sources: These are modeled by a continuous-time random process. The electric signal produced by a microphone can be seen as a sample path of a continuous-time random process. In all practical applications, such signals are either band-limited or can be lowpass-filtered to make them band-limited. For instance, even the most sensitive human ear cannot detect frequencies above some value (say 25 kHz). Hence any signal meant for the human ear can be made band-limited through a lowpass filter. The sampling theorem (Theorem 72) asserts that a band-limited signal can be represented by a discrete-time sequence, which in turn can be made into a binary sequence as described.
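Returning to the quantize-then-binarize chain described above, here is a minimal sketch with hypothetical values: each real-valued sample is rounded to the nearest level of a fixed discrete alphabet, and the level indices are then written out as bits.

```python
# Hypothetical sketch (not from the text): rounding real-valued samples to the
# nearest level of a fixed discrete alphabet, then labeling each level with bits.

def quantize(sample: float, levels: list[float]) -> int:
    """Return the index of the alphabet level closest to the sample."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - sample))

levels = [18.0, 19.0, 20.0, 21.0, 22.0]  # 5-level alphabet, e.g. degrees C
samples = [19.3, 20.6, 21.9]             # discrete-time continuous-alphabet source

indices = [quantize(s, levels) for s in samples]
bits = "".join(format(i, "03b") for i in indices)  # 3 bits suffice for 5 levels
print(indices, bits)  # [1, 3, 4] 001011100
```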


In summary, given that we compute and communicate through binary symbols, that electronic circuits are becoming more and more digital, and that every information source can be described in terms of binary symbols, it is not surprising that the bit has become the universal currency for the exchange of information. In fact, every memory device and every communication device has a binary interface these days.

The practical benefits of agreeing to communicate by means of a standard (binary) alphabet are obvious. But is there a performance reduction associated with the rigidity of a fixed alphabet? Yes and no. It depends on the specific scenario and on how we measure performance; but there is an important case for which the answer is no. We briefly summarize this result, as it is one of the most celebrated results of information theory and it constitutes the single most authoritative fundamental justification for giving the bit the status of a universal currency.

Like sources, channels too have a fundamental quantity assigned to them, namely the channel capacity, denoted by C and typically expressed in bits per second. But what is a channel? Einstein made the following analogy: "You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles." The important point here is that the cat's reaction is always a bit different, even when we pull his tail the same way. More concretely, a channel can be a cable of twisted copper wires, a phone line, a coaxial cable, an optical fiber, a radio link, etc. Channels are noisy, hence the output is not a deterministic function of the input. Surprisingly, as long as the channel output is not statistically independent of the input, we can use the channel to transmit information reliably. Additional noise can slow down the transmission rate but cannot prevent us from communicating reliably.

For every channel model and set of constraints (e.g. limits on the power and bandwidth of the transmitted signal) we can compute the channel capacity C in, say, bits per second. A fundamental theorem of information theory proves the existence of a transmitter and a receiver that make it possible to send C bits per second while keeping the error probability as small as desired. Conversely, if we try to send at a rate higher than C bits per second, then errors are inevitable.

From the discussion on the source entropy H and the channel capacity C, we conclude that we can turn the source signal into a sequence of H bits per second and the channel into a bit pipe capable of transporting C bits per second. As long as H < C, we can send the source bits through the channel. A source decoder will be able to reconstruct the source signal from the compressed source bits reproduced by the receiver. What if H > C? Could we still reproduce the source output across the channel if we were not constrained to a binary interface? From information theory, we know that it is not possible.

To summarize, the bit is not only a binary symbol used to exchange information. It is also a unit of measure. It is used to quantify the rate at which a source produces information and to quantify the ultimate rate at which a physical channel is capable of transmitting information reliably when equipped with an appropriate transmitter/receiver pair. These are results from information theory. In this text we will not need results from information theory, but we will often assume that the source is a BSS or equivalent.

1.3 Problem Formulation and Preview

Our focus is on the system aspects of digital point-to-point communications. By the term system aspect we mean that we will remain at the level of building blocks rather than going into electronic details; digital means that the message is taken from a finite set of possibilities; and we restrict ourselves to point-to-point communication as it constitutes the building block of all communication systems. Digital communication is a rather unique field in engineering, in which theoretical ideas from probability theory, stochastic processes, linear algebra, and Fourier analysis have had an extraordinary impact on actual system design. The mathematically inclined will appreciate how these theories seem to be so appropriate for solving the problems we will encounter.

[Figure 1.2: Basic point-to-point communication system over a band-limited Gaussian channel. A message i ∈ H enters the transmitter, which maps it to the signal w_i(t) ∈ W; the signal passes through a linear filter, noise N(t) is added, and the receiver observes the resulting channel output and produces the message estimate î.]

Our main target is to acquire a solid understanding of how to communicate through a channel, modeled as depicted in Figure 1.2. The source chooses a message, represented in the figure by the index i, which takes value in some finite alphabet H. As already mentioned, in reality the message is represented by a sequence of bits, but for notational convenience it is often easier to label each sequence with an index and use the index to represent the message. The transmitter maps a message i ∈ H into a signal w_i(t) ∈ W, where W and H have the same cardinality. The channel filters the signal and adds Gaussian noise. The receiver's task is to guess the message based on the channel output R(t).

In a typical scenario, the channel is given and the communication engineer designs the transmitter/receiver pair, taking into account objectives and constraints. The objective could be to maximize the number of bits per second being communicated while keeping the probability that the receiver makes a wrong decision below some threshold. The constraints could be expressed in terms of the signal's power and bandwidth.

The noise added by the channel is Gaussian because it represents the contribution of various noise sources.2 The filter has both a physical and a conceptual justification. The conceptual justification stems from the fact that most wireless communication systems are subject to a license that dictates, among other things, the frequency band that the signal is allowed to occupy. A convenient way for the system designer to deal with this constraint is to assume that the channel contains an ideal filter that blocks everything outside the intended band. The physical reason has to do with the observation that the signal emitted from the transmit antenna typically encounters obstacles that create reflections and scattering. Hence the receiving antenna might capture the superposition of a number of delayed and attenuated replicas of the transmitted signal (plus noise). It is a straightforward exercise to check that this physical channel is linear and time-invariant.
2 Individual noise sources do not necessarily have Gaussian statistics. However, due to the central limit theorem, their aggregate contribution is often quite well approximated by a Gaussian random process.



Thus it can be modeled by a linear filter as shown in the figure.3 Additional filtering may occur due to the limitations of some of the components at the sender and/or at the receiver. For instance, this is the case for a linear amplifier and/or an antenna for which the amplitude response over the frequency range of interest is not flat and the phase response is not linear. The filter in Figure 1.2 accounts for all linear time-invariant transformations that act upon the communication signals as they travel from the sender to the receiver. The channel model of Figure 1.2 is meaningful for both wireline and wireless communication channels. It is referred to as the band-limited Gaussian channel.

Mathematically, a transmitter implements a one-to-one mapping between the message set and a set of signals. Without loss of essential generality, we may let the message set be H = {0, 1, . . . , m − 1} for some integer m ≥ 2. For the channel model of Figure 1.2, the signal set W = {w_0(t), w_1(t), . . . , w_{m−1}(t)} consists of continuous and finite-energy signals. We think of the signals as stimuli used to excite the channel input, chosen in such a way that, from the channel's reaction at the output, the receiver can tell, with high probability, which stimulus was applied.

Even if we model the source as producing an index from H = {0, 1, . . . , m − 1} rather than a sequence of bits, we can still measure the communication rate in terms of bits per second (bps). In fact, the elements of the message set may be labeled with distinct binary sequences of length log_2 m. Hence every time we communicate a message, we equivalently communicate that many bits. If we can send a signal every T seconds, then the message rate is 1/T [messages per second] and the bit rate is (log_2 m)/T [bits per second].

Digital communication is a field that has seen many exciting developments and is still in vigorous expansion. Our goal is to introduce the reader to the field, with emphasis on fundamental ideas and techniques. We hope that the reader will develop an appreciation for the trade-offs that are possible at the transmitter, will understand how to design (at the building-block level) a receiver that minimizes the error probability, and will be able to analyze the performance of a point-to-point communication system.
3 If the scattering and reflecting objects move with respect to the transmit/receive antennas, then the filter is time-varying. We do not consider this case.
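As a quick numerical check of the rate expressions above, consider the following sketch with hypothetical values for m and T.

```python
# Hypothetical numbers (not from the text), checking the rate expressions above.
from math import log2

m = 16        # size of the message set H
T = 1e-3      # one signal transmitted every T seconds

message_rate = 1 / T           # 1000 messages per second
bit_rate = log2(m) / T         # each message carries log2(16) = 4 bits
print(message_rate, bit_rate)  # 1000.0 4000.0
```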


[Figure 1.3: Decomposed transmitter and receiver. The transmitter consists of an encoder (messages to n-tuples), a waveform former (n-tuples to baseband waveforms), and an up-converter (baseband to passband waveforms). The receiver mirrors this structure with a down-converter, an n-tuple former, and a decoder. Between the two, noise N(t) is added to the transmitted waveform to produce the channel output R(t).]

We will discover that a natural way to design, analyze, and implement a transmitter/receiver pair for the channel of Figure 1.2 is to think in terms of the modules shown in Figure 1.3. As in the OSI layering model, peer modules are designed as if they were connected by their own channel. The bottom layer reduces the bandpass channel to the more basic baseband channel. The middle layer further reduces the channel to a discrete-time channel that can be handled by the encoder/decoder pair.

We conclude this introduction with a very brief overview of the various chapters. Chapter 2 addresses the receiver design problem for discrete-time observations, in particular in relationship to the channel seen by the top layer of Figure 1.3, which is the discrete-time additive white Gaussian noise (AWGN) channel. Throughout the text, the receiver's objective will be to minimize the probability of an incorrect decision.

In Chapter 3 we upgrade the channel model to a continuous-time AWGN channel. We will discover that all we have learned in the previous chapter has a direct application for the new channel. In fact, we will discover that, without loss of optimality, we can insert what we call a waveform former at the channel input and the corresponding n-tuple former at the output and, in so doing, we turn the new channel model into the one already considered.



Chapter 4 develops intuition about the high-level implications of the signal set used to communicate. It is in this chapter that we start shifting attention from the problem of designing the receiver for a given set of signals to the problem of designing the signal set itself.

The next two chapters are devoted to practical signaling. In Chapter 5 we focus on the waveform former for what we call symbol-by-symbol on a pulse train. Chapter 6 is a case study on coding. The encoder will be of convolutional type and the decoder will be based on the Viterbi algorithm.

Chapter 7 is about passband communication. A typical passband channel is the radio channel. What we have learned in the previous chapters can, in principle, be applied directly to passband channels; but there are several reasons in favor of a design that consists of a baseband transmitter followed by an up-converter that shifts the spectrum to the desired frequency interval. The receiver reflects the transmitter's structure. An obvious advantage of this approach is that we decouple most of the transmitter/receiver design from the center frequency of the transmitted signal. If we decide to shift the center frequency, as when we change channel on a walkie-talkie, we just act on the up-converter and on the corresponding structure of the receiver, and this can be done very easily. Furthermore, having the last stage of the transmitter operate in its own frequency band prevents the output signal from feeding back over the air into the earlier stages and creating the equivalent of the annoying audio feedback that occurs when we put a microphone next to the corresponding speaker. As it turns out, the best way to design and analyze passband communication systems is to exploit the fact that real-valued passband signals can be represented by complex-valued baseband signals. This is the reason that, throughout the text, we learn to work with complex-valued signals.



1.4 Digital vs Analog Communication

The meaning of digital versus analog communication needs to be clarified, in particular because these terms should not be confused with their meaning in the context of electronic circuits. We can communicate digitally by means of analog or digital electronics, and the same is true for analog communication.

We speak of digital communication when the message to be communicated is one of a finite set of possible choices. For instance, if we communicate 1000 bits, we are communicating one out of 2^1000 possible binary sequences of length 1000. To communicate our choice, we use signals that are appropriate for the channel at hand. No matter which signals we use, the result will be digital communication. One of the simplest ways to do this is to let each bit determine the amplitude of a carrier over a certain duration of time. So the first bit could determine the amplitude from time 0 to T, the second from T to 2T, etc. This is the simplest form of pulse amplitude modulation. There are many sensible ways to map bits to waveforms that are suitable to a channel, and whichever way we choose, it will be a form of digital communication.

We speak of analog communication when the choice to be communicated is one of a continuum of possibilities. This choice could be the signal at the output of a microphone. Any tiny variation of the signal can constitute another valid signal. Two popular ways to do analog communication are amplitude modulation (AM) and frequency modulation (FM): In AM we let the amplitude of a carrier be a function of the information signal's amplitude. In FM it is the carrier's frequency that varies as a function of the information signal.
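As a minimal sketch of the bit-to-amplitude mapping just described (the amplitude alphabet {−1, +1} is a hypothetical choice, not fixed by the text), bit k selects the carrier amplitude used on the k-th interval of duration T.

```python
# Hypothetical sketch (not from the text): the simplest pulse amplitude
# modulation maps bit k to the carrier amplitude used on [kT, (k+1)T).

def pam_amplitudes(bits: str) -> list[int]:
    """Map '0' -> -1 and '1' -> +1; amplitude a_k applies on [kT, (k+1)T)."""
    return [1 if b == "1" else -1 for b in bits]

print(pam_amplitudes("1011"))  # [1, -1, 1, 1]
```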



The difference between analog and digital communication might seem to be minimal at this point, but actually it is not. It all boils down to the fact that in digital communication the receiver has a chance to exactly reconstruct the choice. The receiver knows that there is a finite number of possibilities to choose from. The signals used by the transmitter are chosen to facilitate the receiver's decision. One of the performance criteria is the error probability, and we can design systems that have such a small error probability that for all practical purposes it is zero.

The situation is quite different in analog communication. As there is a continuum of possible source signals, roughly speaking, the transmitter has to describe the source signal in all its details. The channel noise will alter the description. Any change, no matter how small, will correspond to the description of an alternative source signal. There is no chance for the receiver to reconstruct an exact replica of the original. It no longer makes sense to talk about error probability. If we say that an error occurs every time that there is a difference between the original and the reproduction, then the error probability is 1.

The difference, which may still seem a detail at this point, is made significant by the notion of channel capacity discussed in Section 1.2. Recall that for every channel there is a largest rate below which we can make the error probability as small as desired and above which it is impossible to reduce the error probability below a certain value. Now we can see where the difference between analog and digital communication becomes fundamental. For instance, if we want to communicate at 1 gigabit per second (Gbps) from Zurich to Los Angeles over a certain type of cable, we can cut the cable into pieces of length L, chosen in such a way that the channel capacity of each piece is greater than 1 Gbps. We can then design a transmitter and a receiver that allow us to communicate virtually error-free at 1 Gbps over the distance L. By concatenating many such links, we can cover any desired distance at the same rate. By making the error probability over each link sufficiently small, we can meet the desired end-to-end probability of error. The situation is very different in analog communication, where every piece of cable contributes irreversibly to degradation.

Need another example? Compare faxing a text to sending an e-mail over the same telephone line. The fax uses analog technology. It treats the document as a continuum of gray levels (in two dimensions). It does not differentiate between text and images. The receiver prints a degraded version of the original. And if we repeat the operation multiple times, by re-faxing the latest reproduction, it will not take long until the result is dismal. E-mail, on the other hand, is a form of digital communication. Most of the time, the receiver reconstructs an identical replica of the transmitted text.

Because we can turn a continuous-time source into a discrete one, as described in Section 1.2, we always have the option of doing digital rather than analog communication. In the conversion from continuous to discrete there is a deterioration that we control and can make as small as desired. The result can, in principle, be communicated over unlimited distance and over arbitrarily poor channels with no further degradation.

1.5 A Few Anecdotes

This text is targeted mainly at engineering students. Throughout their careers some will make inventions that may or may not be successful. After reading The Information: A History, a Theory, a Flood by James Gleick4 [3], I felt that I should pass on some anecdotes that nicely illustrate one point, specifically that no matter how great an idea or an invention is, there will be people who criticize it.

The printing press was invented by Johannes Gutenberg around 1440. It is now recognized that it played an essential role in the transition from medieval to modern times. Yet in the 16th century, the German priest Martin Luther decried that the multitude of books "[were] a great evil"; in the 17th century, referring to "the horrible mass of books", Leibniz feared "a return to barbarism for in the end the disorder will become nearly insurmountable"; in 1970 the American historian Lewis Mumford predicted that "the overproduction of books will bring about a state of intellectual enervation and depletion hardly to be distinguished from massive ignorance".
4 A copy of the book was generously offered by our dean, Martin Vetterli, to each professor as a 2011 Christmas gift.



The telegraph was invented by Claude Chappe during the French Revolution. A telegraph was a tower for sending optical signals to other towers in line of sight. In 1840, measurements were made to determine the transmission speed. Over a stretch of 760 km from Toulon to Paris, comprising 120 stations, it was determined that two out of three messages arrived within a day during the warm months and that only one in three arrived in winter. This was the situation when F. B. Morse proposed to the French government a telegraph that used electrical wires. Morse's proposal was rejected because "No one could interfere with telegraph signals in the sky, but wire could be cut by saboteurs" [3, Chapter 5].

In 1833 the lawyer and philologist John Pickering, referring to the American version of the French telegraph on Central Wharf (a Chappe-like tower communicating shipping news with three other stations in a twelve-mile line across Boston Harbor), asserted that "It must be evident to the most common observer, that no means of conveying intelligence can ever be devised, that shall exceed or even equal the rapidity of the Telegraph, for, with the exception of the scarcely perceptible relay at each station, its rapidity may be compared with that of light itself." With today's technology we can communicate over optical fiber at more than 10^12 bits per second, which may be 12 orders of magnitude faster than the telegraph referred to by Pickering. Yet Pickering's flawed reasoning may have seemed correct to most of his contemporaries.

The electrical telegraph eventually came and was immediately a great success, yet some feared that it would put newspapers out of business. In 1852 it was declared that "All ideas of connecting Europe with America, by lines extending directly across the Atlantic, is utterly impracticable and absurd." Six years later Queen Victoria and President Buchanan were communicating via such a line.

After the telegraph came the telephone. The first experimental applications of the electrical speaking telephone were made in the US in the 1870s. It quickly became a great success in the US, but not in England. In 1876 the chief engineer of the General Post Office, William Preece, reported to the British Parliament: "I fancy the descriptions we get of its use in America are a little exaggerated, though there are conditions in America which necessitate the use of such instruments more than here. Here we have a superabundance of messengers, errand boys and things of that kind . . . . I have one in my office, but more for show. If I want to send a message, I use a sounder or employ a boy to take it."

Compared to the telegraph, the telephone looked like a toy because any child could use it. In comparison, the telegraph required literacy. Business people first thought that the telephone was not serious. Where the telegraph dealt in facts and numbers, the telephone appealed to emotions. Seeing information technology as a threat to privacy is not new. Already at the time, one commentator said, "No matter to what extent a man may close his doors and windows, and hermetically seal his key-holes and furnace-registers, with towels and blankets, whatever he may say, either to himself or a companion, will be overheard."



to privacy. We could of course extend the list with comments about typewriters, the cell phone, computers, the Internet, or about applications such as e-mail, SMS, wikipedia, Street View by Google, etc. It may be good to keep some of these examples in mind when attempts to promote new ideas are met with resistance.
