
Week 5: Small Time Intervals, Continuous Limit.

Infinitesimal Calculus
(see also Neftci, Chapter 3)

Lecture V.1 Functions


Learning continuous-time methods of option pricing inevitably involves some use of mathematics, since prices are, after all, expressed as numbers. One has to come up with a concrete figure for the price of a given financial product; otherwise, no one would agree to pay just for 'general arguments'. Thus we have to get familiar with some mathematical objects and rules which are used in option pricing.

Since we have gone "continuous", it may seem obvious that we should turn to calculus in order to price derivatives which change continuously in time.

The goal of the next two lectures is to review the major concepts of standard (deterministic)
calculus and highlight exactly those points at which standard calculus will fail to be a good
approximation in continuous time finance. This failure of deterministic calculus will “push”
us further on to stochastic calculus, which we shall begin to learn starting from the next week.
In a sense this is the last ‘deterministic week’ in your life.

When we consider prices of various financial derivatives, we view them as determined by prices of underlying assets. In mathematical language, derivatives are functions of share prices. Thus, we start with reviewing the notion of a function.

Suppose A and B are two sets of prices of a share and an option, respectively, and let f be a
rule which associates to every element x of A, exactly one element y in B. (The set A is called
the domain, and the set B is called the range of f.) Such a rule is called a function or a
mapping. In mathematical analysis, functions are denoted by

f : A→ B
or by
y = f ( x), x ∈ A.

If the set B is made of real numbers, which is obviously true in the case under consideration,
then we say that f is a real-valued function and write

f : A → R.

If the sets A and B are themselves collections of functions, like in the case of options on
options, then f transforms a function into another function, and is called an operator.

There are some important functions that play special roles in our discussion. I will briefly
review them.

The Exponential Function


y = e^x, x ∈ R.

As we have already seen, this function is generally used in discounting asset prices in
continuous time. We derived this function when we considered continuous compounding.

The Logarithmic Function

The logarithmic function is defined as the inverse of the exponential function. Given

y = e^x, x ∈ R,

the natural logarithm of y is given by

ln(y) = x, y > 0.

A practitioner may sometimes work with logarithms of asset prices, whose increments are rates of return. Note that while y is always positive, there is no such restriction on x. Hence, the logarithm of an asset price may extend from minus to plus infinity.

The Derivative

Now we review the mathematical operation of differentiation, or taking a derivative of a function. This term may be confused with the term "derivative" as used in finance¹. Thus, we have to exercise some care in distinguishing these two notions. Perhaps one reason for this confusion is that the term "derivative" in finance was introduced before mathematical analysis, with its own derivatives, became so important in valuing these financial products.

The notion of the (mathematical) derivative can be looked at in (at least) two different ways.
First of all, the derivative is a way of dealing with the “smoothness” of functions. In
particular, if trajectories of asset prices are “too irregular”, then their derivative with respect
to time may not exist.

Second, the derivative is a way of calculating how one variable responds to a change in
another variable. For example, given a change in the price of the underlying asset, we may
want to know how the market value of an option written on it may move.

The derivative is a rate of change. But it is a rate of change for infinitesimal movements. We
give a formal definition first.

Definition Of The Mathematical Derivative

¹ Of course, it would be very nice if mathematical derivatives were tradable assets, say, one pound for the derivative of an exponential; then all mathematicians would be extremely rich people, which, unfortunately, they are not.
Let
y = f (x)

be a function of x ∈ R. Then the derivative of f(x) with respect to x, if it exists, is formally denoted by the symbol f_x and is given by

f_x = lim_{∆→0} [ f(x + ∆) − f(x) ] / ∆,        Equation 8

where ∆ is an increment in x (do not confuse it with the ∆ of hedging!)

The variable x can represent any real-life phenomenon. Suppose it represents time. Then ∆ would correspond to a finite time interval, f(x) would be the value of y at time x, and f(x + ∆) would represent the value of y at time x + ∆. Hence, the numerator in eq. (8) is the change in y during a time interval ∆ (before, we denoted it δt). The ratio itself becomes the rate of change in y during the same interval. For example, if y is the price of a certain asset at time x, the ratio in eq. (8) would represent the rate at which the price changes during an interval ∆.

Why is a limit being taken in eq. (8)? In defining the derivative, the limit has a practical use. It is taken to make the ratio in eq. (8) independent of the size of ∆, the time interval that passes.

For making the ratio independent of the size of ∆, one pays a price. The derivative is defined for infinitesimal intervals. For larger intervals, the derivative becomes an approximation that deteriorates as ∆ gets larger and larger.
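The limit in eq. (8) is easy to explore numerically. In the sketch below, the function f(x) = sin x and the point x = 1 are arbitrary illustration choices; the difference quotient approaches the known derivative cos x as ∆ shrinks, and the error grows when ∆ is made larger.

```python
# Numerical illustration of eq. (8): the difference quotient
# (f(x + d) - f(x)) / d approaches the true derivative as d shrinks.
import math

def difference_quotient(f, x, d):
    """Rate of change of f over the finite interval [x, x + d]."""
    return (f(x + d) - f(x)) / d

f = math.sin          # true derivative at x is cos(x)
x = 1.0
true_fx = math.cos(x)

for d in (0.5, 0.1, 0.01, 0.001):
    approx = difference_quotient(f, x, d)
    print(f"d = {d:6}: quotient = {approx:.6f}, error = {abs(approx - true_fx):.2e}")
```

The printout shows the quotient stabilising near cos(1) ≈ 0.5403 only for small ∆, which is exactly the "price" discussed above.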

Example 1: The Exponential Function

As an example of derivatives, consider the exponential function:

f(x) = A·e^{rx}, x ∈ R.

A graph of this function with r > 0 is shown:

[Figure: the increasing curve f(x) = A·exp(rx), with the tangent slope df/dx indicated; the vertical axis f(x) runs from 0 to 200.]

Taking the derivative with respect to x formally:

f_x = df(x)/dx = r·A·e^{rx} = r·f(x).

The quantity f_x is the rate of change of f(x) at point x. Note that as x gets larger, the term e^{rx} increases. The ratio

f_x / f(x) = r

is the percentage rate of change. Note that the above equation can also be presented in the following form

d ln f(x) / dx = r

or

df(x) = r · f(x) · dx.

In particular, we see that an exponential function has a constant percentage rate of change
with respect to x.

Example 2: The Derivative As An Approximation

To see an example of how derivatives can be used in approximations, consider the following
argument.
Let ∆ be a small but finite interval. Then, using the definition of the derivative in eq. (8) and dropping the limit, we can write approximately

f(x + ∆) ≈ f(x) + f_x · ∆.

This means that the value assumed by f(·) at the point x + ∆ can be approximated by the value of f(·) at the point x, plus the derivative f_x multiplied by ∆. Note that when one does not know the exact value of f(x + ∆), the knowledge of f(x), f_x and ∆ is sufficient to obtain an approximation.

This result is shown in the figure

[Figure: the curve f(x) with points A = (x, f(x)), B = (x + ∆, f(x + ∆)) and C on the tangent at A; the vertical gap f(x + ∆) − f(x) and the angle α at A are marked.]

where the ratio

[ f(x + ∆) − f(x) ] / ∆

represents the slope of the segment denoted by AB. As ∆ becomes smaller and smaller, with A fixed, the segment AB converges towards the tangent at the point A. Hence, the derivative f_x is the slope of this tangent.

When we add the product f_x · ∆ to f(x) we obtain the point C. This point can be taken as an approximation of B. Whether this will be a "good" or a "bad" approximation depends on the size of ∆ and on the shape of the function f(x).

One relevant example illustrates these points. We consider a function f(x) that is not very
smooth around the point x=0:
[Figure: the graph of f(x) = sin(πx) − πx·cos(πx) around x = 0, oscillating between roughly −50 and 50.]

The approximating value f̂(x + ∆), obtained from

f̂(x + ∆) = f(x) + f_x · ∆,

may end up being a very unsatisfactory approximation to the true f(x + ∆). Clearly, the more "irregular" the function f(x) becomes, the more such approximations are likely to fail.

[Figure: a continuous but highly irregular random-walk path, with a point x0 marked on the horizontal axis.]

Consider an extreme case. Consider a random walk in the above graph, where the function
f(x) is continuous, but exhibits extreme variations even in small intervals Ä. Here, not only is
the prediction likely to fail, but even a satisfactory definition of f_x may not be obtained. Take,
for example, the point x0. What is the rate of change of the function f(x) at the point x0? It is
difficult to answer. Indeed, one can draw many tangents with differing slopes to f(x) at that
particular point.

It appears that the function f(x), which is a typical example of Brownian motion, is not
differentiable. To deal with such extremely irregular functions we will have to “upgrade”
standard calculus to stochastic calculus.
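The failure of the difference quotient on such paths can be illustrated by simulation. The sketch below draws Gaussian increments of variance ∆ (a stand-in for a Brownian path; the seed and sample size are arbitrary choices) and shows that the average |δS/δt| grows like 1/√∆ instead of settling to a limit as ∆ shrinks.

```python
# For a Brownian-motion-like path the difference quotient does not settle
# down: increments scale like sqrt(dt), so |dS/dt| ~ 1/sqrt(dt) blows up
# as the interval shrinks. A minimal simulation sketch.
import random

random.seed(0)

def avg_abs_quotient(dt, n=20000):
    """Average |increment / dt| for Gaussian increments of variance dt."""
    total = 0.0
    for _ in range(n):
        increment = random.gauss(0.0, dt ** 0.5)
        total += abs(increment / dt)
    return total / n

for dt in (0.1, 0.01, 0.001):
    print(f"dt = {dt:6}: average |dS/dt| = {avg_abs_quotient(dt):.1f}")
```

Each tenfold shrinking of ∆ roughly triples the average quotient, so no tangent slope exists in the limit.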

The Chain Rule

The second use of the derivative is the chain rule. In the examples discussed earlier, f(x) was
a function of x, and x was assumed to represent the time. The derivative was introduced as the
response of a variable to a variation in time.

In pricing derivative securities, we face a somewhat different problem. The price of a derivative asset, e.g., a call option, will depend on the price of the underlying asset, and the price of the underlying asset depends on time.

Hence, there is a chain effect. Time passes, new (small) events occur, the price of the
underlying asset changes, and this affects the derivative asset’s price. In standard calculus,
the tool used to analyse these sorts of chain effects is known as the “chain rule".

Suppose in the example just given x was not itself the time, but a deterministic function of
time, denoted by the symbol:

x = g (t ), t ≥ 0 .

Then the function f(x) is called a composite function and is expressed as

y = f ( g (t )).

The question is: how does one obtain a formula that gives the ultimate effect of a change in t on y?

Definition

For f and g defined as above we have

dy/dt = (df/dg) · (dg/dt).

According to this, the chain rule is the product of two derivatives. First, the derivative of f(g)
is taken with respect to g. Second, the derivative of g(t) is taken with respect to t. The final
effect of t on y is then equal to the product of these two expressions.

The chain rule is a useful tool in approximating the responses of one variable to changes in
other variables.

Take the case of derivative asset prices. A trader observes the price of the underlying asset
continuously and wants to know how the valuation of the complex derivative products
written on this asset would change. If the derivative is an exchange-traded product, these
changes can be observed from the market directly (of course, there is always the question of
whether the markets are correctly pricing the security at that instant.) However, if the
derivative is a “structured” product, its valuation needs to be calculated in-house, using
theoretical pricing models. (Remember our seminar on 'equity-linked notes', which are over-the-counter traded products?) These pricing models will use tools such as the "chain rule".
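As a minimal numerical check of the chain rule, the sketch below uses a hypothetical composite, f(g) = g² with g(t) = eᵗ, and compares the chain-rule product (df/dg)·(dg/dt) with a direct finite difference of f(g(t)).

```python
# Chain rule check: with y = f(g(t)), compare dy/dt from the chain rule
# against a direct finite difference. The choice f(g) = g^2, g(t) = exp(t)
# is an arbitrary illustration.
import math

def g(t):
    return math.exp(t)

def f(x):
    return x * x

def chain_rule(t):
    df_dg = 2.0 * g(t)      # derivative of f at the point g(t)
    dg_dt = math.exp(t)     # derivative of g at t
    return df_dg * dg_dt

def direct(t, d=1e-6):
    """Finite-difference estimate of d f(g(t)) / dt."""
    return (f(g(t + d)) - f(g(t))) / d

t = 0.7
print(f"chain rule: {chain_rule(t):.5f}")
print(f"direct    : {direct(t):.5f}")
```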
Lecture V.2 The Integral
In the previous week we became familiar with the binning procedure, in which the share price S is represented as the sum of the increments of S over n small time intervals:

S(T) = S(0) + Σ_{i=1}^{n} δS_i.
As n goes to infinity, the sum in the above formula converges to what is called the integral.
This is one instance, when the integral becomes an important tool in finance. Another
example is the computation of the expected value of various financial assets, which change
continuously. The expected value is also presented as an integral. We shall see shortly how
useful integrals are in finance.

The integral is the mathematical tool used for calculating sums. In contrast to the Σ operator, which is used for sums of countably many objects, integrals denote sums over uncountably many objects. Since it is not clear how one could "sum" objects that are not even countable, a formal definition of the integral has to be derived.

The general approach in defining integrals is, in a sense, obvious. It is similar to the way we
introduced continuous random variables as a certain limit of a discrete binomial tree. One
would first begin with an approximation involving a countable number of objects, and then
take some limit and move into uncountable objects. Given that different types of limits may
be taken, the integral can be defined in various ways. In standard calculus the most common
form is the Riemann integral.

The Riemann Integral

We are given a deterministic function f(t) of time t ∈ [0, T]. This function can, for instance, be an instantaneous return on some asset. Suppose we are interested in calculating the total return over the interval [0, T]. This is equivalent to integrating the function over [0, T],

∫_0^T f(s) ds,

which corresponds to the area shown in the figure

[Figure: the area under the curve f(x) over the interval from t0 = 0 to tn = T, approximated by rectangles; the small regions A and B mark the gap between a rectangle and the curve.]

In order to calculate the Riemann integral, we partition the interval [0, T] into n disjoint subintervals

t_0 = 0 < t_1 < ⋯ < t_n = T,

then consider the approximating sum

Σ_{i=1}^{n} f( (t_i + t_{i−1}) / 2 ) (t_i − t_{i−1}).

Definition

Given that

max_i |t_i − t_{i−1}| → 0,

the Riemann integral is defined by the limit

lim_{max_i |t_i − t_{i−1}| → 0} Σ_{i=1}^{n} f( (t_i + t_{i−1}) / 2 ) (t_i − t_{i−1}) = ∫_0^T f(s) ds,        Equation 9

where the limit is taken in the standard fashion.

The term on the left-hand side of eq. (9) involves adding the areas of n rectangles constructed using (t_i − t_{i−1}) as the base and f((t_i + t_{i−1})/2) as the height. Note that the small area A is approximately equal to the area B. This is especially true if the base of the rectangles is small and if the function f(t) is smooth, that is, does not vary heavily in small intervals.
In case the sum of the rectangles fails to approximate the area under the curve, we may be able to correct this by considering a finer partition. As the |t_i − t_{i−1}| get smaller, the bases of the rectangles shrink, more rectangles become available, and the area can be approximated "better".
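The effect of a finer partition is easy to see numerically. The sketch below computes midpoint Riemann sums for the arbitrarily chosen smooth function f(t) = t² on [0, 1], whose exact integral is 1/3, and the error shrinks as n grows.

```python
# Midpoint Riemann sums for a smooth function: the approximation improves
# as the partition of the interval gets finer.
def midpoint_riemann(f, a, b, n):
    """Sum of n midpoint rectangles over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

exact = 1.0 / 3.0
for n in (4, 16, 64):
    approx = midpoint_riemann(lambda t: t * t, 0.0, 1.0, n)
    print(f"n = {n:3}: sum = {approx:.6f}, error = {abs(approx - exact):.2e}")
```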

A counterexample is shown in the figure

[Figure: a highly irregular path whose steep variations persist no matter how fine the partition.]

Here the function f(t) shows steep variations, as in the case of Brownian motion. If such variations do not smooth out as the bases of the rectangles get smaller, the approximation by rectangles may fail.

One more comment. The rectangles used to approximate the area under the curve were constructed in a particular way: we used the value of f(t) evaluated at the midpoint of each interval [t_{i−1}, t_i]. Would the same approximation be valid if the rectangles were defined in a different fashion? For example, if one defined the rectangles either by

f(t_i)(t_i − t_{i−1})

or by

f(t_{i−1})(t_i − t_{i−1}),

would the integral be different? To answer this question, consider the figure

[Figure: three rectangles over the same subinterval — the upper rectangle, the rectangle using the midpoint, and the lower rectangle.]

Note that as the partitions get finer and finer, rectangles defined either way would eventually
approximate the same area. Hence, at the limit, the approximation by rectangles would not
give a different integral even when one uses different heights for defining the rectangles.
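For a smooth deterministic function this can be checked numerically. The sketch below compares left-endpoint, midpoint and right-endpoint rectangle sums for the arbitrarily chosen f(t) = sin t on [0, π], whose exact integral is 2; with a fine partition all three essentially coincide.

```python
# Left, midpoint, and right rectangle sums for a smooth deterministic
# function converge to the same integral as the partition gets finer.
import math

def rectangle_sum(f, a, b, n, rule):
    """Riemann sum using 'left', 'mid', or 'right' heights."""
    h = (b - a) / n
    offset = {"left": 0.0, "mid": 0.5, "right": 1.0}[rule]
    return sum(f(a + (i + offset) * h) * h for i in range(n))

n = 1000
for rule in ("left", "mid", "right"):
    s = rectangle_sum(math.sin, 0.0, math.pi, n, rule)
    print(f"{rule:>5}: {s:.6f}")
```

In a stochastic environment, as stated next, this indifference to the choice of height breaks down.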

It turns out that a similar conclusion cannot be reached in stochastic environments.

An Example

It is always good to consider a concrete example. Let us try to calculate the following integral

∫_0^T S(t) dS(t).

To do this, one would first partition the time interval [0, T] into n smaller subintervals, all of size δt, using

t_0 = 0 < t_1 < … < t_n = T,

where, as usual, T = nδt and, for any k, t_{k+1} − t_k = δt. Second, one would define the Riemann sums

R_n = Σ_{k=1}^{n} S(t_k)[S(t_k) − S(t_{k−1})]

and let n go to infinity. Now we can use the following trick: we rewrite the right-hand side of the above formula as

R_n = Σ_{k=1}^{n} [S(t_k) + S(t_{k−1}) − S(t_{k−1})][S(t_k) − S(t_{k−1})].

All we did was to add and subtract S(t_{k−1}). Now the last expression can be presented as follows:

R_n = Σ_{k=1}^{n} [S(t_k) + S(t_{k−1})][S(t_k) − S(t_{k−1})] − Σ_{k=1}^{n} S(t_{k−1})[S(t_k) − S(t_{k−1})].

Here the last term is nothing but the same Riemann sum, only for the "lower rectangles". Since we argued that in the limit, as n goes to infinity, the two sums coincide, we arrive at the following formula:

2R_n = Σ_{k=1}^{n} [S²(t_k) − S²(t_{k−1})] = S²(T) − S²(0).

Above we used the fact that the sum telescopes: all terms cancel each other except for the first and the last ones. Thus, we have finally calculated our integral:

∫_0^T S(t) dS(t) = [S²(T) − S²(0)] / 2.
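This result can be verified numerically for a smooth path. The sketch below uses the arbitrarily chosen deterministic path S(t) = eᵗ on [0, 1] and shows the Riemann sums above converging to (S²(T) − S²(0))/2.

```python
# For a smooth (deterministic) path S(t), the Riemann sum
# sum S(t_k) [S(t_k) - S(t_{k-1})] converges to (S^2(T) - S^2(0)) / 2.
import math

def riemann_S_dS(S, T, n):
    """Upper-rectangle Riemann sum for the integral of S dS on [0, T]."""
    dt = T / n
    total = 0.0
    for k in range(1, n + 1):
        total += S(k * dt) * (S(k * dt) - S((k - 1) * dt))
    return total

T = 1.0
exact = (math.exp(T) ** 2 - 1.0) / 2.0
for n in (10, 100, 1000):
    print(f"n = {n:4}: sum = {riemann_S_dS(math.exp, T, n):.6f}  (exact {exact:.6f})")
```

For a Brownian path the same sums would converge to a different limit, which is the point of the upcoming stochastic calculus.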

Fundamental Theorem Of Calculus

You may skip this subsection, since it is mainly of mathematical interest. There is one important property of the integral in deterministic calculus. If we denote by F(x) the area under the curve f(s) from the point a to the point x, that is,

F(x) = ∫_a^x f(s) ds,

then the following relationship is true:

dF/dx = f(x).

In fact, it is very easy to prove this formula. According to the definition of the mathematical derivative, we have to find the difference F(x + ∆) − F(x). From the definition of the Riemann integral, we can conclude that this difference is nothing but the area of a small strip between the points x and x + ∆. For sufficiently small ∆, this strip can be approximated by a rectangle of height f(x) and width ∆. Thus,

F(x + ∆) − F(x) ≈ f(x)·∆.

Therefore,

F_x = lim_{∆→0} [F(x + ∆) − F(x)] / ∆ = f(x).

In other words, differentiation is the inverse of integration and vice versa. Therefore, it is
sometimes more convenient to solve a differential equation, instead of calculating an integral.
We shall see in further lectures that in stochastic calculus, this fundamental theorem of
deterministic calculus is inapplicable.
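A quick numerical illustration of the theorem, with the arbitrarily chosen f(s) = cos s and a = 0: differentiating the numerically computed area F(x) recovers f(x).

```python
# Numerical check of the fundamental theorem: if F(x) is the area under
# f from a to x, then dF/dx = f(x).
import math

def area(f, a, x, n=2000):
    """Midpoint-rule approximation of the integral of f from a to x."""
    h = (x - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

def dF_dx(f, a, x, d=1e-4):
    """Finite-difference derivative of the area function F."""
    return (area(f, a, x + d) - area(f, a, x)) / d

x = 1.2
print(f"dF/dx at x = {x}: {dF_dx(math.cos, 0.0, x):.5f}")
print(f"f(x)           : {math.cos(x):.5f}")
```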

Partial Derivatives

Consider a call option. Time to expiration affects the price (premium) of the call in two
different ways. First, as time passes the expiration date will approach, and the remaining life
of the option gets shorter. This lowers the premium (as we have already previously
discussed). But at the same time, as time passes the price of the underlying asset will change.
This will also affect the premium. Hence, the price of a call is a function of two variables. It
is more appropriate to write c = V(S(t),t), where c is the call premium, S(t) is the price of the
underlying asset, and t, time.

Now suppose we “fix” the time variable t and differentiate V(S(t),t) with respect to S(t). The
resulting partial derivative,

∂V(S, t)/∂S = V_S,

would represent the (theoretical) effect of a change in the price of the underlying asset when
time is kept fixed. This effect is an abstraction, because in practice one needs some time to
pass before S can change.

The partial derivative with respect to time variable can be defined similarly as

∂V(S, t)/∂t = V_t.

Note that even though S(t) is a function of time, we are acting as if it doesn’t change. Again,
this shows the abstract character of the partial derivative. As t changes, S(t) will change as
well. But in taking partial derivatives, we behave as if it is a constant.

Because of this abstract nature of partial derivatives, this type of differentiation cannot be
used directly in representing actual changes of asset price in financial markets. However,
partial derivatives are very useful as intermediary tools. They are useful in taking a total
change and then splitting it into components that come from different sources, and they are
useful in total differentiations.

Because partial derivatives do not represent "observed" changes, there is no difference between their use in stochastic or deterministic environments.

Total Differentials
Suppose we observe a small change in the price of a call option at time t. Let this total change
be denoted by the differential dc. How much of this variation is due to a change in the
underlying asset’s price? How much of the variation is the result of the expiration date
getting nearer as time passes? Total differentiation is used to answer such questions.

Let V(S(t),t) be a function of the two variables. Then the total differential is defined as

dV = (∂V(S, t)/∂S)·dS + (∂V(S, t)/∂t)·dt.

In other words, we take the total change in the asset dS and multiply this by the partial
derivative VS. We take the total change in time dt and multiply this by the partial derivative
Vt. The total change in V(S(t),t) is the sum of these two products.

According to this, total differentiation is calculated by splitting an observed change into different abstract components.

Taylor Series Expansion

Throughout the rest of the lectures, we will constantly make use of what is called the Taylor series expansion (or Taylor expansion). Therefore, I think it would be helpful to refresh some of the main ideas of this mathematical tool.

Let f(x) be a certain financial instrument which is mathematically described as an infinitely differentiable function of x ∈ R (x can be thought of as an underlying), and pick an arbitrary value of x; call it x0.

Definition

The Taylor series expansion of f(x) around x0 ∈ R is defined as

f(x) = f(x0) + f_x(x0)(x − x0) + (1/2!) f_xx(x0)(x − x0)² + (1/3!) f_xxx(x0)(x − x0)³ + …

     = Σ_{n=0}^{∞} (1/n!) f⁽ⁿ⁾(x0)(x − x0)ⁿ,        Equation 10

where f⁽ⁿ⁾(x0) is the n-th order derivative of f(x) with respect to x, evaluated at the point x0.

I am not going to elaborate on why the expansion in eq. (10) is valid if f(x) is continuous and
smooth enough. Taylor expansion is taken for granted. We will, however, discuss some
implications of it.

First, note that at this point the expression in eq. (10) is not an approximation. The right-hand
side involves an infinite series. Each element involves “simple” powers of x only, but there
are an infinite number of such elements. Because of this, Taylor series expansion is not very
useful in practice.
Yet the expression in eq. (10) can be used to obtain useful approximations. Suppose we
consider eq.(10) and look at those x’s near the x0. That is, suppose

( x − x0 ) ≅ " small" ,

which may mean that x is just one time-tick away from x0.

Then we surely have

|x − x0| > |x − x0|² > |x − x0|³ > ⋯

(Each time we raise |x − x0| to a higher power, we multiply it by a "small" number and make the result even "smaller".)

Under these conditions we may want to drop some of the terms on the right-hand side of eq. (10), if we can argue that they are negligible. To do this, we must adopt a "convention" for smallness and then eliminate all terms that are "negligible".

But when is a term “small” enough to be negligible?

The convention in calculus is that, in general, terms of order (dx)2 or higher are assumed to be
negligible if x is a deterministic variable.

Thus, if we assume that x is deterministic and let |x − x0| be small, then we can use the first-order Taylor approximation:

f(x) ≈ f(x0) + f_x(x0)(x − x0).

This becomes an equality if the f(x) has a derivative at x0 and if we let

( x − x0 ) → 0.

Under these conditions the infinitesimal variation (x-x0) is denoted by

dx ≈ ( x − x0 )
and the one in f(x) by
df ( x) ≈ f ( x) − f ( x0 ).

As a result we obtain the familiar notation in terms of the differentials:

df ( x) = f x ( x)dx.

Here f_x(x) is written as a function of x instead of the usual f_x(x0), since we are considering the limit as x approaches x0.

Second-Order Approximation
The equation
f ( x) ≈ f ( x0 ) + f x ( x0 )( x − x0 ).

was called the first-order Taylor series approximation. Often, a better approximation can be
obtained by including the second-order term:

f ( x) ≈ f ( x0 ) + f x ( x0 )( x − x0 ) + 12 f xx ( x0 )( x − x0 ) 2 .

As we shall see, this point is quite relevant for the later discussion of option pricing.
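The gain from the second-order term is easy to see numerically. The sketch below approximates f(x) = eˣ around x0 = 0 at the arbitrarily chosen point x = 0.2; including the second-order term substantially reduces the error.

```python
# Compare first- and second-order Taylor approximations of f(x) = e^x
# around x0 = 0 for a moderately small step.
import math

x0 = 0.0
x = 0.2   # arbitrary point near x0

exact = math.exp(x)
first = math.exp(x0) * (1.0 + (x - x0))
second = math.exp(x0) * (1.0 + (x - x0) + 0.5 * (x - x0) ** 2)

print(f"exact: {exact:.6f}")
print(f"1st  : {first:.6f}, error = {abs(first - exact):.2e}")
print(f"2nd  : {second:.6f}, error = {abs(second - exact):.2e}")
```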

Appendix

Here we give a few examples of the derivation of mathematical derivatives.

1) f(x) = x ⇒ f_x = lim_{∆→0} (x + ∆ − x)/∆ = 1.

2) f(x) = x² ⇒ f_x = lim_{∆→0} [(x + ∆)² − x²]/∆ = lim_{∆→0} (2x∆ + ∆²)/∆ = 2x.

3) f(x) = e^x ⇒ f_x = lim_{∆→0} (e^{x+∆} − e^x)/∆ = e^x lim_{∆→0} (e^∆ − 1)/∆ = e^x lim_{∆→0} (1 + ∆ + O(∆²) − 1)/∆ = e^x.
Here are examples of Taylor series.

1) e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + …

2) ln(1 + x) = x − x²/2 + x³/3 − x⁴/4 + …
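These two series can be checked against the library functions by summing partial sums; the sketch below truncates both series at ten terms, an arbitrary choice.

```python
# Partial sums of the two Taylor series above, compared with the
# corresponding library functions.
import math

def exp_series(x, terms=10):
    """Sum of x^n / n! for n = 0 .. terms-1."""
    return sum(x ** n / math.factorial(n) for n in range(terms))

def log1p_series(x, terms=10):
    """Sum of (-1)^(n+1) x^n / n for n = 1 .. terms-1."""
    return sum((-1) ** (n + 1) * x ** n / n for n in range(1, terms))

x = 0.3   # a point well inside the region of convergence
print(f"exp : series = {exp_series(x):.8f}, math.exp = {math.exp(x):.8f}")
print(f"log : series = {log1p_series(x):.8f}, math.log = {math.log(1 + x):.8f}")
```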

Try To Answer The Following Questions

1) How can the derivative of a function be used to approximate this function, and why does this approximation fail to be good for an asset price?
2) What is the chain rule for options?
3) Where in finance do integrals appear?
4) Using changes of a share price as an example, can you explain the meaning of the total differential, and in what sense the partial derivatives are an abstraction?
