Newton's method for systems of non-linear equations

Bård Skaflestad

October 3, 2006


Multi-variate Taylor expansions

We wish to develop a generalisation of Taylor series expansions for multi-variate real functions, i.e. real functions of more than one real variable.

The bi-variate case

To this end, let $f : \mathbb{R}^2 \to \mathbb{R}$ be a sufficiently differentiable function of two real variables. We define the auxiliary single-variable functions $u_y(x)$ and $v_x(y)$ by the relations

    u_y(x) = f(x, y),   y fixed,
    v_x(y) = f(x, y),   x fixed.

We then observe that $\frac{\mathrm{d}^m}{\mathrm{d}x^m} u_y(x) = \frac{\partial^m}{\partial x^m} f(x, y)$ and $\frac{\mathrm{d}^m}{\mathrm{d}y^m} v_x(y) = \frac{\partial^m}{\partial y^m} f(x, y)$ for all $m \geq 0$ provided the partial derivatives exist and are well defined. Consequently

    u_y(x + h_1) = u_y(x) + h_1 u_y'(x) + \frac{h_1^2}{2} u_y''(\xi)
                 = f(x, y) + h_1 \frac{\partial f}{\partial x}(x, y) + \frac{h_1^2}{2} \frac{\partial^2 f}{\partial x^2}(\xi, y)

and

    v_x(y + h_2) = v_x(y) + h_2 v_x'(y) + \frac{h_2^2}{2} v_x''(\eta)
                 = f(x, y) + h_2 \frac{\partial f}{\partial y}(x, y) + \frac{h_2^2}{2} \frac{\partial^2 f}{\partial y^2}(x, \eta).

Here, $\xi$ is a point between $x$ and $x + h_1$ and $\eta$ is a point between $y$ and $y + h_2$.

In the remainder of this exposition we will assume that the perturbations $|h_1|$ and $|h_2|$ are moderately sized lest the results be overly inaccurate.

We now wish to express the function value $f(x + h_1, y + h_2)$ in terms of the value of $f$ and its (partial) derivatives at $(x, y)$. This will lead to the bi-variate Taylor expansion of $f$. Doing the Taylor expansion in the $y$ direction first, whilst keeping $x$ fixed, we find that

    f(x + h_1, y + h_2) = v_{x + h_1}(y + h_2)
                        = v_{x + h_1}(y) + h_2 \frac{\mathrm{d}}{\mathrm{d}y} v_{x + h_1}(y) + \frac{h_2^2}{2} \frac{\mathrm{d}^2}{\mathrm{d}y^2} v_{x + h_1}(\eta).

Rewriting this in terms of the bi-variate function $f$ we then get

    f(x + h_1, y + h_2) = f(x + h_1, y) + h_2 \frac{\partial}{\partial y} f(x + h_1, y) + \frac{h_2^2}{2} \frac{\partial^2}{\partial y^2} f(x + h_1, \eta).

Taylor expanding each term on the right with respect to $x$, we find

    f(x + h_1, y + h_2) = f(x, y) + h_1 \frac{\partial}{\partial x} f(x, y) + \frac{h_1^2}{2} \frac{\partial^2}{\partial x^2} f(\xi, y)
        + h_2 \left[ \frac{\partial}{\partial y} f(x, y) + h_1 \frac{\partial^2}{\partial x \partial y} f(x, y) + \frac{h_1^2}{2} \frac{\partial^3}{\partial x^2 \partial y} f(\xi, y) \right]
        + \frac{h_2^2}{2} \left[ \frac{\partial^2}{\partial y^2} f(x, \eta) + h_1 \frac{\partial^3}{\partial x \partial y^2} f(x, \eta) + \frac{h_1^2}{2} \frac{\partial^4}{\partial x^2 \partial y^2} f(\xi, \eta) \right].

Collecting terms in increasing powers of $h_1$ and $h_2$, we then see

    f(x + h_1, y + h_2) = f(x, y) + h_1 \frac{\partial f}{\partial x}(x, y) + h_2 \frac{\partial f}{\partial y}(x, y) + O(h^2).    (1)

Here, $h = \sqrt{h_1^2 + h_2^2}$ and $O(h^2)$ signifies a function $g(h)$ such that

    \lim_{h \to 0} \frac{g(h)}{h^2} = C,    0 < |C| < \infty.

Note though, that the simplification (1) is a direct result of disregarding some of the information included in the higher order terms.
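As a quick numerical illustration of (1), the fragment below (a Python sketch, not part of the original notes; the test function $f(x, y) = \sin x \, e^y$ is an arbitrary smooth choice) measures the error of the first-order expansion for a step $(h_1, h_2)$ and for the halved step. If the neglected terms are truly $O(h^2)$, halving the step should divide the error by roughly four.

```python
import math

# Hypothetical smooth test function and its exact partial derivatives.
def f(x, y):
    return math.sin(x) * math.exp(y)

def df_dx(x, y):
    return math.cos(x) * math.exp(y)

def df_dy(x, y):
    return math.sin(x) * math.exp(y)

x0, y0 = 0.5, 0.2

def first_order_error(h1, h2):
    """Error committed by the first-order expansion (1)."""
    exact = f(x0 + h1, y0 + h2)
    approx = f(x0, y0) + h1 * df_dx(x0, y0) + h2 * df_dy(x0, y0)
    return abs(exact - approx)

e_full = first_order_error(1e-2, 1e-2)
e_half = first_order_error(5e-3, 5e-3)
print(e_full / e_half)  # close to 4, as an O(h^2) remainder predicts
```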
Generalisation to several variables

The bi-variate Taylor expansion (1) may be further generalised to multi-variate functions, that is, functions of $n \geq 2$ variables, $f(x_1, x_2, \ldots, x_n)$. Denoting by

    x = [x_1, x_2, \ldots, x_n]^T

a vector of $n$ real components, we may make the notation for multi-variate functions more compact as

    f(x_1, x_2, \ldots, x_n) \equiv f(x).

We underscore though that this is merely a compact notation to signify that the function $f$ depends on more than one scalar variable. Suppose now that $h = [h_1, h_2, \ldots, h_n]^T$ is a vector of (presumably small) displacements. Then

    f(x + h) = f(x) + \sum_{j=1}^{n} h_j \frac{\partial f}{\partial x_j}(x) + O(h^2),    (2)

in which $h = \sqrt{\sum_{j=1}^{n} h_j^2}$, is the generalisation of (1) to functions of $n$ scalar variables. We remark in particular that all of the partial derivatives $\partial f / \partial x_j$ are evaluated at the point $x$. The above is certainly not a proof, but merely a statement of the general result.

Newton's method for systems of non-linear equations

We seek a method for the resolution of the system of $n$ non-linear equations in $n$ variables given by

    f_1(x_1, x_2, \ldots, x_n) = 0
    f_2(x_1, x_2, \ldots, x_n) = 0
                \vdots                       (3)
    f_n(x_1, x_2, \ldots, x_n) = 0.

In other words, $n$ non-linear equations each depending on $n$ variables must be satisfied simultaneously. We will assume throughout the derivation that some point $r = [r_1, r_2, \ldots, r_n]^T \in \mathbb{R}^n$ exists for which all the equations are satisfied. Our aim now is to define a sequence of vectors $x_m \in \mathbb{R}^n$ with $m \geq 0$ and hopefully such that

    \lim_{m \to \infty} x_m = r.

To this end, and to simplify the notation even further, define the vector valued function

    f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_n(x) \end{bmatrix}
         = \begin{bmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_n(x_1, x_2, \ldots, x_n) \end{bmatrix}.

The system of non-linear equations (3) may then be compactly written as

    f(x) = 0,

in which $0$ denotes the zero vector, $0 = [0, 0, \ldots, 0]^T \in \mathbb{R}^n$.

Now suppose $x_m \approx r$, and define $h = r - x_m$. We wish to derive conditions on the vector $h$ in order that $f(r) = 0$. Using the multi-variate Taylor expansion (2) in each of the $n$ original equations (3) we then get

    0 = f(r) = \begin{bmatrix} f_1(x_m + h) \\ f_2(x_m + h) \\ \vdots \\ f_n(x_m + h) \end{bmatrix}
             = \begin{bmatrix} f_1(x_m) + \sum_{j=1}^{n} h_j \frac{\partial f_1}{\partial x_j}(x_m) + O(h^2) \\ f_2(x_m) + \sum_{j=1}^{n} h_j \frac{\partial f_2}{\partial x_j}(x_m) + O(h^2) \\ \vdots \\ f_n(x_m) + \sum_{j=1}^{n} h_j \frac{\partial f_n}{\partial x_j}(x_m) + O(h^2) \end{bmatrix}.    (4)

We now define the $n \times n$ Jacobian matrix $Df(x)$ as

    Df(x) = \begin{bmatrix}
        \frac{\partial f_1}{\partial x_1}(x) & \frac{\partial f_1}{\partial x_2}(x) & \cdots & \frac{\partial f_1}{\partial x_n}(x) \\
        \frac{\partial f_2}{\partial x_1}(x) & \frac{\partial f_2}{\partial x_2}(x) & \cdots & \frac{\partial f_2}{\partial x_n}(x) \\
        \vdots & \vdots & \ddots & \vdots \\
        \frac{\partial f_n}{\partial x_1}(x) & \frac{\partial f_n}{\partial x_2}(x) & \cdots & \frac{\partial f_n}{\partial x_n}(x)
    \end{bmatrix},

or more succinctly, as $(Df(x))_{ij} = \frac{\partial f_i}{\partial x_j}(x)$ for all $i, j = 1, 2, \ldots, n$. Recalling briefly that the matrix-vector product $w = Av$ is defined component-wise as

    w_i = \sum_{j=1}^{n} a_{ij} v_j

for all $i = 1, 2, \ldots, n$ (when $A \in \mathbb{R}^{n \times n}$ and $v, w \in \mathbb{R}^n$), we may rewrite (4) as

    0 = f(x_m) + Df(x_m) h + R_m,

in which the residual $R_m \in \mathbb{R}^n$ is a vector whose components are all $O(h^2)$. Assuming now that the Jacobian matrix $Df(x_m)$ is regular (i.e. invertible), disregarding the residual $R_m$, and defining the vector $h_m$ as the (unique) solution to the $n \times n$ linear system

    0 = f(x_m) + Df(x_m) h_m    \iff    Df(x_m) h_m = -f(x_m),

we arrive at the next term in the sequence $\{x_m\}$ as

    x_{m+1} = x_m + h_m.
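In code, one Newton update amounts to evaluating $f$ and the Jacobian, solving the linear system, and adding the correction. The following Python fragment (an illustrative sketch, not part of the original notes) applies the iteration to a hypothetical $2 \times 2$ system, $f_1 = x_1^2 + x_2^2 - 1$ and $f_2 = x_1 - x_2$, solving each system $Df(x_m) h_m = -f(x_m)$ by Cramer's rule since $n = 2$.

```python
import math

# Hypothetical 2x2 system (illustration only):
#   f1(x) = x1^2 + x2^2 - 1 = 0   (unit circle)
#   f2(x) = x1 - x2         = 0   (line x1 = x2)
# Exact solution: x1 = x2 = sqrt(2)/2.
def f(x):
    return [x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]]

def Df(x):
    # Jacobian: (Df)_{ij} = d f_i / d x_j
    return [[2.0 * x[0], 2.0 * x[1]],
            [1.0, -1.0]]

def newton_step(x):
    J, fx = Df(x), f(x)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    # Cramer's rule for J h = -f(x) (works for n = 2 only)
    h1 = (-fx[0] * J[1][1] + fx[1] * J[0][1]) / det
    h2 = (-fx[1] * J[0][0] + fx[0] * J[1][0]) / det
    return [x[0] + h1, x[1] + h2]

x = [1.0, 0.5]  # initial guess
for _ in range(6):
    x = newton_step(x)
print(x)  # both components near sqrt(2)/2 = 0.70710678...
```

For general $n$ one would replace the Cramer's-rule solve by Gaussian elimination or a library linear solver, exactly as step b) of the algorithm prescribes.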
This, then, fully defines Newton's method for systems of non-linear equations as

1. Choose some initial point $x_0$.
2. For all $m = 0, 1, 2, \ldots$ until convergence
   a) Compute the Jacobian matrix $J = Df(x_m)$.
   b) Solve the linear system $J h_m = -f(x_m)$ with respect to $h_m$.
   c) Set $x_{m+1} = x_m + h_m$.

A few remarks on the method are in order. Most importantly, we have to solve an $n \times n$ linear system on each iteration of the method. Moreover, both the matrix and the right hand side in general vary as functions of $x$, meaning we have to re-evaluate these quantities on each iteration too. This makes the Newton-Raphson method fairly expensive, especially for large systems (i.e. $n \gg 1$).

Finally, note the similarity to the original (scalar) Newton method

    x_{m+1} = x_m - \frac{f(x_m)}{f'(x_m)},    m = 0, 1, 2, \ldots    (5)

Defining $h_m$ as the solution to the simple, linear equation

    f'(x_m) h_m = -f(x_m)

for all $m = 0, 1, 2, \ldots$, equation (5) is exactly $x_{m+1} = x_m + h_m$ for all $m \geq 0$. We also notice that the Jacobian matrix $Df(x_m)$ plays the rôle of the derivative in the case of systems of non-linear equations.

Example

We consider the system of two equations given by

    x_1 - x_2 + 1 = 0
    x_1^2 + x_2^2 - 4 = 0.    (6)

Using vector notation this is $f(x) = 0$ in which $x = [x_1, x_2]^T$ and the vector function $f(x)$ is given by $f(x) = [x_1 - x_2 + 1, \, x_1^2 + x_2^2 - 4]^T$. Graphically, solving this system corresponds to constructing the points at which the line $x_2 = x_1 + 1$ intersects the circle $x_1^2 + x_2^2 = 2^2$. The situation is shown in Figure 1. We notice from Figure 1 that the system has a solution near the point $(x_1, x_2) = (0.8, 1.8)$. We will demonstrate how Newton's method converges to this solution.

[Figure 1: Graphical interpretation of the system (6).]

The Jacobian matrix of the system (6) can be computed exactly, the result being

    Df(x) = \begin{bmatrix} 1 & -1 \\ 2 x_1 & 2 x_2 \end{bmatrix}.

Note in particular that some of the matrix entries in this case are constant while others depend on the values of $x_1$ and $x_2$. This is quite common in practice.

Starting from $x_0 = [0.8, 1.8]^T$, we first compute the function $f(x_0)$ and the Jacobian matrix $Df(x_0)$ as

    f(x_0) = \begin{bmatrix} 0 \\ -0.12 \end{bmatrix},    Df(x_0) = \begin{bmatrix} 1 & -1 \\ 1.6 & 3.6 \end{bmatrix}.

Thus, the initial displacement vector $h_0$ is the solution to the linear system

    \begin{bmatrix} 1 & -1 \\ 1.6 & 3.6 \end{bmatrix} h_0 = -f(x_0) = \begin{bmatrix} 0 \\ 0.12 \end{bmatrix}.    (7)
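Equation (7) is small enough to check by hand or by machine. The fragment below (a Python sketch, not part of the original notes) solves the $2 \times 2$ system by Cramer's rule and prints $h_0$.

```python
# Solve (7):  [1 -1; 1.6 3.6] h0 = [0; 0.12]  by Cramer's rule.
a11, a12 = 1.0, -1.0
a21, a22 = 1.6, 3.6
b1, b2 = 0.0, 0.12  # right hand side, i.e. -f(x0)

det = a11 * a22 - a12 * a21          # 3.6 + 1.6 = 5.2
h0_1 = (b1 * a22 - b2 * a12) / det   # first component of h0
h0_2 = (a11 * b2 - a21 * b1) / det   # second component of h0
print(h0_1, h0_2)  # both equal 0.12 / 5.2 = 0.0230769...
```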
It is very important to remember that the right hand side of the linear system (7) is $-f(x_0)$ and not $f(x_0)$. In other words, the negative sign is entirely essential to the success of the method. Solving the linear system (7) we find

    h_0 \approx \begin{bmatrix} 0.0230769 \\ 0.0230769 \end{bmatrix},

from which the next Newton iterate is

    x_1 = x_0 + h_0 = \begin{bmatrix} 0.8 \\ 1.8 \end{bmatrix} + \begin{bmatrix} 0.0230769 \\ 0.0230769 \end{bmatrix} = \begin{bmatrix} 0.8230769 \\ 1.8230769 \end{bmatrix}.

To start the next Newton iteration, we compute the vector $f(x_1)$ and the Jacobian matrix $Df(x_1)$ as

    Df(x_1) = \begin{bmatrix} 1.0000000 & -1.0000000 \\ 1.6461538 & 3.6461538 \end{bmatrix},    f(x_1) = \begin{bmatrix} 0.0000000 \\ 0.0010651 \end{bmatrix}.

The second displacement vector, $h_1$, is then the solution to the linear system

    \begin{bmatrix} 1.0000000 & -1.0000000 \\ 1.6461538 & 3.6461538 \end{bmatrix} h_1 = -f(x_1) = \begin{bmatrix} 0.0000000 \\ -0.0010651 \end{bmatrix}.    (8)

We then find that

    h_1 = \begin{bmatrix} -2.0125224 \times 10^{-4} \\ -2.0125224 \times 10^{-4} \end{bmatrix},    x_2 = x_1 + h_1 = \begin{bmatrix} 0.8228757 \\ 1.8228757 \end{bmatrix}.

Repeating this process once more we find the final iterate

    x_3 = \begin{bmatrix} 0.82287565553230 \\ 1.82287565553230 \end{bmatrix},

which is correct to all the decimal digits shown, as you may check from the exact solution

    \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = r = \frac{1}{2} \begin{bmatrix} \sqrt{7} - 1 \\ \sqrt{7} + 1 \end{bmatrix}.

Implementing Newton's method in MATLAB

Solving all the linear systems needed in Newton's method, not to mention actually computing the matrices and right hand sides in the first place, quickly becomes quite tedious, especially if the number of Newton iterations required for convergence is large. Fortunately, MATLAB expediently solves linear systems numerically by means of the matrix left division operator, \.

If we assume the existence of two problem-dependent functions f and Df which compute the values of $f(x)$ and $Df(x)$ respectively, the Newton method reduces to the two statements

    h = - Df(x) \ f(x)
    x = x + h

These statements can in fact be written in a single line as

    h = - Df(x) \ f(x), x = x + h

The Newton process leading to the solution of the system (6) may thus be realised as

    f  = @(x) [x(1) - x(2) + 1; x(1)^2 + x(2)^2 - 4];
    Df = @(x) [1, -1; 2*x(1), 2*x(2)];

    x = [0.8; 1.8];   % initial value

    h = - Df(x) \ f(x), x = x + h
    h = - Df(x) \ f(x), x = x + h
    h = - Df(x) \ f(x), x = x + h
    h = - Df(x) \ f(x), x = x + h

and we observe that in the last step h = [0; 0], meaning convergence to the exact solution in four steps when starting from

    x_0 = \begin{bmatrix} 0.8 \\ 1.8 \end{bmatrix}.
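For readers without MATLAB, the same four-step process can be reproduced in plain Python (an illustrative transcription, not part of the original notes; the $2 \times 2$ solve is done by Cramer's rule instead of the \ operator):

```python
import math

def f(x):
    return [x[0] - x[1] + 1.0, x[0]**2 + x[1]**2 - 4.0]

def Df(x):
    return [[1.0, -1.0], [2.0 * x[0], 2.0 * x[1]]]

def solve2(J, b):
    """Solve the 2x2 linear system J h = b by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - b[1] * J[0][1]) / det,
            (J[0][0] * b[1] - J[1][0] * b[0]) / det]

x = [0.8, 1.8]  # initial value
for _ in range(4):
    fx = f(x)
    h = solve2(Df(x), [-fx[0], -fx[1]])  # Df(x) h = -f(x)
    x = [x[0] + h[0], x[1] + h[1]]

# Exact solution r = [(sqrt(7) - 1) / 2, (sqrt(7) + 1) / 2]
r = [(math.sqrt(7.0) - 1.0) / 2.0, (math.sqrt(7.0) + 1.0) / 2.0]
print(x)  # agrees with r essentially to machine precision
```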
