
35241 OQM

Lecture Note – Part 9


Constrained Nonlinear Programming

1 Introduction

In this subject, we consider the constrained NLP with linear constraints only. This simplifies the discussion greatly, but does not lose much generality: most methods for solving optimisation problems with nonlinear constraints transform the constraints into linearised versions, in one way or another, even though the linearisation may occasionally fail.

2 Linear-Equality Constrained NLP


Consider a minimisation NLP problem with only linear equality constraints:

min z = f(x)
s.t. Ax = b    (1)

where A is an m × n matrix, n > m, rank(A) = m, x ∈ Rⁿ, and b ∈ Rᵐ. We also assume that the nonlinear function f(x) is defined and has continuous second-order partial derivatives over the feasible region D = {x : Ax = b}.

Since f(x) is a nonlinear function, its minimum may occur anywhere within the feasible region D, not necessarily at an extreme point. Hence we cannot use any technique similar to the Simplex Method.

2.1 Reduced Function

One of the proposed solution approaches is to transform the considered constrained NLP to an unconstrained NLP by virtue of a reduced function.

Example 1. Consider the following problem

min f(x1, x2) = x1² + x2²
s.t. 3x1 + 2x2 = 6

We can transform it to an unconstrained version by rearranging the constraint,

x2 = 3 − (3/2)x1,

and substituting it for x2 in the objective function f(x1, x2). Then the considered constrained NLP problem with two decision variables is reduced to an equivalent unconstrained NLP problem with one decision variable:

min φ(x1) = x1² + (3 − (3/2)x1)².
Finding the stationary point and applying the optimality conditions for unconstrained NLP, we have an optimal solution (x1, x2)ᵀ = (18/13, 12/13)ᵀ with the optimal objective value min f(x1, x2) = 36/13 ≈ 2.76923.
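
For readers who want to verify this numerically, the following short sketch (assuming SymPy is available; the variable names are illustrative, not part of the notes) eliminates x2 via the constraint and minimises the reduced function.

```python
# A small sketch (assuming SymPy) that reproduces Example 1:
# eliminate x2 using the constraint and minimise the reduced function.
import sympy as sp

x1 = sp.symbols('x1')
x2 = 3 - sp.Rational(3, 2) * x1          # constraint 3*x1 + 2*x2 = 6 rearranged
phi = x1**2 + x2**2                      # reduced function phi(x1)

stationary = sp.solve(sp.diff(phi, x1), x1)     # stationary points of phi
x1_star = stationary[0]                         # 18/13
x2_star = x2.subs(x1, x1_star)                  # 12/13
print(x1_star, x2_star, phi.subs(x1, x1_star))  # 18/13 12/13 36/13

# Second-order check: phi''(x1) > 0, so the stationary point is a minimiser.
print(sp.diff(phi, x1, 2))               # 13/2
```
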

In the general case, we transform the constrained NLP (1) to an unconstrained version by means of a general solution to the system of linear equations

Ax = b.    (2)

Since the system of constraints (2) is the same as what we have in LP, a similar technique can be used. We first divide the n-dimensional decision-variable vector x into an m-dimensional basic part xB and an (n − m)-dimensional nonbasic part xN, and correspondingly the m × n constraint matrix A into an m × m basic matrix B and an m × (n − m) nonbasic matrix N. Assume without loss of generality that the first m columns of A give the basic matrix B. Then the vector equation (2) can be rewritten in the form

BxB + NxN = b.

So the basic part xB can be represented in terms of the nonbasic part xN as follows:

xB = B⁻¹(b − N xN) = B⁻¹b − B⁻¹N xN.

Hence, the general solution of (2) is

       ( xB )   ( B⁻¹b − B⁻¹N xN )   ( B⁻¹b )   ( −B⁻¹N )
   x = (    ) = (                ) = (      ) + (       ) xN,    (3)
       ( xN )   (       xN       )   (   0  )   (    I  )

where

        ( B⁻¹b )             ( −B⁻¹N )
   x̄ =  (      )   and  Z =  (       ).
        (   0  )             (    I  )

Then the general solution (3) can be rewritten in the form

x = x̄ + Z xN,    (4)

where x̄ is a particular solution of (2), obtained by setting xN = 0, i.e. Ax̄ = b.
The matrix Z is called a basis matrix for the null-space of A. (The null-space of an m × n matrix A is the set of all x ∈ Rⁿ with Ax = 0, which is an (n − m)-dimensional subspace of Rⁿ; the columns of Z form a basis for this null-space.)
Substituting (4) into the objective function f(x), the constrained NLP (1) is transformed into the unconstrained NLP

min z = f(x̄ + Z xN) = φ(xN).

The function φ(xN) of the (n − m)-dimensional vector xN is called the reduced function of the constrained NLP problem (1).
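
As an illustration of this construction (a minimal sketch assuming NumPy is available; the function name and the choice of the first m columns as the basic matrix are assumptions made here for illustration), x̄ and Z can be computed directly from a basis partition of A:

```python
# Minimal sketch (assuming NumPy) of the reduced-function construction:
# partition A = [B | N], then  x_bar = (B^-1 b, 0)  and  Z = (-B^-1 N; I).
import numpy as np

def particular_solution_and_null_basis(A, b):
    """Return (x_bar, Z) taking the first m columns of A as the basic matrix B."""
    m, n = A.shape
    B, N = A[:, :m], A[:, m:]
    B_inv = np.linalg.inv(B)                      # assumes B is nonsingular
    x_bar = np.concatenate([B_inv @ b, np.zeros(n - m)])
    Z = np.vstack([-B_inv @ N, np.eye(n - m)])
    return x_bar, Z

# Quick self-check on an arbitrary small system (values chosen for illustration).
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([4.0, 1.0])
x_bar, Z = particular_solution_and_null_basis(A, b)
print(np.allclose(A @ x_bar, b), np.allclose(A @ Z, 0.0))   # True True
```

The same two checks (Ax̄ = b and AZ = 0) apply to any choice of nonsingular basic matrix B.
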

Example 2. Consider the following constrained NLP


min f(x1, x2, x3) = x1² + 4x1x3 − x2²
s.t. 2x1 + x2 + 4x3 = 5 (i)
3x1 + x2 − x3 = 1 (ii)
The feasible set of the problem is the set of solutions of the system
of linear equations (i) and (ii).
Applying Gauss-Jordan elimination, we obtain a general solution in the form

        ( x1 )   ( −4 )   (   5 )
   x =  ( x2 ) = ( 13 ) + ( −14 ) x3.
        ( x3 )   (  0 )   (   1 )

This is equivalent to selecting xN = x3 and xB = (x1, x2)ᵀ, which gives matrices

        ( 2  1 )              ( −1   1 )
   B =  (      )   ⇒   B⁻¹ =  (        ),
        ( 3  1 )              (  3  −2 )

          ( −1   1 ) ( 5 )   ( −4 )            ( −4 )
   B⁻¹b = (        ) (   ) = (    )   ⇒   x̄ =  ( 13 ),
          (  3  −2 ) ( 1 )   ( 13 )            (  0 )

and

        (  4 )            ( −5 )           ( −B⁻¹N )   (   5 )
   N =  (    )  ⇒  B⁻¹N = (    )   ⇒   Z = (       ) = ( −14 ).
        ( −1 )            ( 14 )           (    I  )   (   1 )

Hence, the feasible set of the problem is the set of vectors of the form

                     ( −4 )   (   5 )
   x = x̄ + Z x3  =   ( 13 ) + ( −14 ) x3,
                     (  0 )   (   1 )

which is indeed identical to the set obtained by solving the system of equations directly.
Hence the reduced function of the problem is

φ(x3) = f((−4, 13, 0)ᵀ + (5, −14, 1)ᵀ x3) = f(5x3 − 4, −14x3 + 13, x3)

= (5x3 − 4)² + 4(5x3 − 4)x3 − (−14x3 + 13)².


Again, we can solve the unconstrained NLP problem min φ(x3) by finding the stationary point and applying the optimality conditions for unconstrained NLP.
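
The reduced-function construction for this example can also be checked numerically. The sketch below (assuming NumPy; variable names are illustrative) rebuilds x̄ and Z from the basis partition and confirms that φ agrees with f along the feasible set.

```python
# Sketch (assuming NumPy) verifying the reduced function of Example 2.
import numpy as np

A = np.array([[2.0, 1.0, 4.0],
              [3.0, 1.0, -1.0]])
b = np.array([5.0, 1.0])

B, N = A[:, :2], A[:, 2:]                       # x_B = (x1, x2), x_N = x3
B_inv = np.linalg.inv(B)
x_bar = np.concatenate([B_inv @ b, [0.0]])      # (-4, 13, 0)
Z = np.vstack([-B_inv @ N, [[1.0]]])            # (5, -14, 1) as a column

f = lambda x: x[0]**2 + 4*x[0]*x[2] - x[1]**2
phi = lambda t: (5*t - 4)**2 + 4*(5*t - 4)*t - (-14*t + 13)**2

t = 0.7                                   # an arbitrary value of x3
x = x_bar + Z.flatten() * t
print(np.allclose(A @ x, b))              # True: x is feasible
print(np.isclose(f(x), phi(t)))           # True: phi matches f on the feasible set
```
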

2.2 Optimality Conditions

We aim to solve the unconstrained NLP problem with a reduced function

min φ(xN),

where φ(xN) = f(x̄ + Z xN), by setting up some necessary optimality conditions and sufficient optimality conditions.
Applying the chain rule for differentiation gives the gradient of φ(xN),

∇φ(xN) = Zᵀ∇f(x),

and the Hessian matrix of φ(xN),

∇²φ(xN) = Zᵀ∇²f(x)Z.

Then Theorems 5 and 6 in Lecture Note – Part 8 imply the following results.

Theorem 1. (Second-order necessary condition – Linear equality constraints)
If x∗ is a local minimiser of f(x) over the set {x : Ax = b}, and Z is a basis matrix for the null-space of A, then
(i) Zᵀ∇f(x∗) = 0, and

(ii) Zᵀ∇²f(x∗)Z is positive semidefinite.

Theorem 2. (Second-order sufficient condition – Linear equality constraints)
If Z is a basis matrix for the null-space of A and the point x∗ satisfies
(i) Ax∗ = b,

(ii) Zᵀ∇f(x∗) = 0, and

(iii) Zᵀ∇²f(x∗)Z is positive definite,

then x∗ is a local minimiser of f(x) over the set {x : Ax = b}.
Notice that, given a point x for a linear-equality constrained NLP problem, we can apply the above two theorems directly without deriving a reduced function.

Example 3. Consider the problem

min f(x1, x2, x3) = x1² − 2x1 + x2² − x3² + 4x3


s.t. x1 − x2 + 2x3 = 2

We have

            ( 2x1 − 2  )                 ( 2  0   0 )
   ∇f(x) =  (  2x2     )   and ∇²f(x) =  ( 0  2   0 ).
            ( −2x3 + 4 )                 ( 0  0  −2 )

To find a basis matrix Z for the null-space of the 1 × 3 matrix A = (1, −1, 2), we choose xN = (x2, x3)ᵀ and xB = x1. Thus we have N = (−1, 2) and B = 1. Hence

        ( −B⁻¹N )   ( 1  −2 )
   Z =  (       ) = ( 1   0 ).
        (    I  )   ( 0   1 )

Considering the point x∗ = (2.5, −1.5, −1)ᵀ, we have

Ax∗ = 2.5 − (−1.5) + 2(−1) = 2 = b,


 
   Zᵀ∇f(x∗) = (  1  1  0 ) (  3 )   ( 0 )
              ( −2  0  1 ) ( −3 ) = ( 0 ),
                           (  6 )

   Zᵀ∇²f(x∗)Z = (  1  1  0 ) ( 2  0   0 ) ( 1  −2 )   (  4  −4 )
                ( −2  0  1 ) ( 0  2   0 ) ( 1   0 ) = ( −4   6 ).
                             ( 0  0  −2 ) ( 0   1 )
Since the eigenvalues of the Hessian matrix Zᵀ∇²f(x∗)Z are µ = 5 ± √17 > 0, Zᵀ∇²f(x∗)Z is positive definite. (Notice that the Hessian matrix ∇²f(x∗) itself is not positive definite.) The second-order sufficient conditions are satisfied, so x∗ = (2.5, −1.5, −1)ᵀ is a local minimiser of f(x) over the feasible region.
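
These projected quantities are easy to verify numerically; a short sketch (assuming NumPy; names are illustrative) follows.

```python
# Sketch (assuming NumPy) checking the second-order sufficient conditions
# of Example 3 at x* = (2.5, -1.5, -1).
import numpy as np

x_star = np.array([2.5, -1.5, -1.0])
grad = np.array([2*x_star[0] - 2, 2*x_star[1], -2*x_star[2] + 4])   # grad f(x*)
hess = np.diag([2.0, 2.0, -2.0])                                    # Hessian of f

Z = np.array([[1.0, -2.0],
              [1.0,  0.0],
              [0.0,  1.0]])                    # null-space basis of A = (1, -1, 2)

print(Z.T @ grad)                              # [0. 0.]  -> reduced gradient is zero
reduced_hess = Z.T @ hess @ Z                  # [[4, -4], [-4, 6]]
print(np.linalg.eigvalsh(reduced_hess))        # 5 - sqrt(17), 5 + sqrt(17): both > 0
```
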

2.3 The Lagrangian Function

Another technique to transform a constrained NLP to an unconstrained NLP is to introduce the Lagrangian function. Consider a minimisation constrained NLP problem

min z = f(x)
s.t. gi(x) = bi,   i = 1, . . . , m.    (5)
We introduce a function called the Lagrangian, which is constructed by associating a multiplier λi, called a Lagrange multiplier, with the i-th constraint of (5), i = 1, . . . , m, as shown below. (The Lagrange multipliers are closely related to the dual variables in duality theory.)

                     m
   L(x, Λ) = f(x) +  ∑  λi (bi − gi(x))    (6)
                    i=1
Then we attempt to find a point (x∗, Λ∗), where Λ∗ = (λ1∗, λ2∗, . . . , λm∗) ∈ Rᵐ, so as to minimise L(x, Λ). We now explain why in most situations x∗ will solve (5). If (x∗, Λ∗) minimises L(x, Λ), then at (x∗, Λ∗)

∂L(x, Λ)/∂λi = bi − gi(x) = 0,   i = 1, . . . , m.    (7)

This shows that x∗ will satisfy the constraints in (5) and thus be feasible. To show that x∗ is an optimal solution of (5), let x′ be any feasible solution to (5). Since (x∗, Λ∗) minimises L(x, Λ), for any vector Λ′ we have

L(x∗, Λ∗) ≤ L(x′, Λ′).    (8)

Since x∗ and x′ are both feasible in (5), the terms in (6) involving the λ's are all zero, and (8) becomes f(x∗) ≤ f(x′). Thus x∗ does solve (5). In short, if (x∗, Λ∗) solves the unconstrained NLP problem

min L(x, Λ),    (9)

then x∗ solves the constrained NLP problem (5).
From Theorem 5 in Lecture Note – Part 8, we know that for
(x∗ , Λ∗ ) to solve (9), it is necessary that at (x∗ , Λ∗ ),

∇L(x, Λ) = 0. (10)

The following theorem gives conditions implying that any point (x∗, Λ∗) that satisfies (10) will yield an optimal solution x∗ to (5).

Theorem 3. If f(x) is a convex function and each gi(x) is a linear function, then any point (x∗, Λ∗) satisfying (10) will yield an optimal solution x∗ for (5). (Notice that even if the hypotheses fail to hold, it is still possible that a point satisfying (10) will solve (5).)

Equation (10) represents ∇x L(x, Λ) = 0 and ∇Λ L(x, Λ) = 0.

∇Λ L(x, Λ) = 0 is exactly (7), and ∇x L(x, Λ) = 0 indicates

            m
   ∇f(x) =  ∑  λi ∇gi(x),
           i=1

which gives a geometrical interpretation of the Lagrange multipliers: for x∗ to solve (5), it is necessary that ∇f(x∗) is a linear combination of the constraint gradients ∇gi(x∗), i = 1, . . . , m.

Example 4. Consider the problem in Example 3. The Lagrangian of the constrained NLP is

L(x1, x2, x3, λ) = x1² − 2x1 + x2² − x3² + 4x3 + λ(2 − x1 + x2 − 2x3).

Then we have

                             2x1 − 2 − λ = 0           (11)
                             2x2 + λ = 0               (12)
   ∇L(x1, x2, x3, λ) = 0  ⇒
                             −2x3 + 4 − 2λ = 0         (13)
                             x1 − x2 + 2x3 − 2 = 0     (14)

From (11), (12), and (13) we have

   x1 = (λ + 2)/2    (15)
   x2 = −λ/2         (16)
   x3 = −λ + 2       (17)

Substituting (15), (16) and (17) into (14) gives

(λ + 2)/2 + λ/2 + 2(−λ + 2) − 2 = −λ + 3 = 0   ⇒   λ = 3.
Bringing this value into (15), (16) and (17) gives us the solution

(x1, x2, x3)ᵀ = (2.5, −1.5, −1)ᵀ,

which we had identified as a local minimiser of f(x1, x2, x3) over the feasible region in Example 3.
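
The stationarity system (11)–(14) can also be solved symbolically. A short sketch (assuming SymPy; symbol names are illustrative) is given below.

```python
# Sketch (assuming SymPy) solving grad L = 0 for Example 4.
import sympy as sp

x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam')
f = x1**2 - 2*x1 + x2**2 - x3**2 + 4*x3
L = f + lam * (2 - x1 + x2 - 2*x3)                 # Lagrangian as in (6)

eqs = [sp.diff(L, v) for v in (x1, x2, x3, lam)]   # equations (11)-(14)
sol = sp.solve(eqs, [x1, x2, x3, lam], dict=True)[0]
print(sol)    # {x1: 5/2, x2: -3/2, x3: -1, lam: 3}
```
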
In fact, the Lagrangian is especially useful when we consider
the inequality-constrained NLP.

3 Linear-Inequality Constrained NLP


This section introduces the optimality conditions for the inequal-
ity constrained NLP problem

min z = f (x)
(18)
s.t. Ax ≥ b

where A is an m × n matrix. Assume that the considered problem is feasible. The Lagrangian of the problem is defined by

L(x, Λ) = f(x) − Λᵀ(Ax − b),

where Λ = (λ1, λ2, . . . , λm) is the vector of Lagrange multipliers.

Theorem 4. (Second-order necessary condition – Linear inequality constraints)
Assume that x∗ is a local minimiser of f(x) over the set {x : Ax ≥ b} ⊆ Rⁿ. A single constraint ai1 x1 + · · · + ain xn ≥ bi of the problem is “active” or “binding” for x∗ if the value x∗ makes it an equality, i.e. ai1 x1∗ + · · · + ain xn∗ = bi. Let Z⁺ be a basis matrix for the null space of the set of active constraints for x∗. Then there exists a vector Λ∗ ∈ Rᵐ satisfying

(i) ∇x L(x∗, Λ∗) = 0, i.e. ∇f(x∗) = AᵀΛ∗,

(ii) Λ∗ ≥ 0,

(iii) λi∗ (Ax∗ − b)i = 0, and

(iv) (Z⁺)ᵀ∇²f(x∗)Z⁺ is positive semidefinite.

Theorem 5. (Second-order sufficient condition – Linear inequality constraints)
If Z⁺ is a basis matrix for the null space of the set of active constraints for x∗ ∈ {x : Ax ≥ b} ⊆ Rⁿ and there exists a vector Λ∗ ∈ Rᵐ satisfying

(i) ∇x L(x∗, Λ∗) = 0, i.e. ∇f(x∗) = AᵀΛ∗,

(ii) Λ∗ ≥ 0,

(iii) λi∗ (Ax∗ − b)i = 0, and

(iv) (Z⁺)ᵀ∇²f(x∗)Z⁺ is positive definite,

then x∗ is a local minimiser of f(x) over the set {x : Ax ≥ b} ⊆ Rⁿ.
Example 5. Consider the problem

min f(x1, x2) = −2x1² + x1x2 + x2² − 5x1 − x2


s.t. x1 + x2 ≤ 1
x2 ≥ 0

We first rearrange the constraints so that we can get the signs of the Lagrange multipliers right. (Notice that this is similar to the normal form in LPs.) So we look to solve

min f(x1, x2) = −2x1² + x1x2 + x2² − 5x1 − x2
s.t. −x1 − x2 ≥ −1
     x2 ≥ 0
Then we have

        ( −1  −1 )           ( −1  0 )
   A =  (        )   ⇒  Aᵀ = (       ).
        (  0   1 )           ( −1  1 )

Let Λ = (λ1, λ2)ᵀ be the vector of Lagrange multipliers associated with the two constraints. The Lagrangian of the problem is

L(x, Λ) = −2x1² + x1x2 + x2² − 5x1 − x2 − λ1(−x1 − x2 + 1) − λ2 x2.
Then, by virtue of the feasibility condition and some simple necessary conditions for a local minimum, namely

• −x1 − x2 ≥ −1, x2 ≥ 0,

(i) ∇f(x) = ( −4x1 + x2 − 5 ) = (    −λ1    ),
            ( x1 + 2x2 − 1  )   ( −λ1 + λ2  )

(ii) λ1 ≥ 0, λ2 ≥ 0, and

(iii) λ1(−x1 − x2 + 1) = 0, λ2 x2 = 0,

we can analyse the following four possible cases to obtain the solution.

Case 1. Only the first constraint is active and thus λ2 = 0. Then

   x1 + x2 = 1                   x1 = −5/4
   −4x1 + x2 − 5 = −λ1     ⇒     x2 = 9/4
   x1 + 2x2 − 1 = −λ1            λ1 = −9/4

Since λ1 = −9/4 < 0, the necessary conditions do not hold.

Case 2. Only the second constraint is active and thus λ1 = 0. Then

   −4x1 − 5 = 0          x1 = −5/4
                    ⇒
   x1 − 1 = λ2           λ2 = −9/4

Since λ2 = −9/4 < 0, the necessary conditions do not hold.

Case 3. Neither constraint is active and thus λ1 = λ2 = 0. Then

   −4x1 + x2 − 5 = 0          x1 = −1
                         ⇒
   x1 + 2x2 − 1 = 0           x2 = 1

To check whether this point is a local minimum, we can apply the second-order conditions. Since no constraint is active, Z⁺ can be taken as the identity, so the relevant matrix is the full Hessian. We have

   ∇²f(−1, 1) = ( −4  1 )
                (  1  2 ),

and the eigenvalues of the Hessian matrix can be found by solving

   | −4 − µ     1    |
   |                 | = µ² + 2µ − 9 = 0   ⇒   µ1,2 = −1 ± √10.
   |    1     2 − µ  |

Since µ2 = −1 − √10 < 0, the Hessian matrix is not positive semidefinite, so the second-order necessary condition fails. Thus, this point is not a local minimum.

Case 4. Both constraints are active. Then there is only one point at which both constraints hold with equality, x = (1, 0)ᵀ, and f(1, 0) = −7. The corresponding Lagrange multipliers are λ1 = 9 > 0, λ2 = 9 > 0, and since both constraints are active the null space of the active constraints is {0}, so the second-order condition holds trivially. So x = (1, 0)ᵀ is a local minimum for f(x) over the given feasible region.
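
The multipliers in Case 4 can be checked by solving ∇f(x∗) = AᵀΛ∗ directly; a short sketch (assuming NumPy) follows.

```python
# Sketch (assuming NumPy) checking the KKT conditions of Example 5 at x* = (1, 0).
import numpy as np

A = np.array([[-1.0, -1.0],
              [ 0.0,  1.0]])                 # constraints rewritten as Ax >= b
b = np.array([-1.0, 0.0])
x_star = np.array([1.0, 0.0])

grad = np.array([-4*x_star[0] + x_star[1] - 5,
                 x_star[0] + 2*x_star[1] - 1])     # grad f(x*) = (-9, 0)

lam = np.linalg.solve(A.T, grad)             # solve A^T Lambda = grad f(x*)
print(lam)                                   # [9. 9.]  -> Lambda* >= 0
print(A @ x_star - b)                        # [0. 0.]  -> both constraints active,
                                             # so complementary slackness holds
```
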

These sorts of problems are often solved in similar ways. The modified Newton's method, which can solve constrained NLP problems efficiently, will be introduced in the advanced course “Nonlinear Methods in Quantitative Management”.

3.1 Duality of NLP

As in LP, there is a form of duality for NLP problems. This is closely related to the Lagrangian function (and is part of the reason for using the Lagrangian). We will not go into it in detail in this course, but instead demonstrate how the Lagrange multipliers relate to dual variables in the LP case. Note that LP is just a special case of NLP.
Consider the LP in normal form
min z = 3x1 + x2
s.t. 2x1 + 5x2 ≥ 7
x1 + 4x2 ≥ 2
x1 ≥ 0
x2 ≥ 0
Notice that we typically do not have sign restrictions in NLP problems, so here we include them explicitly as the last two constraints.
The dual LP is
max w = 7y1 + 2y2
s.t. 2y1 + y2 ≤ 3
5y1 + 4y2 ≤ 1
y1 ≥ 0
y2 ≥ 0

We can treat the primal LP as a general minimisation problem and form the following Lagrangian:

                                            ( 2x1 + 5x2 − 7 )
   L(x, Λ) = 3x1 + x2 − (λ1, λ2, λ3, λ4)    (  x1 + 4x2 − 2 )
                                            (    x1 − 0     )
                                            (    x2 − 0     ).
The first-order optimality conditions are

(i)  ( 3 − 2λ1 − λ2 − λ3 )   ( 0 )
     ( 1 − 5λ1 − 4λ2 − λ4 ) = ( 0 ),

(ii) Λ ≥ 0, and

(iii) λi (Ax − b)i = 0, i = 1, . . . , 4,

where

        ( 2  5 )            ( 7 )
   A =  ( 1  4 )   and b =  ( 2 ).
        ( 1  0 )            ( 0 )
        ( 0  1 )            ( 0 )
Notice that in (i) the multipliers λ3 ≥ 0 and λ4 ≥ 0 can be regarded as slack variables. Then the two equations can be rewritten as

   2λ1 + λ2 ≤ 3
                      (19)
   5λ1 + 4λ2 ≤ 1,

which are exactly the two constraints of the dual LP. So if there are dual variables satisfying the constraints (19) and (ii), and the complementary slackness conditions (iii) with x = x∗, then x∗ is an optimal solution to the primal LP.
Thus, the dual variables can also be thought of as Lagrange multipliers on the constraints of an LP problem. If a constraint is not active, the corresponding optimal dual variable must be zero; if a constraint is active, then the corresponding optimal dual variable can be any nonnegative number.
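
To see this correspondence concretely, the primal and dual LPs above can be solved with any LP solver and complementary slackness checked. The sketch below (assuming SciPy's linprog is available) is one possible way to do this.

```python
# Sketch (assuming SciPy) solving the primal and dual LPs above and checking
# complementary slackness.  linprog solves min c^T x s.t. A_ub x <= b_ub, x >= 0.
import numpy as np
from scipy.optimize import linprog

# Primal: min 3x1 + x2  s.t. 2x1 + 5x2 >= 7, x1 + 4x2 >= 2, x >= 0
primal = linprog(c=[3, 1], A_ub=[[-2, -5], [-1, -4]], b_ub=[-7, -2])

# Dual: max 7y1 + 2y2  s.t. 2y1 + y2 <= 3, 5y1 + 4y2 <= 1, y >= 0
dual = linprog(c=[-7, -2], A_ub=[[2, 1], [5, 4]], b_ub=[3, 1])

x, y = primal.x, dual.x
print(primal.fun, -dual.fun)      # equal optimal values (about 1.4)

# Complementary slackness: y_i * (slack of primal constraint i) = 0,
#                          x_j * (slack of dual constraint j)   = 0.
primal_slack = np.array([2, 1]) * x[0] + np.array([5, 4]) * x[1] - np.array([7, 2])
dual_slack = np.array([3, 1]) - (np.array([2, 5]) * y[0] + np.array([1, 4]) * y[1])
print(np.allclose(y * primal_slack, 0), np.allclose(x * dual_slack, 0))  # True True
```
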

Further reading: Sections 11.8–11.9 in the reference book “Operations Research: Applications and Algorithms” (Winston, 2004).

