Robust Portfolio Optimization With A Hybrid Heuristic Algorithm

Comput Manag Sci (2012) 9:63–88
DOI 10.1007/s10287-010-0127-2
ORIGINAL PAPER
Robust portfolio optimization with a hybrid heuristic
algorithm
Björn Fastrich · Peter Winker
Received: 24 September 2009 / Accepted: 8 December 2010 / Published online: 25 December 2010
© Springer-Verlag 2010
Abstract Estimation errors in both the expected returns and the covariance matrix hamper
the construction of reliable portfolios within the Markowitz framework. Robust techniques
that incorporate the uncertainty about the unknown parameters are suggested in the
literature. We propose a modification as well as an extension of such a technique and
compare both with another robust approach. In order to eliminate oversimplifications of
Markowitz' portfolio theory, we generalize the optimization framework to better emulate a
more realistic investment environment. Because the adjusted optimization problem is no
longer solvable with standard algorithms, we employ a hybrid heuristic to tackle this
problem. Our empirical analysis is conducted with a moving time window for returns of the
German stock index DAX100. The results of all three robust approaches yield more stable
portfolio compositions than those of the original Markowitz framework. Moreover, the
out-of-sample risk of the robust approaches is lower and less volatile while their returns
are not necessarily smaller.
Keywords Hybrid heuristic algorithm · Markowitz · Robust optimization · Uncertainty sets
B. Fastrich · P. Winker (corresponding author)
Department of Statistics and Econometrics, University of Giessen,
Licher Strasse 64, 35394 Giessen, Germany
e-mail: Peter.Winker@wirtschaft.uni-giessen.de
B. Fastrich
e-mail: Bjoern.Fastrich@wirtschaft.uni-giessen.de
P. Winker
Center for European Economic Research (ZEW),
L7,1, 68161 Mannheim, Germany
1 Introduction
An investor's primary objective is to optimally allocate his financial resources among
a given set of assets. This allocation is traditionally modeled following Markowitz
(1952), the first to explicitly quantify the trade-off between risk and return within the
process of portfolio selection. The model assumes a market of $K$ assets with multivariate
normally distributed returns, whose expected values are given in the $(K \times 1)$-vector
$\mu$, and a $(K \times K)$-covariance matrix of returns $\Sigma$. Efficient portfolios can
be constructed if the weights $w_i$ of the assets $i = 1, \ldots, K$, are chosen such that
the following problem is solved:

$$\max_{w} \; (1-\lambda)\, w'\mu \;-\; \lambda \sqrt{w'\Sigma w} \qquad (1)$$

subject to

$$w' l_K = 1$$

where $l_K = (1, \ldots, 1)'$ and $w = (w_1, \ldots, w_K)'$. The weighting parameter
$\lambda \in [0, 1]$ can be interpreted as a risk aversion parameter, since it takes into
account the trade-off between risk (measured by $\sigma_p = \sqrt{w'\Sigma w}$) and return
of the portfolios. Repeatedly solving (1) for several values $\lambda \in [0, 1]$ yields the
efficient frontier. This framework is called the Mean-Variance-Optimization (MVO) approach.
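To make the role of $\lambda$ in problem (1) concrete, the following minimal sketch evaluates the objective for plain Python lists and traces the optimum on a coarse grid of fully invested long-only weights. The two-asset numbers, the function names and the grid search are illustrative assumptions, not part of the paper.

```python
import math

def mvo_objective(w, mu, sigma, lam):
    """(1 - lam) * w'mu - lam * sqrt(w' Sigma w) for plain Python sequences."""
    k = len(w)
    exp_ret = sum(w[i] * mu[i] for i in range(k))
    variance = sum(w[i] * sigma[i][j] * w[j] for i in range(k) for j in range(k))
    return (1.0 - lam) * exp_ret - lam * math.sqrt(variance)

# Two hypothetical assets (illustrative numbers, not taken from the paper).
mu = [0.10, 0.04]
sigma = [[0.04, 0.00],
         [0.00, 0.01]]

def best_on_grid(lam, steps=100):
    """Crude grid search over fully invested long-only weights (w1, 1 - w1)."""
    grid = [(i / steps, 1.0 - i / steps) for i in range(steps + 1)]
    return max(grid, key=lambda w: mvo_objective(w, mu, sigma, lam))
```

Calling `best_on_grid` for several values of `lam` between 0 and 1 gives a rough picture of the efficient frontier for this toy market: `lam = 0` puts everything into the asset with the higher expected return, `lam = 1` yields the minimum-risk mix.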
In order to eliminate certain oversimplifications of Markowitz' portfolio theory,
the framework can be adjusted by side conditions to better emulate a more realistic
investment environment. Hence, in this paper, the asset weights are restricted to lie
within a lower ($w^l \geq 0$) and an upper ($w^u \leq 1$) bound, meaning that short-selling
is prohibited. Furthermore, following, e.g., Maringer (2005), the limited divisibility of
assets is considered as well as the fact that investors must pay transaction costs. The
former ensures that the minimum fraction of the investor's capital endowment $V$ invested
in asset $i$ with price $P_i$ is $P_i/V$, corresponding to one piece of asset $i$. These
constraints lead to discrete weights $w_i = n_i P_i / V$, where $n_i \in \mathbb{N}_0^+$.
The transaction costs are modeled as a composite of a fixed payment $c_f$ per asset traded
plus a fraction $c_v$ of the traded volume $|\Delta n_i| P_i$. The term $|\Delta n_i|$ gives
the absolute number of pieces of asset $i$ that were traded to obtain the current portfolio
by adjusting the previous one (see, e.g., Guastaroba et al. 2009). As a result of the
transaction costs and the integer constraints, it is likely that a portfolio $p$ cannot hold
the investor's entire capital endowment in assets; the remainder $R_p$ will be held in cash.
Moreover, it appears to be unrealistic to hold as many different assets as possible in order
to diversify the risk of the portfolio. Portfolios with a very large number of different
assets may become impractical to handle for administrative reasons (Maringer 2005). In order
to take this issue into account, a limit $K_{max} < K$ on the number of different assets
held in a portfolio $p$ is introduced (Chang et al. 2000). Consequently, the subset
$I_p \subseteq \{1, \ldots, K\}$, which contains the constituents' indices $i$ of portfolio
$p$, consists of at most $K_{max}$ elements. The optimization is conducted over the set $X$,
which contains all possible subsets $I_p$, where $\#I_p \leq K_{max}$. The constituents'
indices of the portfolio held in the previous period, i.e. before rebalancing, are denoted
by $I_{prev}$. Hence, the optimization problem becomes¹:
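$$
\max_{I_p \in X,\; n_p \in (\mathbb{N}_0^+)^{\#I_p}} \;
(1-\lambda)\left[\frac{\sum_{i \in I_p} n_i P_i (1+\mu_i) \;-\; \sum_{i \in I_p \cup I_{prev}} C_i \;+\; R_p}{V} - 1\right]
\;-\; \lambda \sqrt{w_p' \Sigma_p w_p} \qquad (2)
$$

subject to

$$
\begin{aligned}
X &= \{I_p \subseteq \{1, \ldots, K\} \mid \#I_p \leq K_{max}\} && \text{set of feasible asset combinations} \\
w_i &= \frac{n_i P_i}{V} \quad \forall i \in I_p && \text{discrete portfolio weights} \\
C_i &= c_f z_i + c_v |\Delta n_i| P_i \quad \forall i \in I_p \cup I_{prev} && \text{transaction costs} \\
z_i &= \begin{cases} 1 & \text{if } |\Delta n_i| > 0 \quad \forall i \in I_p \cup I_{prev} \\ 0 & \text{otherwise} \end{cases} && \text{indicator variable} \\
R_p &= V - \sum_{i \in I_p} n_i P_i - \sum_{i \in I_p \cup I_{prev}} C_i && \text{residual cash holdings} \\
w^l &\leq w_i \leq w^u \quad \forall i \in I_p && \text{no short-selling}
\end{aligned}
$$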
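The bookkeeping behind the side conditions can be sketched as follows. The snippet computes the transaction costs (fixed fee per traded asset plus a proportional fee on the traded volume), the discrete weights and the residual cash for a rebalancing step. Portfolios are represented as hypothetical dictionaries mapping an asset label to its number of pieces; the function names are assumptions made for illustration, and the residual cash is taken to be what remains of $V$ after purchases and costs.

```python
def transaction_costs(n_new, n_prev, prices, c_f, c_v):
    """Total cost: fixed fee c_f per traded asset plus c_v times the traded volume."""
    total = 0.0
    for asset in set(n_new) | set(n_prev):
        delta = abs(n_new.get(asset, 0) - n_prev.get(asset, 0))
        if delta > 0:                       # indicator z_i = 1 only if asset i was traded
            total += c_f + c_v * delta * prices[asset]
    return total

def discrete_weights(n_new, prices, capital):
    """w_i = n_i * P_i / V for every asset held."""
    return {a: n * prices[a] / capital for a, n in n_new.items()}

def residual_cash(n_new, n_prev, prices, capital, c_f, c_v):
    """Cash left after paying for the asset positions and the transaction costs."""
    invested = sum(n * prices[a] for a, n in n_new.items())
    return capital - invested - transaction_costs(n_new, n_prev, prices, c_f, c_v)
```

Because the numbers of pieces are integers, the weights returned by `discrete_weights` will in general not sum to one; the gap is the cash position.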
Due to its complexity, a discrete search space, and multiple optima, problem (2)
will be optimized with a modified hybrid heuristic algorithm (HHA) introduced by
Maringer and Kellerer (2003). Heuristic algorithms have been employed in the field
of portfolio optimization since Dueck and Winker (1992). An overview of heuristic
optimization techniques is provided by Gilli and Winker (2009) and for applications
in finance by Maringer and Winker (2007).
In order to solve either of the problems with real world data, estimators $\hat{\mu}_i$ and
$\hat{\Sigma}$ for the true (but unknown) parameters $\mu_i$ and $\Sigma$ must be used.
Estimation errors will inevitably occur, and they coincide with a high sensitivity of
MVO-portfolios to changes in the input parameters. Michaud (1989) argues that Markowitz'
portfolio theory is error-maximizing, because those assets in a portfolio that get
overweighted (underweighted) have the largest (smallest) expected return-variance-ratios,
and that these exact assets also exhibit the highest probability of large (small) estimation
errors. There is a vast literature addressing this problem ranging from restricting portfolio
weights (e.g. Frost and Savarino 1988) to Bayesian approaches (e.g. Black and Litterman 1991)
and resampling methods (e.g. Michaud 1998). More recently proposed methods can be
categorized as robust estimation and robust optimization approaches.
Robust estimation focuses on robust statistical procedures to estimate $\hat{\mu}$ and
$\hat{\Sigma}$, which are less sensitive to outliers and might result in more robust
portfolios. While Maronna et al. (2006) provide an overview of robust estimators, the
contributions by, e.g., Lauprete et al. (2002); Perret-Gentil and Victoria-Feser (2004);
Welsch and Zhou (2007); Genton and Ronchetti (2008); DeMiguel and Nogales (2009), and
Winker et al. (2011) belong to this class of robust estimation approaches.
¹ The cardinality constraint implies that vectors and matrices denoted with an index $p$
correspond to the asset indices in $I_p$ and, consequently, are of dimensionality
$K_p := \#I_p \leq K_{max}$.
In contrast to robust estimation, the idea of robust optimization is to explicitly
incorporate the uncertainty about the parameters into the optimization process. Therefore,
instead of considering a single point estimate, uncertainty sets are used that contain
a certain selection of point estimates. With the objective of constructing a portfolio
that exhibits good characteristics for many possible scenarios, the portfolio is optimized
under the worst-case-scenario for the given uncertainty set. The articles by, e.g.,
Goldfarb and Iyengar (2003); Tütüncü and Koenig (2004); Ceria and Stubbs (2006);
Bertsimas and Pachamanova (2008), and Zymler et al. (2009) apply this idea in different
ways. Fabozzi et al. (2007) provide a comprehensive overview. In this paper, we enrich
this choice of robust optimization techniques with an approach that extends the technique
of Ceria and Stubbs (2006) by an uncertainty set for the portfolio risk. Moreover, we
propose refined procedures for the computation of uncertainty sets. Our results are
compared empirically to those obtained by the approaches of Tütüncü and Koenig (2004) and
Ceria and Stubbs (2006). To solve optimization problem (2) for these robust approaches the
aforementioned HHA is employed.
The rest of the paper is structured as follows: Chapter 2 describes the optimization
algorithm as well as the implemented robust techniques. Chapter 3 presents computa-
tional results using daily German stock returns as well as a convergence analysis for
the HHA. Finally, Chapter 4 concludes.
2 The implemented techniques
As previously mentioned, the methods to construct uncertainty sets vary over different
robust approaches. However, as this paper employs the optimizer HHA, it is explained
in detail first. Afterwards the robust techniques and their combination with the HHA
are addressed.
2.1 The optimizer: hybrid heuristic algorithm
Maringer and Kellerer (2003) introduced a novel heuristic in a successful application
to a portfolio selection problem. The authors' intent was to overcome certain shortcomings
of local search algorithms such as Simulated Annealing (SA) (Kirkpatrick et al. 1983).
While the shortcomings are the dependence on the single search agent's starting point and
the rapidly increasing peril of getting stuck in local optima when the problem complexity
increases, these algorithms also exhibit the advantage of a relatively precise search
within a predefined (local) neighborhood. Population based methods, such as genetic
algorithms, work with multiple search agents that constitute a greater global search
potential and consequently a greater capacity to cope with rather complex problems. In
order to include the advantages of both local and population based methods, the proposed
hybrid algorithm of Maringer and Kellerer (2003) has (1) more than one starting point and
(2) an entire population of communicating search agents. However, a precise search is
preserved through (3) embedding an SA-algorithm.
After initializing a population of $Pop$ search agents $I_p$, $p = 1, \ldots, Pop$, that
represent portfolios and are termed individuals, the three phases Modification, Evaluation
and Replacement are repeated over a predefined number of generations $g$. In the
Modification Phase an SA-algorithm is conducted independently for all individuals and
the Evaluation Phase ranks these individuals according to their objective function values.
In the Replacement Phase the worst individuals are replaced either by so-called Clones,
i.e., exact copies of the current population's best individuals, or by so-called Averaged
Idols, i.e., individuals that combine characteristics which have been proven to be
successful in other individuals. In contrast to, e.g., genetic algorithms, experience
does not only get passed on by two parents, but by an entire group of successful
individuals. Moreover, in combination with the SA-algorithm in the Modification Phase,
this arrangement of the phases helps to find a successful combination of assets, i.e.,
a core structure, in earlier generations, before it then contributes to assigning proper
portfolio weights to this core structure's assets.
The hybrid heuristic algorithm (HHA) that we employ in this paper is based on the
algorithm by Maringer and Kellerer (2003). Our advancements modify primarily two
aspects. Firstly, as the embedded local search strategy we use a Threshold Accepting
(TA) algorithm (Dueck and Scheuer 1990). Therefore, impairments in the objective
function are accepted deterministically instead of stochastically. Secondly, the
Replacement Phase is written in such a way that the worst individuals only get replaced
if the opponent's objective function values are not worse by more than the current
threshold value $T_t$.² This additional application of the TA-acceptance rule only becomes
relevant if an unsuccessful individual is to be replaced by an Averaged Idol.³
The pseudo code for the HHA is given in Algorithm 1. In generation $g = 0$,
the algorithm generates and evaluates a population of $Pop$ random solutions
$I_p^0$, $p = 1, \ldots, Pop$, that satisfy the constraints (2: to 3:). The solutions
$I_p^0$ are not only component wise (stepwise) altered but are also subject to evolutionary
procedures. For each step size $U_t$ and threshold value $T_t$ respectively, where
$t = 1, \ldots, thresh$, a number of $iter$ generations evolve. Similar to the TA in Gilli
and Këllezi (2002) the step size $U_t$ is determined deterministically. Starting with a
maximum value of $U_{max}$ it linearly decreases by increments of $\Delta U$ when $t$
increases and the algorithm matures, respectively, until it eventually reaches $U_{min}$
(5:) (but is held constant for $iter$ generations for fixed $t$). $U_t$ can be interpreted
as a fraction of the total capital endowment $V$ that is subject to trades between two
evolutionary steps. More precisely, these trades, which are conducted within the
Modification Phase, adjust the individuals in a component wise manner as it is known from
TA. Each generation $g = (t-1) \cdot iter + l$, $g = 0, \ldots, thresh \cdot iter$,
undergoes two more phases, namely an Evaluation Phase and a Replacement Phase, which rank
and recombine the individuals as it is known from evolutionary algorithms.
² For the problem at hand, empirical tests indicate that the additional application of the
TA-acceptance rule in the Replacement Phase leads to superior results compared to both
versions with a certain replacement and versions that only replace individuals in the case
of an improvement (results not reported, but available on request).
³ The alternative to an Averaged Idol, i.e. a Clone of one of the most successful
individuals, definitely exhibits a better objective function value and replaces the
unsuccessful solution with certainty.
Algorithm 1 Hybrid Heuristic Algorithm.
1: Initialize $Pop$, $thresh$, $iter$, $U_{min}$, $U_{max}$ and $\Delta U = (U_{max} - U_{min})/thresh$
2: Generate a valid initial ($g = 0$) population of random solutions $\{I_p^0\}$, $p = 1, \ldots, Pop$
3: Evaluate $F(I_p^0)\ \forall p$ and determine elitist $I_*^0$
4: for $t = 1$ to $thresh$ do
5:   Determine the step size $U_t = U_{max} - \Delta U \cdot (t-1)$
6:   for $l = 1$ to $iter$ do
7:     $g = (t-1) \cdot iter + l$
8:     Modification Phase (Algorithm 2)
9:     Evaluation Phase (Algorithm 3)
10:    Replacement Phase (Algorithm 4)
11:  end for
12: end for
13: Terminate algorithm and report current elitist $I_*$
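The outer structure of Algorithm 1 can be sketched in a few lines. The snippet below only reproduces the step size schedule (lines 1 and 5) and the generation counter (line 7); the three phases are left out. Function and parameter names are illustrative assumptions.

```python
def step_sizes(u_max, u_min, thresh):
    """U_t = U_max - dU * (t - 1) with dU = (U_max - U_min) / thresh, t = 1..thresh."""
    delta_u = (u_max - u_min) / thresh
    return [u_max - delta_u * (t - 1) for t in range(1, thresh + 1)]

def generation_index(t, l, n_iter):
    """g = (t - 1) * iter + l."""
    return (t - 1) * n_iter + l

def run_schedule(u_max, u_min, thresh, n_iter):
    """Yield (g, U_t) for every generation visited by the double loop."""
    for t, u_t in enumerate(step_sizes(u_max, u_min, thresh), start=1):
        for l in range(1, n_iter + 1):
            # The Modification, Evaluation and Replacement Phases would run here.
            yield generation_index(t, l, n_iter), u_t
```

Taken literally, the formulas in lines 1 and 5 end at $U_{min} + \Delta U$ for $t = thresh$; an implementation that should arrive exactly at $U_{min}$ would have to divide by $thresh - 1$ instead.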
In the Modification Phase, shown in more detail in Algorithm 2, the agents develop
independently from each other over $steps$ search steps. In each of these steps, an
amount of $\Delta n_{i,p}$ pieces of a randomly picked asset $i \in I_p^g$, amounting to the
step size's cash equivalent, is sold (3:). Of course, $\Delta n_{i,p} = h(\cdot)$ is a
(discrete) function of various parameters of the problem. If the disposition leads to a
drop of the asset's weight below the lower bound, or if it leads to a complete clearance of
asset $i \in I_p^g$, two procedures exist: (1) with probability $p_{replace}$ asset $i$ gets
substituted by a random asset $k \notin I_p^g$, or (2) with probability $1 - p_{replace}$
asset $i$ is kept with quantity $n_{i,p} = w_{i,p} = 0$ and a random other portfolio
constituent $j \in I_p^g$, $j \neq i$, is bought (4: to 6:). In contrast, if the disposition
does not violate the lower bound on the weights, $i$ is kept in the reduced (but still
positive) quantity and, again, a random other portfolio constituent $j \in I_p^g$,
$j \neq i$, is bought (7: to 9:). Under consideration of all constraints, the assets $j$ and
$k$ respectively are bought from the cash that was generated by selling asset $i$. Hence, an
altered version of portfolio $I_p^g$, denoted by $\tilde{I}_p^g$, is generated. This new
solution has to be evaluated (10:) and compared to the current solution $I_p^g$. Whenever
the objective function value (the fitness) is greater than that of the current solution
$I_p^g$, or whenever the impairment is not greater than the threshold value $T_t$, the
solution $\tilde{I}_p^g$ is accepted as the new current solution $I_p^g$ (11: to 12:). This
possibility to accept impairments in the objective function value ultimately enables
the agents to escape local extrema. Hence, the threshold value can be interpreted as
the tolerance towards impairments; its value decreases with an increasing $t$ to a final
value of $T_{thresh} = 0$. If an accepted solution's objective function value is also larger
than that of the best portfolio found, the so-called elitist $I_*^g$ is updated (12:).

Algorithm 2 Modification Phase (HHA).
1: Initialize $p_{replace}$, $\{T_t\}$; import $\{I_p^g\}$, $V$, $U_t$, $P$
2: for $s = 1$ to $steps$ (parallel $\forall p$) do
3:   Sell $\Delta n_{i,p} = h(U_t, V, P_{i,p}, \ldots)$ pieces of a random asset $i \in I_p^g$
4:   if $n_{i,p} = 0 \lor w_{i,p} < w^l$ then
5:     With $p_{replace}$: buy random asset $k \notin I_p^g$ versus $i \in I_p^g$
6:     With $1 - p_{replace}$: buy a random asset $j \in I_p^g$, $j \neq i$, $i \in I_p^g \land n_{i,p} = w_{i,p} = 0$
7:   else
8:     Buy random asset $j \in I_p^g$, $j \neq i$
9:   end if
10:  Evaluate $F(\tilde{I}_p^g)$
11:  if $F(\tilde{I}_p^g) \geq [F(I_p^g) - T_t]$ then
12:    $I_p^g = \tilde{I}_p^g$ & update $I_*^g$ if necessary
13:  end if
14: end for
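The acceptance rule of lines 11 to 12 is the core of Threshold Accepting. The following minimal sketch applies it to a generic maximisation problem; the portfolio-specific selling and buying of lines 3 to 9 is replaced by a hypothetical `neighbour` callback, and all names are assumptions made for illustration.

```python
import random

def ta_accept(f_new, f_cur, threshold):
    """Threshold Accepting rule for maximisation: tolerate a loss of at most `threshold`."""
    return f_new >= f_cur - threshold

def threshold_accepting(fitness, neighbour, start, thresholds, steps, seed=0):
    """Run `steps` search steps per threshold value and return the best solution seen."""
    rng = random.Random(seed)
    current, f_cur = start, fitness(start)
    best, f_best = current, f_cur          # plays the role of the elitist
    for threshold in thresholds:
        for _ in range(steps):
            cand = neighbour(current, rng)
            f_cand = fitness(cand)
            if ta_accept(f_cand, f_cur, threshold):
                current, f_cur = cand, f_cand
                if f_cur > f_best:
                    best, f_best = current, f_cur
    return best, f_best
```

For instance, `threshold_accepting(lambda x: -(x - 3) ** 2, lambda x, rng: x + rng.choice([-1, 1]), 0, [2.0, 1.0, 0.0], 200)` walks over the integers and settles at the maximiser `3`; with a final threshold of zero the search becomes greedy.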
After the individuals developed independently from each other, the Evaluation
Phase, which is presented in Algorithm 3, applies. In that, the $Pop$ individuals get
sorted in descending order according to their objective function values (2:). A
selection of promising tendencies that will be reinforced as well as the selection of
unpromising tendencies that might be excluded, i.e., the evolutionary component of the
HHA, relies on this order as follows. The $\pi < 0.5\,Pop$ best individuals, the so-called
prodigies $I^g_{\langle x \rangle} \in \{I_p^g\}$, $x = 1, \ldots, \pi$, are defined to be
idols for the remaining population (3:).⁴ This group of idols, denoted by $\Pi^g$, is
enlarged by the current generation's elitist $I_*^g$. Hence, the set of idols is defined by
$\Pi^g = \{I^g_{\langle x \rangle}, I_*^g\}$ (4:). Based on their ranks, the prodigies'
portfolios are assigned linearly decreasing amplifying factors $a^g_{\langle x \rangle}$,
ranging from $\pi + 1$ down to 1. The elitist's amplifying factor is chosen to be
$\epsilon$ (5:). Corresponding to $\Pi^g$ the $\pi$ worst individuals, subsequently called
underdogs, are pooled in set
$\Psi^g = \{I^g_{\langle Pop-(x-1) \rangle}\} \subset \{I_p^g\}$ (6:).
Algorithm 3 Evaluation Phase (HHA).
1: Initialize $\pi$, $\epsilon$; import $\{I_p^g\}$, $Pop$
2: Rank individuals according to their fitness
3: Define prodigies $I^g_{\langle x \rangle} \in \{I_p^g\}$, $x = 1, \ldots, \pi$
4: Enlarge the set of idols by the elitist, i.e., $\Pi^g = \{I^g_{\langle x \rangle}, I_*^g\}$
5: Assign the prodigies linearly decreasing amplifying factors $a^g_{\langle x \rangle}$, ranging from $\pi + 1$ down to 1; the elitist's factor is $\epsilon$
6: Define underdogs $I^g_{\langle Pop-(x-1) \rangle} \in \{I_p^g\}$, merged in the set $\Psi^g$
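The ranking logic of the Evaluation Phase can be sketched as below. The function returns the indices of the prodigies, the indices of the underdogs paired with them by rank, and linearly decreasing amplifying factors. The end point `a_top` of the factor scale is passed in as a parameter because its exact value is an assumption here; all names are illustrative.

```python
def evaluation_phase(fitness, n_prodigies, a_top):
    """Rank individuals and split off prodigies and underdogs.

    fitness      list of objective values, one per individual
    n_prodigies  number of best individuals treated as prodigies (< 0.5 * Pop)
    a_top        amplifying factor of the best prodigy; factors fall linearly to 1
    """
    pop = len(fitness)
    if not 0 < n_prodigies < 0.5 * pop:
        raise ValueError("need 0 < n_prodigies < 0.5 * Pop")
    ranked = sorted(range(pop), key=lambda p: fitness[p], reverse=True)
    prodigies = ranked[:n_prodigies]
    # The underdog belonging to rank x sits at rank Pop - (x - 1), i.e. index Pop - x.
    underdogs = [ranked[pop - x] for x in range(1, n_prodigies + 1)]
    if n_prodigies == 1:
        factors = [float(a_top)]
    else:
        step = (a_top - 1.0) / (n_prodigies - 1)
        factors = [a_top - step * x for x in range(n_prodigies)]
    return prodigies, underdogs, factors
```

Requiring fewer prodigies than half the population guarantees that no individual is prodigy and underdog at the same time.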
The last phase in the life of a population's generation is the Replacement Phase,
shown in Algorithm 4, in which the set of idols, in cooperation with the amplifying
factors, is used to (possibly) replace the underdogs $I^g_{\langle Pop-(x-1) \rangle}$.
An underdog is replaced with probability $p_{clone}$ by an exact copy, i.e., by a Clone,
of a prodigy. Therefore, each prodigy within $\Pi^g$ gets assigned a selection probability
$p(I^g_{\langle x \rangle})$ that increases with the prodigy's fitness, i.e.,

$$p(I^g_{\langle x \rangle}) = \frac{F(I^g_{\langle x \rangle})\, a^g_{\langle x \rangle}}{\sum_{x=1}^{\pi} F(I^g_{\langle x \rangle})\, a^g_{\langle x \rangle}}, \quad \forall x \qquad (3)$$

The resulting probability distribution is then used to randomly choose a prodigy to
replace the underdog (2: to 5:). Starting from a promising point in the search space,
during the Modification Phase, a Clone will ultimately develop a different portfolio
structure than its twin.

Algorithm 4 Replacement Phase (HHA).
1: Import $\{I_p^g\}$, $\Pi^g$, $\Psi^g$, $\pi$, $\{a^g_{\langle x \rangle}\}$, $\epsilon$, $T_t$
2: if Replacement by Clone (with $p_{clone}$) (parallel $\forall p \in \Psi^g$) then
3:   Compute selection probabilities according to Eq. (3)
4:   Randomly choose a prodigy $I^g_{picked} \in \Pi^g$ based on $p(I^g_{\langle x \rangle})$
5:   $I^g_{\langle Pop-(x-1) \rangle} = I^g_{picked}$
6: else
7:   Compute averaged weights according to Eq. (4)
8:   Randomly choose $K_{max}$ assets $i \in \mathcal{I}^g$ based on $\bar{w}_i^g$ and normalize
9:   Make $I^g_{idol}$ a valid solution and evaluate $F(I^g_{idol})$
10:  if $F(I^g_{idol}) \geq [F(I^g_{\langle Pop-(x-1) \rangle}) - T_t]$ then
11:    $I^g_{\langle Pop-(x-1) \rangle} = I^g_{idol}$ & update $I_*^g$ if necessary
12:  end if
13: end if

⁴ The index $\langle z \rangle$ represents the $z$th-best position in this sorted order.
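The Clone branch (lines 2 to 5 of Algorithm 4) amounts to a fitness-proportional draw with amplified weights. A minimal sketch, assuming that all fitness values are positive so that Eq. (3) yields valid probabilities; the function names are illustrative.

```python
import random

def clone_probabilities(fitness, factors):
    """Eq. (3): p_x = F_x * a_x / sum over x of F_x * a_x."""
    scores = [f * a for f, a in zip(fitness, factors)]
    total = sum(scores)
    if total <= 0 or min(scores) < 0:
        raise ValueError("scores must be non-negative with a positive sum")
    return [s / total for s in scores]

def pick_prodigy(fitness, factors, rng):
    """Return the position (0 = best) of the prodigy that is cloned."""
    probs = clone_probabilities(fitness, factors)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Since the objective in (2) can take negative values, an implementation would need to shift or rescale the fitness values before applying Eq. (3); the check above merely makes that requirement explicit.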
In contrast, an underdog is replaced by a so-called Averaged Idol (instead of by a
Clone) with probability $1 - p_{clone}$ (6:). An Averaged Idol is a solution that is created
based on the idols' pool of successful asset combinations and portfolio weights. The
different indices of assets that are held by the group of idols $\Pi^g$ are collected in a
set $\mathcal{I}^g$. For each of these assets $i \in \mathcal{I}^g$ an averaged weight
$\bar{w}_i^g$ is computed as follows:

$$
\bar{w}_i^g = \frac{\sum_{n \in \{x:\, x=1,\ldots,\pi \,\mid\, i \in I^g_{\langle x \rangle}\}} w^g_{i,n}\, a^g_{\langle n \rangle} + b\,\epsilon\, w^g_{i,*}}
{\sum_{i \in \mathcal{I}^g} \left( \sum_{n \in \{x:\, x=1,\ldots,\pi \,\mid\, i \in I^g_{\langle x \rangle}\}} w^g_{i,n}\, a^g_{\langle n \rangle} + b\,\epsilon\, w^g_{i,*} \right)},
\quad \forall i \in \mathcal{I}^g \qquad (4)
$$

where

$$b = \begin{cases} 1 & \text{if } i \in I_*^g \\ 0 & \text{else} \end{cases}$$

In Eq. 4 an averaged weight of asset $i \in \mathcal{I}^g$ is computed, firstly, by building
the sum (over all idols holding this asset) of the products of this asset's weights and the
idols' amplifying factors. Secondly, this sum is built for all assets
$i \in \mathcal{I}^g$ and summed up over all $\#\mathcal{I}^g$ assets in order to, thirdly,
normalize (7:). The number of different assets held by the idols, $\#\mathcal{I}^g$, varies
over the generations and will usually be larger than $K_{max}$. Consequently, a decision of
which assets to include in the Averaged Idol must be taken. This selection is executed
randomly with $\bar{w}_i^g$ as the selection probabilities (8:). Hence, those assets are
more likely to be selected that appear in portfolios of idols more often and/or with larger
portfolio weights.⁵ The Averaged Idol is made a valid solution and gets evaluated (9:). It
will replace the corresponding underdog, if its objective function value (the fitness) is
not worse by more than the current threshold value $T_t$ (10: to 12:). Hence, the acceptance
rule from the Modification Phase is applied again. The idea of Averaged Idols is to exploit
not only the experience of two parents, as is common in genetic algorithms, but that of a
whole group of successful predecessors as is defined with $\Pi^g$.

⁵ In order to avoid multiple selections of assets into one Averaged Idol, an already picked
asset is excluded from the list of (remaining) options. The free probability gets
distributed over the remaining choices according to their weights.
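The construction of an Averaged Idol can be sketched in two steps: computing the normalised, factor-weighted asset weights and then drawing distinct assets with these weights as selection probabilities. Portfolios are represented as hypothetical dictionaries from asset label to weight; the elitist's amplifying factor is passed in explicitly, and all names are assumptions for illustration.

```python
import random

def averaged_weights(prodigies, factors, elitist, elitist_factor):
    """Factor-weighted sum of the idols' weights per asset, normalised to sum to one.

    prodigies  list of dicts {asset: weight}, ordered best first
    factors    amplifying factors of the prodigies, same order
    elitist    dict {asset: weight} of the elitist
    """
    raw = {}
    for weights, a in zip(prodigies, factors):
        for asset, w in weights.items():
            raw[asset] = raw.get(asset, 0.0) + a * w
    for asset, w in elitist.items():          # b = 1 only for assets the elitist holds
        raw[asset] = raw.get(asset, 0.0) + elitist_factor * w
    total = sum(raw.values())
    return {asset: value / total for asset, value in raw.items()}

def pick_assets(avg_weights, k_max, rng):
    """Draw up to k_max distinct assets; a picked asset leaves the pool."""
    pool = {a: w for a, w in avg_weights.items() if w > 0}
    chosen = []
    while pool and len(chosen) < k_max:
        assets = list(pool)
        pick = rng.choices(assets, weights=[pool[a] for a in assets], k=1)[0]
        chosen.append(pick)
        del pool[pick]                        # remaining weights are renormalised implicitly
    return chosen
```

Removing a picked asset and drawing again from the remaining weights distributes the freed probability proportionally over the remaining choices. Turning the chosen assets into a valid solution (integer quantities, bounds, budget) is a separate step not shown here.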
After the Replacement Phase a new generation $g = (t-1) \cdot iter + l$ evolves, whose
individuals will again, firstly, independently develop in the Modification Phase, before
they, secondly, get ranked in the Evaluation Phase, before, thirdly, another Replacement
Phase follows and so forth. After having reduced the threshold and the step size to their
minimum values ($T_{thresh} = 0$, $U_{thresh} = U_{min}$) and after having computed
$Pop + Pop \cdot thresh \cdot iter \cdot steps$ objective function values in
$thresh \cdot iter$ generations, the algorithm terminates and reports the current elitist.
The threshold sequence $\{T_t\}$ has a strong influence on the performance of the HHA,
since it determines the tolerance towards impairments. Hence, the numerical values
should consider the typical difference in the objective function value that is caused
by a search step, which is, in turn, dependent on the step size $U_t$. Following Fang and
Winker (1997) a data driven choice of the threshold sequence is applied. This is done
by initially carrying out 1,000 search steps (for each step size) in the fashion of the
Modification Phase with the only difference being that all new solutions are accepted
without reservation. Next, an empirical distribution for each step size is obtained by
the absolute differences between consecutively computed objective function values.
Then, the threshold values are defined as quantiles of the empirical distributions that
result from different step sizes. In order to gradually decrease the tolerance towards
impairments and promote a greedy search, decreasing quantiles are taken and the least
value is set to $T_{thresh} = 0$.
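The data driven choice of the thresholds can be sketched as follows. For each step size, a path of objective function values recorded while accepting every candidate is turned into absolute differences, from which a quantile is taken. The quantile convention (linear interpolation between order statistics) and all names are assumptions for illustration.

```python
def empirical_quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics, 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def threshold_sequence(fitness_paths, quantiles):
    """One threshold per step size from |differences| of consecutive fitness values.

    fitness_paths  one list of objective values per step size, recorded while
                   accepting every candidate solution
    quantiles      decreasing quantile levels, one per step size
    """
    thresholds = []
    for path, q in zip(fitness_paths, quantiles):
        diffs = sorted(abs(b - a) for a, b in zip(path, path[1:]))
        thresholds.append(empirical_quantile(diffs, q))
    thresholds[-1] = 0.0      # the final threshold is set to zero
    return thresholds
```

In the setting described above each path would hold the values of 1,000 unconditional search steps carried out with the corresponding step size $U_t$.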
2.2 Robustification: Tütüncü and Koenig approach
Tütüncü and Koenig (2004) (TK) attempt to capture the uncertainty regarding the
parameters $\mu$ and $\Sigma$ in their uncertainty sets $S_\mu^{TK}$ and $S_\Sigma^{TK}$ by
carrying out the following three steps. Firstly, the historical data sample is
bootstrapped, e.g., 1,000 times, with a moving block bootstrap-procedure (MBB)⁶ (Efron and
Tibshirani 1993). Secondly, the means $\hat{\mu}_s$ and covariance matrices
$\hat{\Sigma}_s$ of these bootstrap samples are computed. Thus, 1,001 point estimators are
gained. Thirdly, based on the 1,001 mean vectors, $S_\mu^{TK}$ is defined in such a way that
it includes independently for each asset $i$ a choice of the middle $(1-\alpha)100\%$ of all
$\hat{\mu}_{i,s}$. In the same component wise manner, a choice of the middle
$(1-\alpha)100\%$ of the 1,001 drawings available for each component in $\hat{\Sigma}_s$
defines $S_\Sigma^{TK}$.
With the objective of constructing a portfolio that exhibits good characteristics for
many possible scenarios of the point estimators, the portfolio is optimized under the
uncertainty sets' worst-case-scenarios. Due to the no-short-selling constraint in (2),
the worst-case-expected return $\mu_W^{TK}$ possible in $S_\mu^{TK}$ is given by the
$(\alpha/2)100\%$ quantile independently for each asset $i$. It is important to notice that
this procedure ignores all correlations among the $K$ expected returns. This is problematic,
since exactly these correlations will often prevent a simultaneous occurrence of the
worst-case-situation for all assets. Even more problematic is the component wise
construction of the worst-case-covariance matrix $\Sigma_W^{TK}$. Because short-selling is
forbidden, the worst-case for each (co)variance is given by the largest value in
$S_\Sigma^{TK}$, i.e. by the $(1-\alpha/2)100\%$ quantile of the corresponding position's
entries in the 1,001 covariance matrices $\hat{\Sigma}_s$. Due to the picking of single
components from different covariance matrices, there is no assurance that $\Sigma_W^{TK}$
is positive definite.

⁶ Although the TK-approach ignores certain dependencies between the assets (as will be
explained), for simplicity reasons, this MBB-procedure is used in all bootstrap applications
due to its capability of capturing possible (auto)correlations within the historical return
data.
The constructed parameters $\mu_W^{TK}$ and $\Sigma_W^{TK}$ are then used as the inputs for
problem (2), which is optimized with the HHA. The outcome is a stochastic approximation of
the global maximum represented by the elitist's (robust) portfolio $I_*^{TK}$, i.e.
the best (robust) portfolio found.
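The three steps of the TK-approach can be sketched with the standard library alone. The snippet resamples the return history in overlapping blocks, computes the mean vector and covariance matrix of every sample (the original sample included), and takes componentwise quantiles. The block length, the nearest-rank quantile convention and all function names are assumptions for illustration.

```python
import random

def moving_block_bootstrap(returns, block_len, rng):
    """Resample the rows of `returns` (T x K list of lists) in overlapping blocks."""
    t = len(returns)
    sample = []
    while len(sample) < t:
        start = rng.randrange(t - block_len + 1)
        sample.extend(returns[start:start + block_len])
    return sample[:t]

def mean_and_cov(sample):
    """Sample mean vector and sample covariance matrix."""
    t, k = len(sample), len(sample[0])
    mu = [sum(row[i] for row in sample) / t for i in range(k)]
    cov = [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in sample) / (t - 1)
            for j in range(k)] for i in range(k)]
    return mu, cov

def quantile(values, q):
    """Nearest-rank style quantile of a list."""
    ordered = sorted(values)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]

def tk_worst_case(returns, n_boot, alpha, block_len, seed=0):
    """Componentwise worst case: low quantile of means, high quantile of (co)variances."""
    rng = random.Random(seed)
    estimates = [mean_and_cov(returns)]
    estimates += [mean_and_cov(moving_block_bootstrap(returns, block_len, rng))
                  for _ in range(n_boot)]
    k = len(returns[0])
    mu_w = [quantile([m[i] for m, _ in estimates], alpha / 2) for i in range(k)]
    sigma_w = [[quantile([c[i][j] for _, c in estimates], 1 - alpha / 2)
                for j in range(k)] for i in range(k)]
    return mu_w, sigma_w
```

Because every entry of `sigma_w` may stem from a different bootstrap sample, the matrix is symmetric but, as noted above, not necessarily positive definite; an implementation would have to check this before using it as a risk input.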
2.3 Robustification: Ceria and Stubbs approach
Ceria and Stubbs (2006) (CS) only consider the uncertainties regarding the expected
returns and neglect the uncertainties regarding the covariance matrix. Their reasoning
for this procedure is the finding from Chopra and Ziemba (1993) that cash-equivalent
losses due to errors in estimates of expected returns are an order of magnitude greater
than those for errors in estimates of variances or covariances. $S_\mu^{CS}$ is constructed
as a $K$-dimensional ellipsoid that defines a region which envelopes the joint deviation of
the estimator $\hat{\mu}$ from its true value $\mu$ with a given confidence level
$1 - \alpha$:

$$(\mu - \hat{\mu})'\, \Omega^{-1} (\mu - \hat{\mu}) \leq \chi^2_{(1-\alpha),K} \qquad (5)$$

In expression (5), $\Omega$ represents the $(K \times K)$-covariance matrix of the expected
returns and $\chi^2_{(1-\alpha),K}$ the inverse cumulative distribution function value of a
Chi-squared distribution with $K$ degrees of freedom and level of significance $\alpha$. The
worst-case-scenario in the CS-approach is defined by the maximum joint deviation, i.e. by
the maximum deviation of the true return from its estimator that theoretically can occur
within ellipsoid (5). Thus, it is given by solving the Lagrangian:

$$
\begin{aligned}
\mu_W^{CS} &= \arg\max_{\mu} L(\mu, \eta) \\
&= \arg\max_{\mu} \left\{ -\,w'\mu - \frac{\eta}{2} \left[ (\mu - \hat{\mu})'\, \Omega^{-1} (\mu - \hat{\mu}) - \chi^2_{(1-\alpha),K} \right] \right\} \\
&= \hat{\mu} - \sqrt{\frac{\chi^2_{(1-\alpha),K}}{w'\Omega w}}\; \Omega w \qquad (6)
\end{aligned}
$$
Equation 6 shows that the $(K \times 1)$-mean vector $\hat{\mu}$ is component wise penalized
in such a way that the larger an asset's portfolio weight $w$ is, the greater the asset's
penalty becomes. Due to the penalty's positive dependence on $w$ it (partly) compensates
the error-maximizing characteristic of MVO-portfolios. Moreover, considering the expected
returns' correlations through $\Omega$ when constructing the worst-case-scenario is
reasonable.⁷ Of course, $\Omega$ is not known and has to be estimated with the given data
sample. For a few types of return estimators $\hat{\mu}$, Stubbs and Vance (2005) give
suggestions. In the case of stationary returns and historical means for $\hat{\mu}$ one can
use $\hat{\Omega}^{CS} = T^{-1} \hat{\Sigma}$.⁸ Assigning a probability $1 - \alpha$ with
which the ellipsoid envelopes the true expected return-vector is only valid if the return
distribution is elliptical (e.g. Fang et al. 1990), so that $\chi^2_{(1-\alpha),K}$
actually has its attributed meaning.
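The closed-form worst case of Eq. (6) is cheap to evaluate. The sketch below computes $\hat{\mu} - \sqrt{\kappa^2 / (w'\Omega w)}\, \Omega w$ for plain Python lists; the squared ellipsoid size is passed in as a hypothetical parameter `kappa2` (a Chi-squared quantile in the CS-approach) so that no statistics library is required.

```python
import math

def worst_case_return(mu_hat, omega, w, kappa2):
    """mu_hat - sqrt(kappa2 / (w' Omega w)) * Omega w, following Eq. (6).

    kappa2 is the squared size of the ellipsoid; it is supplied by the caller.
    """
    k = len(w)
    omega_w = [sum(omega[i][j] * w[j] for j in range(k)) for i in range(k)]
    w_omega_w = sum(w[i] * omega_w[i] for i in range(k))
    if w_omega_w <= 0:
        return list(mu_hat)           # e.g. an empty portfolio: no penalty is defined
    scale = math.sqrt(kappa2 / w_omega_w)
    return [mu_hat[i] - scale * omega_w[i] for i in range(k)]
```

A useful check is that the penalised portfolio return equals $w'\hat{\mu} - \sqrt{\kappa^2\, w'\Omega w}$, i.e. the estimated return minus a multiple of the estimation-error standard deviation of the portfolio return.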
Due to the constraint that limits the number of different assets held to $K_{max}$, Eq. 6
has to be adjusted to the dimensions of the considered individual $I_p$,
$p = 1, \ldots, Pop$. As before, this is denoted with index $p$:

$$\mu_{p,W}^{CS} = \hat{\mu}_{0,p} - \sqrt{\frac{\chi^2_{(1-\alpha),K_p}}{w_p'\, \hat{\Omega}_p^{CS}\, w_p}}\; \hat{\Omega}_p^{CS} w_p \qquad (7)$$

During optimization of problem (2) with the HHA, the penalized return (7) is applied
for each of the $Pop$ individuals whenever their fitness is computed. The elitist
$I_*^{CS}$ defines the robustly optimized portfolio of the CS-approach.

2.4 Robustification: an extension of the CS-approach
We enrich the robust optimization techniques with an approach that upgrades the technique
of Ceria and Stubbs (2006) by an uncertainty set for the covariance matrix $\Sigma$. Our
extended model is denoted as ECS-approach. Furthermore, while $S_\mu^{ECS}$ is also
constructed with an ellipsoid similar to (5), its components are computed in a refined way.
Firstly, as is shown in Algorithm 5, the estimator for the matrix $\Omega$ is generated
using the MBB-technique as follows: the historical return sample is bootstrapped 5,000 times
to allow for the computation of an identical number of $\hat{\mu}$-vectors (1: to 3:). The
gained sequences are then used to calculate $\hat{\Omega}^{ECS}$ (4:) out of which only
those components are written into $\hat{\Omega}_p^{ECS}$ that correspond to the assets held
by individual $I_p$ (5:).
Algorithm 5 ECS-Covariance Matrix for Expected Returns.
1: Generate 5,000 return samples $s$ with the MBB-technique
2: Compute the expected return vectors $\hat{\mu}_s$, $s = 1, \ldots, 5{,}000$
3: Build $K$ expected return sequences $\{\hat{\mu}_{1,s}\}, \ldots, \{\hat{\mu}_{K,s}\}$, each of length 5,000
4: Estimate a covariance matrix $\hat{\Omega}^{ECS}$ from these sequences
5: $\hat{\Omega}_p^{ECS}$ consists of only those components that correspond to the assets held by $I_p$
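Steps 3 to 5 of Algorithm 5 can be sketched as follows, assuming the bootstrapped mean vectors are already available as a list of lists (one row per bootstrap sample). The function names and the use of the sample covariance with divisor $S - 1$ are assumptions for illustration.

```python
def covariance_of_means(mean_vectors):
    """Sample covariance matrix of bootstrapped mean vectors (one row per sample)."""
    s, k = len(mean_vectors), len(mean_vectors[0])
    centre = [sum(m[i] for m in mean_vectors) / s for i in range(k)]
    return [[sum((m[i] - centre[i]) * (m[j] - centre[j]) for m in mean_vectors) / (s - 1)
             for j in range(k)] for i in range(k)]

def submatrix(matrix, held):
    """Keep only the rows and columns of the assets held by an individual."""
    return [[matrix[i][j] for j in held] for i in held]
```

Computing the full $(K \times K)$-matrix once and extracting the submatrix per individual avoids repeating the bootstrap whenever the fitness of a portfolio is evaluated.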
⁷ Assets with larger (co)variances will c.p. be penalized stronger and vice versa.
Therefore, unlike in the TK-approach, an independent simultaneous occurrence of the
worst-case-situation for all assets will be prevented.
⁸ $T$ denotes the number of return observations used to estimate $\hat{\Sigma}$.
Secondly, as is shown in Algorithm 6, the ellipsoid's size is also determined using
the MBB-technique. To this end the historical return sample (with mean-vector
$\hat{\mu}_0$) is bootstrapped 10,000 times to allow for the computation of an identical
number of mean-vectors $\hat{\mu}_s$, $s = 1, \ldots, 10{,}000$ (1: to 2:).⁹ Out of each
$(K \times 1)$-vector $\hat{\mu}_s$, $K_{max}$ assets are randomly picked and written into
$\hat{\mu}_{s,K_{max}}$ (4:). These $\hat{\mu}_{s,K_{max}}$ can be interpreted as the true
expected return vectors and must be inserted together with their corresponding estimators
in $\hat{\mu}_0$ and in $\hat{\Omega}^{ECS}$ into an ellipsoid such as (5) to obtain 10,000
joint deviations $\kappa_s$ (5: to 6:). The $(1-\alpha)100\%$ quantile of the generated
distribution function $f(\kappa)$ is then used to replace $\chi^2_{(1-\alpha),K_p}$. Thus,
the worst-case-return is given by:

$$\mu_{p,W}^{ECS} = \hat{\mu}_{0,p} - \sqrt{\frac{f^{-1}(\alpha)}{w_p'\, \hat{\Omega}_p^{ECS}\, w_p}}\; \hat{\Omega}_p^{ECS} w_p \qquad (8)$$

The usage of $f^{-1}(\alpha)$ rather than $\chi^2_{(1-\alpha),K_p}$ has the advantage of not
having to make assumptions regarding the asset return distribution. However, randomly
choosing $K_{max}$ out of $K$ assets as well as the application of the MBB-technique for
generating both $\hat{\Omega}^{ECS}$ and $\hat{\mu}_s$ lead to stochastic quantiles
$f^{-1}(\alpha)$.¹⁰
Algorithm 6 Size of the ECS-Ellipsoid for Expected Returns.
1: Generate 10,000 return samples s with the MBB-technique
2: Compute mean vectors $\mu_s$, s = 1, ..., 10,000
3: for s = 1 to 10,000 do
4: Choose randomly $K_{max}$ assets from the (K × 1)-vector $\mu_s$ that define $\mu_{s,K_{max}}$
5: Insert $\mu_{s,K_{max}}$ with the corresponding components in $\hat{\Sigma}^{ECS}$ and $\hat{\mu}_0$ into ellipsoid (5)
6: Obtain the joint deviation $\delta_s$
7: end for
8: Determine the (1 − α)·100% quantile $f^{-1}(\alpha)$
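Steps 3 to 8 of Algorithm 6 can be sketched as below. The sketch assumes that ellipsoid (5) is the usual quadratic form (μ − μ̂₀)' Σ⁻¹ (μ − μ̂₀); this is an assumption here, since (5) is not restated in this section. The bootstrapped mean vectors are taken as given, and all names are illustrative.

```python
import numpy as np

def ellipsoid_size(boot_means, mu0, sigma_ecs, k_max, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of bootstrapped joint deviations.

    boot_means : (S x K) array, one bootstrapped mean vector per row
    mu0        : (K,) mean vector of the historical sample
    sigma_ecs  : (K x K) covariance matrix of the expected-return estimates
    k_max      : number of assets drawn at random in each repetition
    """
    rng = np.random.default_rng(seed)
    K = mu0.shape[0]
    deviations = np.empty(boot_means.shape[0])
    for s, mu_s in enumerate(boot_means):
        pick = rng.choice(K, size=k_max, replace=False)   # step 4
        d = mu_s[pick] - mu0[pick]
        sub = sigma_ecs[np.ix_(pick, pick)]
        deviations[s] = d @ np.linalg.solve(sub, d)       # steps 5-6
    return np.quantile(deviations, 1.0 - alpha)            # step 8
```

Because the asset subsets are drawn at random, the result depends on the seed, which mirrors the remark that the quantiles are stochastic.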
The crucial extension in the ECS-approach is the construction of an uncertainty set for $\Sigma$. Analogous to the returns, the portfolio risk's worst case is defined by the maximum joint deviation of the portfolio's true variance $w'\Sigma w$ from its estimator $w'\hat{\Sigma}w$ that theoretically can occur within an ellipsoid. By exploiting the symmetry of $\Sigma$ and by using the definitions $\sigma_p = vec(\Sigma_p)$, $W_p = 2(w_p w_p') - dg(w_p w_p')$ as well as $\omega_p = vec(W_p)$,^11 as in (6), this ellipsoid is considered in a Lagrangian. Hence,
9 Algorithm 6 generates a larger number of bootstrap samples than Algorithm 5 because of the memory requirements of the latter algorithm's procedure in line (4:).
10 Empirical tests show that the variation in the distribution functions and their quantiles $f^{-1}(\alpha)$ is negligible when they are based on 10,000 bootstrapped values $\delta_s$.
11 Here, the vec(·)-operator stacks the components within the lower triangle of a matrix columnwise into a vector. A (K × K)-matrix A results in the ([K² + K]/2 × 1)-vector vec(A). The $vec^{-1}$-operator reverses this operation and restores the original matrix A. The dg(·) operator sets all elements off the main diagonal to zero.
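the portfolio's worst-case covariance matrix is given by:^12

$$\Sigma^{ECS}_{p,W} = vec^{-1}\left(\arg\max_{\sigma_p}\left[\omega_p'\sigma_p - \frac{\zeta}{2}\left((\sigma_p - \hat{\sigma}_p)'\,\Omega_p^{-1}\,(\sigma_p - \hat{\sigma}_p) - \kappa\right)\right]\right) = vec^{-1}\left(\arg\max_{\sigma_p} L(\sigma_p, \zeta)\right) = \hat{\Sigma}_{0,p} + vec^{-1}\left(\sqrt{\frac{\kappa}{\omega_p'\,\Omega_p\,\omega_p}}\;\Omega_p\,\omega_p\right) \qquad (9)$$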
In Eq. 9, $\Omega_p$ represents the covariance matrix of the components in vector $\sigma_p$, i.e., the covariance matrix of the $[(K_p)^2 + K_p]/2$ asset returns' (co)variances. The estimator $\hat{\Omega}$ is generated by applying the MBB-technique analogously to generating $\hat{\Sigma}^{ECS}$ (see Algorithm 5). Also, the ellipsoid's size has to be determined via MBB, since the distribution of $\sigma_p$ is unknown. To this end, as depicted in Algorithm 7, the historical return sample (which yielded the stacked covariance matrix $\hat{\sigma}_0$) is bootstrapped 10,000 times and the same number of stacked covariance matrices $\sigma_s$, s = 1, ..., 10,000, is computed (1: to 2:).
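The role of the stacked vectors can be checked numerically: with the vec-operator restricted to the lower triangle and the weight matrix 2(ww') − dg(ww'), the portfolio variance w'Σw equals the inner product of the two stacked vectors. The helper names below are illustrative and not taken from the source.

```python
import numpy as np

def vec_lower(a):
    """Stack the lower triangle (incl. diagonal) of `a` column by column."""
    k = a.shape[0]
    return np.concatenate([a[j:, j] for j in range(k)])

def vec_lower_inv(v, k):
    """Rebuild the symmetric (k x k) matrix from its stacked lower triangle."""
    out = np.zeros((k, k))
    pos = 0
    for j in range(k):
        n = k - j
        out[j:, j] = v[pos:pos + n]   # column j below and on the diagonal
        out[j, j:] = v[pos:pos + n]   # mirrored entries above the diagonal
        pos += n
    return out

def stacked_weights(w):
    """vec of W = 2 w w' - dg(w w'), so that vec(W)' vec(Sigma) = w' Sigma w."""
    ww = np.outer(w, w)
    return vec_lower(2.0 * ww - np.diag(np.diag(ww)))
```

The factor 2 accounts for each covariance appearing twice in w'Σw while being stored only once in the stacked vector; subtracting the diagonal keeps the variances from being double-counted.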
Algorithm 7 Size of the ECS-Ellipsoid for Covariance Matrices.
1: Generate 10,000 return samples s with the MBB-technique
2: Compute the stacked covariance matrices $\sigma_s$, s = 1, ..., 10,000
3: for s = 1 to 10,000 do
4: Define $\sigma_{s,K_{max}}$ by only those components from $\sigma_s$ that correspond to $K_{max}$ randomly chosen assets
5: Insert $\sigma_{s,K_{max}}$ with the corresponding components in $\hat{\sigma}_0$ and in $\hat{\Omega}$ into an ellipsoid as appears in (9)
6: Obtain the joint deviation $\delta_s$
7: end for
8: Determine the (1 − α)·100% quantile $f^{-1}(\alpha)$
Out of each $\sigma_s$-vector those components are written into $\sigma_{s,K_{max}}$ that correspond to $K_{max}$ randomly chosen assets (4:). Together with their corresponding components in $\hat{\sigma}_0$ as well as in $\hat{\Omega}$, these 10,000 $\sigma_{s,K_{max}}$ are inserted into the ellipsoid to compute 10,000 joint deviations $\delta_s$ (5: to 6:). The (1 − α)·100% quantile of the generated distribution function $f(\delta)$ is then used as the ellipsoid's size, i.e., $\kappa = f^{-1}(\alpha)$ (8:).^13 Unlike in Eqs. 7 and 8, it cannot be assumed that the weighted sum of the components in a covariance matrix row is positive.^14 In order to ensure $\Sigma^{ECS}_{p,W}$ to be a penalized
12 The expression already considers the dimensionality of an individual p.
13 Empirical tests indicated that the number of bootstrap samples was sufficient to ensure a small enough variation.
14 In $\hat{\Sigma}^{CS}_p$ and $\hat{\Sigma}^{ECS}_p$ the variances can be assumed to be greater than the (absolute) covariances, since in the empirical application daily stock returns were used.
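covariance matrix, only the main diagonal components of $\hat{\Omega}_p$, i.e., the variances of the returns' (co)variances, are used. Thus, the penalized covariance matrix is given by:^15

$$\Sigma^{ECS}_{p,W} = \hat{\Sigma}_{0,p} + vec^{-1}\left(\sqrt{\frac{f^{-1}(\alpha)}{\omega_p'\,dg(\hat{\Omega}_p)\,\omega_p}}\;dg(\hat{\Omega}_p)\,\omega_p\right) \qquad (10)$$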
The interpretation of Eq. 10 is as follows: the greater a (co)variance's variance, the more the historically estimated (co)variance will be penalized. Also, due to a positive dependence of the penalization on the weights, the error-maximization is (partly) compensated. Both the penalized return (8) and the covariance matrix (10) are applied for each of the Pop individuals whenever their fitness is computed. The elitist $I^{ECS}$ defines the robustly optimized portfolio in the ECS-approach.
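The penalization described for Eq. 10 can be sketched as follows. This is an illustration under assumptions rather than the authors' code: it takes the penalty to be the closed-form maximizer over an ellipsoid whose shape matrix is the diagonal of the (co)variances' covariance matrix, and the function and argument names are hypothetical.

```python
import numpy as np

def penalized_covariance(sigma0_p, omega_hat_p, w_p, size):
    """Add a weight-dependent penalty to each estimated (co)variance.

    sigma0_p    : (k x k) estimated covariance matrix of the held assets
    omega_hat_p : (m x m) covariance matrix of the m = k(k+1)/2 stacked
                  (co)variances; only its diagonal is used
    w_p         : (k,) portfolio weights, assumed non-negative
    size        : ellipsoid size (assumed given)
    """
    w_p = np.asarray(w_p, dtype=float)
    k = w_p.shape[0]
    # (r, c) runs through the lower triangle column by column.
    c, r = np.triu_indices(k)
    ww = np.outer(w_p, w_p)
    omega = (2.0 * ww - np.diag(np.diag(ww)))[r, c]
    d = np.diag(omega_hat_p)          # variances of the (co)variances
    shift = np.sqrt(size / (omega @ (d * omega))) * d * omega
    penalty = np.zeros((k, k))
    penalty[r, c] = shift             # inverse stacking, lower triangle
    penalty[c, r] = shift             # mirror to the upper triangle
    return sigma0_p + penalty
```

With positive weights every entry of the penalty is positive, and entries with a larger estimated variance receive a larger shift, in line with the interpretation given above.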

3 Computational results
3.1 Empirical analysis
For the empirical analysis, firstly, several parameter values in problem (2) have to be set. Therefore, an investor is assumed whose utility function corresponds to λ = 0.6. She has an endowment of V = 1,000,000 Euro to be invested into a maximum of $K_{max}$ = 7 stocks that are constituents of the German index DAX100 on March 16th, 2006.^16 Stocks can be purchased for their previous closing price with costs $c_f$ = 10 Euro and $c_v$ = 0.0025. The parameters $\hat{\mu}_0$ and $\hat{\Sigma}_0$ are based on 250 daily log-returns. The results are transformed into monthly values, since the investor is assumed to rebalance the portfolio once a month. The value α determines the most extreme parameter values that are still included in the uncertainty sets. The smaller α is, the greater an uncertainty set will be, and thus the greater the worst-case estimation errors will be. Hence, α can be interpreted as a parameter that captures the investor's tolerance for estimation errors. We assume the investor's utility function to correspond to α = 0.05.
Secondly, the HHA has to be parameterized. The minimum step size is set to $U_{min}$ = 0.0004, corresponding to 400 Euro. The maximum step size is set to $U_{max}$ = 0.3 to ensure the trial of enough asset combinations for the case $K_{max}$ = 7. Through extensive empirical testing the following parameterizations have been found to result in a sufficiently reliable stochastic outcome: Pop = 40, thresh = 30, iter = 5, steps = 50, = 15, = 10, $p_c$ = 0.7, and $p_r$ = 1.^17 Thus, 300,000 objective function values are computed per optimization.^18
15 Because only the main diagonal components are used, the second addend is a matrix consisting of only positive components, which is why $\Sigma^{ECS}_{p,W}$ can be guaranteed to be positive semidefinite.
16 Two firms were removed from the sample due to missing data.
17 Tens of thousands of portfolios were optimized, testing a large spectrum of possible parameter settings in order to find an appropriate parameterization (results not reported, but available on request).
18 Section 3.2 provides a convergence analysis for the HHA that supports the use of 300,000 objective function evaluations per optimization.
Table 1 Out-of-sample performance
Column groups: win | objective function value (MVO, TK, CS, ECS) | portfolio return (MVO, TK, CS, ECS) | portfolio risk (MVO, TK, CS, ECS)
win MVO TK CS ECS MVO TK CS ECS MVO TK CS ECS
1 2.77 2.94 2.89 3.61 0.10 3.72 0.49 5.23 4.68 2.41 4.50 2.53
2 9.63 5.78 9.86 8.26 6.91 7.32 8.35 8.12 11.45 4.76 10.87 8.35
3 11.56 6.09 9.27 7.72 3.53 3.73 4.07 4.58 16.92 7.66 12.74 9.81
4 3.81 2.41 2.30 1.89 3.67 1.39 0.39 0.41 3.90 3.08 3.57 3.42
5 0.56 0.31 0.36 0.24 4.33 3.60 4.80 4.19 3.82 2.92 3.80 3.19
6 1.21 1.07 0.84 0.84 0.35 0.07 0.66 0.76 1.79 1.73 1.84 1.91
7 0.69 0.76 0.69 0.19 1.57 0.98 1.65 4.12 2.20 1.92 2.25 2.43
8 0.66 0.51 0.62 0.18 1.51 1.67 1.61 3.25 2.11 1.96 2.12 2.46
9 0.16 0.05 0.04 0.12 3.53 2.76 3.37 3.19 2.09 1.92 2.27 2.30
10 1.73 0.26 1.71 0.28 1.60 2.95 1.72 4.91 3.96 2.40 3.99 2.80
11 1.80 1.62 1.74 1.64 0.10 0.24 0.22 0.21 2.93 2.55 2.66 2.58
12 3.49 2.62 3.53 3.70 2.65 2.21 2.96 2.95 4.05 2.89 3.68 3.91
13 0.04 0.12 0.25 0.21 4.73 4.26 4.67 4.94 3.09 3.04 2.70 2.95
14 2.49 2.76 3.00 2.37 2.62 3.11 2.60 3.87 2.40 6.68 6.74 6.54
15 1.80 1.57 1.79 1.68 0.44 0.22 0.54 1.12 2.71 2.46 2.62 3.55
16 1.55 1.15 1.45 0.77 2.33 0.05 0.43 2.71 4.14 1.88 2.70 3.09
17 7.60 6.18 6.36 6.33 6.53 8.25 6.70 5.15 8.31 4.81 6.13 7.11
18 3.53 1.93 1.57 1.14 0.86 0.33 1.75 3.33 6.45 3.43 3.79 4.12
19 1.17 1.01 1.11 1.05 6.52 5.30 5.96 6.39 2.40 1.85 2.13 2.51
20 2.42 3.38 3.20 2.87 2.54 3.74 1.70 0.24 5.72 3.15 4.20 4.62
21 9.13 4.27 7.62 7.13 11.81 2.94 9.67 9.11 7.34 5.15 6.26 5.81
22 3.67 2.92 3.29 3.29 2.94 2.73 3.18 3.78 4.15 3.05 3.36 2.96
Mean 3.12 2.17 2.75 2.35 0.54 0.53 0.41 0.18 4.85 3.26 4.31 4.04
SD 3.39 2.02 2.97 2.76 4.30 3.58 4.07 4.51 3.59 1.60 2.80 2.14
Table 1 summarizes for all approaches the actual objective function value, portfolio return, and portfolio risk, measured as the return volatility. For each column the mean and the standard deviation are shown.
Thirdly, to evaluate a portfolio's performance with respect to its robustness, a moving time window procedure is implemented. After the portfolio is optimized and held for 21 subsequent out-of-sample trading days, the window of 250 trading days is moved forward by 21 days. Then a new optimization is run and the resulting portfolio is again held for 21 out-of-sample trading days before the window is moved again, and so forth.^19 The sample spans from March 17th, 2005 to January 21st, 2008, allowing for the construction of 23 portfolios.^20
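The moving time window can be written down as a small index generator. The sketch below is illustrative: the function name is hypothetical, indices are end-exclusive, and it only yields windows that are followed by a complete 21-day holding period, which is an assumption about how the last window is treated.

```python
def moving_windows(n_obs, window=250, hold=21):
    """Yield (est_start, est_end, hold_end) index triples, end-exclusive.

    Rows est_start..est_end-1 are used for estimation and optimization,
    rows est_end..hold_end-1 form the out-of-sample holding period.
    """
    start = 0
    while start + window + hold <= n_obs:
        yield start, start + window, start + window + hold
        start += hold   # move the window forward by one holding period
```

For example, `list(moving_windows(313))` returns three windows, each estimated on 250 observations and evaluated on the following 21.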
Table 1 shows the out-of-sample performance of the MVO and all robust
approaches, in which the former serves as a benchmark. It can be seen that the mean
19 Whenever a window is moved so that the sample's observations are updated, new threshold sequences, parameters, and distributions (when applicable) must be computed for each approach.
20 All return series used were shown to be stationary according to the ADF-test as well as the KPSS-test.
risks are lower and the mean returns are higher for all three robust approaches compared to the MVO-approach. In addition, smaller variations around these better mean values can be observed. Among the robust approaches, on average TK- and ECS-portfolios exhibit the lowest risks, and the ECS-portfolios clearly exhibit the highest returns. The fact that even the highest return is on a rather low absolute level is mainly caused by using historical means as estimators and by the conservative choice of the transaction costs.^21 However, since our aim is to examine portfolios regarding their robustness properties, the absolute level of returns might be of minor importance. The robust approaches' advantages become most apparent when investigating actual bad-case-scenarios, e.g., periods 2, 3, and 21. While, e.g., in period 3, the TK-approach achieves a return that is only slightly lower than that of the MVO-approach, its risk is less than half as high as that of the MVO-approach. Similarly, in period 21, all robust approaches exhibit a higher return compared to the MVO-approach plus a lower risk. In most good-case-scenarios, e.g., period 19, the robust approaches exhibit only slightly lower returns, and in some periods (e.g., 8 and 10) even higher returns. Especially the ECS-approach performs well also in normal periods, which is why it exhibits the highest (lowest) return in 12 (3) of 22 periods. Consequently, it achieves the only positive mean return, which makes the approach especially interesting.
To gain further meaningful insight, Table 2 exhibits the portfolio compositions as well as the forecast errors. There, TraS (traded stocks) shows the traded volume measured in pieces of stocks and ToR (turnover ratio) is the priced traded volume relative to the twofold capital endowment, both in period win compared to win − 1; the number of identical stocks in these subsequent periods is given by IdS ($K_p$ is shown in parentheses) and RoF is their averaged range of fluctuation. It is obvious that the MVO-approach exhibits a remarkable fluctuation in the portfolio composition. On average 14,777 pieces of stocks are traded to rebalance the portfolio in each period. It is, however, noticeable that in periods win ∈ [7; 12], the average turnover ratio is only about half as high as over the total period and TraS exhibits an average value of only 10,465 in that period, for which reason this might be considered a more stable period.
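The two turnover measures can be illustrated with a short sketch. It assumes that holdings are given as numbers of shares per stock and that the traded volume is priced at a single price vector per rebalancing date; which prices enter ToR is not stated in this section, so that choice and the function names are assumptions.

```python
import numpy as np

def traded_stocks(shares_prev, shares_new):
    """TraS: number of shares bought or sold when rebalancing."""
    diff = np.asarray(shares_new) - np.asarray(shares_prev)
    return int(np.abs(diff).sum())

def turnover_ratio(shares_prev, shares_new, prices, endowment):
    """ToR: priced traded volume relative to twice the capital endowment."""
    diff = np.asarray(shares_new) - np.asarray(shares_prev)
    traded_value = np.abs(diff) @ np.asarray(prices, dtype=float)
    return traded_value / (2.0 * endowment)
```

Dividing by twice the endowment means that selling an entire fully invested portfolio and buying a completely new one yields a ratio of about one.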
The last three columns of Table 2 show the differences between actually realized and predicted objective function values (multiplied by 100), portfolio returns, and portfolio volatilities. An examination of the portfolio returns' forecast errors shows that the actual (out-of-sample) performance is worse than its expectation in 18 out of 22 periods. The return was on average overestimated by 4.81 percentage points, as is shown by the mean forecast error. A mean overestimation of 6.26 percentage points (as shown by the mean negative forecast error) as well as the root mean negative squared forecast error of 4.95 provide more detail about the extent of the return overestimation. The corresponding results for the portfolio risk are, as expected, better. Nevertheless, the mean forecast error also points in the unfavorable direction, i.e., the portfolio risk is on average underestimated by almost one percentage point. Due to λ = 0.6, the objective function's mean forecast error is between that of the return and the risk. However, since
21 The construction of portfolios based on estimators with such a limited forecast quality apparently contributes only little to good out-of-sample properties. Instead, this procedure solely contributes to a construction of portfolios that exhibit good in-sample risk-return characteristics.
Table 2 MVO-approach
Column groups: win | fluctuation of composition (TraS, ToR, IdS, RoF) | portfolio weights of stock no. 7, 30, 34, 48, 95 | forecast errors (objective function value, portfolio return, portfolio risk)
1 0.30 8.11 0.13

2 33,393 0.29 5(7) 0.89 7.47 16.29 6.12
3 20,701 0.28 6(7) 2.19 27.29 3.23 10.67 11.67
4 49,481 0.52 4(7) 13.66 58.00 8.52 2.69 7.73 0.45
5 22,144 0.26 4(7) 1.68 52.73 8.77 4.77 5.42 1.14 0.12
6 15,232 0.18 5(7) 1.29 54.19 4.93 7.42 0.54 4.20 2.26
7 16,438 0.14 5(7) 1.81 50.00 8.45 6.74 7.71 2.50 1.71 1.55
8 17,559 0.27 4(7) 1.05 49.02 9.03 7.32 9.75 2.14 1.96 1.25
9 6,062 0.05 7(7) 1.61 48.81 12.98 8.96 7.27 4.15 0.12 1.21
10 5,326 0.04 6(7) 0.72 47.66 13.89 10.12 7.24 2.20 1.80 0.69
11 12,888 0.18 4(7) 2.44 47.74 20.88 12.04 8.03 0.60 3.42 0.45
12 4,517 0.07 6(7) 1.42 48.49 17.84 12.28 7.56 1.75 5.24 0.82
13 13,649 0.22 4(7) 5.03 55.48 13.02 6.62 4.31 5.84 3.11 0.17
14 10,732 0.12 5(7) 0.81 56.79 12.84 4.50 1.69 4.76 0.57
15 9,746 0.24 5(7) 3.76 42.97 12.74 7.93 0.37 2.74 0.17
16 18,601 0.58 4(7) 5.04 15.02 2.97 1.59 0.46
17 14,300 0.36 4(7) 2.27 17.89 6.27 12.03 4.21
18 9,735 0.23 6(7) 4.98 34.15 1.58 4.17 1.91
19 15,769 0.37 4(7) 2.56 35.08 7.27 2.60 1.45
20 4,057 0.10 5(7) 1.30 38.55 3.14 2.22 1.56
21 10,859 0.28 5(7) 3.19 39.18 11.07 17.58 2.26
22 9,567 0.28 5(7) 3.22 32.46 1.78 6.56 0.20
23 4,335 0.17 6(7) 1.73 32.81
(a) 14,777 0.34 4.95 2.85
(b) 30.70 12.44 7.51 2.51 34.87
(c) 7.83 3.99 2.82 0.88 14.00
(d) 33
(e) 24
(f) 2.47 4.81 0.91
(g) 3.47 6.26
(h) 2.99
(i) 3.53 4.95
(j) 3.57
Table 2 summarizes the results of the MVO-approach. The values within the framed line (a) show mean values; lines (b) and (c) show the full range of fluctuation and the standard deviation (each in percentage points) of the assets' portfolio weights. All displayed stocks were held for at least six subsequent periods. The number of stocks held for a minimum of two and three subsequent periods is given by values (d) and (e); lines (f), (g) and (i) show mean forecast errors, mean negative [in (h) positive] forecast errors, and root mean negative [in (j) positive] squared forecast errors.
both underestimating risk and overestimating return increase the objective function's forecast error, on average no error compensation takes place. The average actual return (risk) of −0.54 (4.85)% is by the magnitude of the displayed errors lower (higher) than its expected average value of 4.27 (3.94)%.
The results of the TK-approach are shown in Table 3. Although the return predictions improved compared to the MVO-approach, there is still on average, despite optimizing for the theoretical worst-case-scenario, an overestimation of 0.45 percentage points present. In contrast, the risk's mean forecast error of −0.40 percentage points shows that on
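Table 3 TK-approach
Column groups as in Table 2: win | TraS, ToR, IdS, RoF | portfolio weights of the listed stocks | forecast errors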
1 30.07 1.36 5.65 1.48

2 12,0284 0.22 5(7) 1.37 29.54 4.12 8.79 1.01
3 19,367 0.44 4(7) 2.83 19.36 24.28 3.98 3.89 4.04
4 23,149 0.52 4(7) 16.63 54.95 8.84 6.40 7.04 0.18 1.54 1.33
5 10,821 0.13 6(7) 2.01 48.63 8.73 8.34 9.49 2.41 4.08 1.29
6 2,340 0.04 7(7) 1.52 48.26 8.86 8.52 10.14 1.54 0.07 2.52
7 3,663 0.07 6(6) 1.28 49.77 8.51 11.39 10.32 1.76 0.90 2.34
8 10,468 0.13 5(7) 1.23 49.84 7.12 12.25 6.94 1.73 1.28 2.04
9 1,768 0.02 7(7) 0.87 49.52 5.40 15.26 6.41 2.31 2.74 2.02
10 3,428 0.07 6(7) 0.64 49.19 5.24 16.85 2.07 2.90 1.51
11 8,714 0.11 5(7) 0.58 48.62 5.48 18.86 0.64 0.16 1.17
12 1,825 0.03 7(7) 1.26 50.48 7.17 16.40 0.31 1.67 0.60
13 8,058 0.16 6(6) 4.07 64.77 5.60 10.91 2.24 5.28 0.21
14 6,162 0.12 5(6) 2.67 69.57 2.01 8.79 0.65 3.55 3.44
15 17,490 0.33 3(6) 5.10 55.99 4.38 7.66 0.45 0.21 0.89
16 7,116 0.15 5(7) 1.94 52.22 6.80 0.59 0.18 0.87
17 3,647 0.08 5(7) 1.71 51.52 8.29 4.69 8.62 2.07
18 11,579 0.37 4(6) 4.32 48.62 15.71 0.29 0.93 0.13
19 2,107 0.09 6(6) 3.47 38.42 15.56 3.18 5.47 1.65
20 576 0.02 6(6) 1.26 39.07 15.19 1.22 3.74 0.46
21 3,313 0.11 6(7) 2.95 37.82 12.93 1.75 1.88 1.66
22 7,877 0.29 6(7) 5.05 35.27 13.88 0.00 1.09 0.73
23 3,362 0.12 6(7) 1.28 35.44 14.62
(a) 7,675 0.17 5.45 2.91
(b) 50.20 11.33 6.85 12.46 23.66
(c) 10.76 4.28 2.13 4.13 10.02
(d) 19
(e) 18
(f) 0.06 0.45 0.40
(g) 2.26 3.38
(h) 2.06
(i) 1.73 3.10
(j) 1.47
Table 3 summarizes the results of the TK-approach for α = 0.05 with the usual key data (explained for Table 2). The weights of those stocks are listed that were held for at least eight subsequent periods.
average the risk is no longer underestimated. The surprising observation that the expected risk of the MVO-portfolios in some periods is greater than that of the TK-portfolios (see Fig. 1) can be explained as follows: compared to the expected (worst-case) returns that are mostly close to zero, the expected (co)variances are high. In addition, by weighting the risk more heavily through λ = 0.6, the objective function value is largely determined by the portfolio risk. Thus, an implicit movement towards the minimum-variance-portfolio (MVP) takes place. As can be seen in periods 7, 13, 14, 18, 19, and 20, the portfolio is only diversified among six stocks, which must be viewed critically and is originally caused by many negative returns in the worst-case-scenario. However, the aforementioned mechanism contributes to portfolio diversification so that not solely
[Figure 1: 3 × 3 grid of line plots. Rows: TK, CS, ECS. Columns: Objective Function Value, Portfolio Return, Portfolio Volatility.]
Fig. 1 Robust optimization approaches. All nine graphs in Fig. 1 map the time window win on the x-coordinate, while the y-coordinate maps the objective function value (left column of graphs), the portfolio return (middle column of graphs), and the portfolio volatility (right column of graphs). The series with markers show actually realized (out-of-sample) values, while the non-marked series show the corresponding expected values. In all graphs the dashed lines represent the MVO-approach.
stocks with positive returns are picked due to the implicit risk-overweighting.^22 Yet whenever the returns' norm is too large, this mechanism fails, such as in periods 14, 19, and 20.
The movement toward the MVP also explains the differences in the portfolio compositions between the MVO- and the TK-approach.^23 Indicated by all measures, the TK-portfolios' compositions are more stable; e.g., the turnover rate is about half as high and only half as many stocks are traded as in the MVO-approach. With an average IdS of 5.45, on average fewer stocks are exchanged per period. In three periods no stocks get exchanged in the TK-portfolio (versus only one in the MVO-approach). Stocks are generally held for longer periods, which is most apparent for stock no. 7, but it can also be seen by the number of stocks that are held for at least two (three) periods [19 (18) versus 33 (24) in the MVO-approach].^24
22 On average, four stocks in the portfolio exhibit negative expected returns.
23 Setting λ = 0.8 only in the MVO-approach creates a portfolio composition that is more similar to that of the TK-approach.
24 It is to be noted here that these 19 (18) stocks are held longer and that this relatively small number is not the result of more stocks being held for only one period.
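Table 4 CS-approach
Column groups as in Table 2: win | TraS, ToR, IdS, RoF | portfolio weights of the listed stocks | forecast errors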
1 2.81 7.29 0.17

2 36,754 0.42 4(7) 0.81 9.89 15.57 6.11
3 38,638 0.47 5(7) 2.64 27.01 19.36 8.55 8.43 8.64
4 35,783 0.45 4(7) 8.88 50.13 7.38 12.89 0.91 2.62 0.23
5 9,405 0.07 6(7) 0.87 49.36 8.00 4.35 15.80 1.08 2.58 0.09
6 4,729 0.10 6(7) 0.49 49.77 7.43 4.47 16.69 8.12 0.45 2.04 2.12
7 6,145 0.06 6(7) 1.36 48.72 8.96 5.92 13.76 8.27 0.60 0.71 1.48
8 13,975 0.18 5(7) 1.60 47.70 9.57 6.61 9.51 9.73 0.34 0.95 1.21
9 6,199 0.05 7(7) 1.69 47.43 14.04 8.09 6.89 9.50 0.80 0.51 0.99
10 7,267 0.10 5(7) 0.64 46.55 14.66 9.17 9.53 0.76 0.78 0.75
11 14,762 0.17 4(7) 1.24 46.39 17.93 10.03 10.19 0.71 2.56 0.52
12 5,029 0.07 6(7) 1.55 47.24 14.45 10.26 9.62 1.97 3.89 0.68
13 14,095 0.19 5(7) 4.15 55.07 10.78 5.46 1.61 4.11 0.05
14 6,287 0.09 6(7) 1.40 54.69 9.50 2.52 1.82 1.62 4.12
15 13,866 0.20 6(7) 3.15 48.73 8.94 0.69 1.82 0.06
16 10,970 0.24 5(7) 4.27 36.21 0.48 0.90 0.20
17 11,541 0.34 4(7) 2.47 30.79 5.65 9.31 3.20
18 10,171 0.28 5(7) 4.42 25.28 0.44 0.06 0.78
19 7,048 0.15 6(7) 3.32 15.54 2.17 3.87 1.04
20 643 0.02 7(7) 1.17 15.52 2.21 4.12 0.94
21 16,213 0.38 4(7) 2.75 14.80 6.38 12.85 2.07
22 8,396 0.23 5(7) 1.61 16.51 1.72 4.79 0.33
23 4,079 0.15 6(7) 1.27 17.60
(a) 12,818 0.20 5.32 2.35
(b) 40.27 10.55 7.73 12.47 2.07
(c) 14.81 3.44 2.62 4.28 0.78
(d) 27
(e) 22
(f) 1.72 2.99 0.88
(g) 3.00 4.91
(h) 2.50
(i) 3.10 4.55
(j) 2.77
Table 4 summarizes the results of the CS-approach for α = 0.05 with the usual key data (explained for Table 2). The weights of those stocks are listed that were held for at least eight subsequent periods.
The CS-approach (see Table 4) exhibits an improvement of the mean forecast error compared to the MVO-approach. On average the actual portfolio returns are still 2.99 percentage points lower than predicted. But considering a less conservative penalization of the portfolio returns, this is not surprising. Even though the CS-approach does not penalize the covariance matrix, the portfolio risk's forecast error measures are slightly better than the benchmark. Nevertheless, the necessity of an uncertainty set for the covariance matrix is apparent, considering both the mean forecast error and the high actual risk compared to the other approaches (see Table 1 for the latter).
As can be seen by the stocks held, the corresponding periods, and their weights' standard deviations, the portfolio is similar to that of the MVO-approach, even if it
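Table 5 ECS-approach
Column groups as in Table 2: win | TraS, ToR, IdS, RoF | portfolio weights of the listed stocks | forecast errors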
1 10.55 2.33 8.42 1.72

2 36,337 0.51 3(7) 2.48 6.96 12.49 3.27
3 31,843 0.52 4(7) 3.45 18.56 14.76 17.65 5.90 7.03 5.15
4 25,299 0.38 4(7) 4.55 23.33 16.44 12.87 11.62 0.51 0.20 0.99
5 14,402 0.15 6(7) 1.02 22.53 16.07 13.79 12.96 2.25 3.85 1.18
6 13,081 0.15 6(7) 1.11 23.35 14.55 14.03 11.24 1.56 0.09 2.66
7 523 0.01 7(7) 0.59 23.97 14.27 15.09 9.68 2.50 3.18 2.06
8 9,350 0.16 6(7) 1.93 25.33 11.46 16.37 1.85 1.76 1.91
9 10,240 0.10 6(7) 0.77 24.18 12.42 18.28 1.93 2.06 1.85
10 17,979 0.14 6(7) 1.22 25.03 13.52 19.50 2.27 3.28 1.60
11 6,960 0.11 6(7) 1.69 24.24 15.31 20.85 4.22 0.28 1.81 1.67
12 3,708 0.08 6(7) 1.53 27.37 16.19 18.40 1.52 4.18 0.25
13 20,046 0.27 5(7) 3.42 32.22 17.57 12.74 9.54 2.33 4.28 1.03
14 1,863 0.04 7(7) 1.09 32.11 18.16 11.53 8.04 0.41 2.80 2.56
15 20,497 0.38 5(7) 3.68 28.26 13.00 15.65 13.57 0.04 0.81 0.61
16 17,965 0.35 6(7) 7.97 6.97 31.77 17.40 0.71 0.02 1.17
17 5,946 0.17 6(7) 1.84 5.77 32.58 18.85 5.16 8.92 2.65
18 12,402 0.34 5(7) 6.29 3.84 16.81 31.52 0.62 0.32 0.83
19 10,134 0.25 6(7) 3.77 4.36 7.72 28.83 2.74 3.48 2.25
20 1,548 0.05 7(7) 1.03 4.55 7.50 28.17 1.18 3.17 0.14
21 8,091 0.20 6(7) 2.83 4.56 5.89 29.42 5.24 11.85 0.83
22 28,889 0.54 3(7) 3.07 12.08 6.78 23.56 0.94 4.76 1.61
23 15,276 0.38 4(7) 2.04 11.97 21.60
(a) 14,199 0.24 5.45 2.61
(b) 28.38 6.70 9.32 28.36 23.48
(c) 10.09 2.11 3.06 8.54 8.21
(d) 20
(e) 16
(f) 0.46 1.76 0.41
(g) 3.29 5.31
(h) 2.89
(i) 2.50 4.39
(j) 1.55
Table 5 summarizes the results of the ECS-approach for α = 0.05 with the usual key data (explained for Table 2). The weights of those stocks are listed that were held for at least nine subsequent periods.
is more stable. The turnover ratio is about 40%, TraS about 15%, and the range of fluctuation about half a percentage point lower.
Table 5 shows the ECS-portfolios, which, with a value of only −0.41, produced about the same mean forecast error for the risk as the TK-approach. The portfolio return is predicted more accurately than in all other approaches except the TK-approach. This is especially interesting, since the improvement also results from a better actual return performance, as can be gathered from Table 1.
Even though most of the portfolio compositions' key data seem to indicate only minor improvements in the stability of the ECS-portfolios compared to MVO-portfolios, this observation, with its impact on transaction costs, is primarily caused by relatively
many stocks held for only one period (not shown). Only 20 (16) stocks were held for a minimum of two (three) subsequent periods, which is the most stable result of all approaches. Furthermore, similar to the TK-approach, in three periods (7, 14, and 20) no stocks get exchanged and holding periods of the displayed stocks are relatively long. Except for period 22, stock no. 7 is held in the same periods, but with less extreme weights and a lower standard deviation than in the TK-approach. The ECS-approach exhibits sequences of held stocks (not shown) which did not (notably) appear in the other approaches.^25 This selection is presumably caused by its small (co)variances' variances, which is not (satisfactorily) considered in the other approaches. Therefore, e.g., stock no. 13 is penalized less heavily and is consequently more likely to be picked than stocks that exhibit larger historical (co)variances.^26
To sum up, the robustification techniques lead to an improvement compared to the MVO-approach. This improvement is possible due to a reduced risk that is not necessarily accompanied by correspondingly lower returns. Among the robust approaches the TK- and the ECS-approach clearly outperform the CS-approach.
3.2 Convergence analysis for the HHA
The result of an optimization with the HHA is a stochastic approximation of the global optimum. In order to demonstrate the reliability of the HHA, we analyze the distribution of outcomes of repeatedly solving problem (2) for the setup described in Sect. 3.1, with the only difference being that we use all available observations in the data set. We increase the number of function evaluations from 12,000 to 900,000 and conduct 30 restarts for each setup to gain empirical distributions of the objective function values. The analysis was conducted on a standard desktop computer with a 2.83-GHz processor.
The convergence towards the global optimum can be gathered from both graphs in Fig. 2. The empirical distributions not only shift in the favorable direction when the number of function evaluations increases, they also exhibit a smaller standard deviation. The box plots make very clear that increasing the number of function evaluations to more than 300,000 leads to only negligible improvements. Then, in fact, the worst portfolio is not even one thousandth of a percentage point worse than the best overall found portfolio. Moreover, when computing 300,000 or more function evaluations, the HHA selected the best overall found asset combination in all restarts. In contrast, when computing only 12,000 function evaluations, only the best portfolio found this optimal asset combination. This reliability of the portfolio composition and the corresponding narrow distribution of the objective function values made us choose a setup with 300,000 function evaluations in the empirical analysis. Tripling the number of function evaluations to 900,000 does not seem to be worth the increase in computation time from 28.4 to 83.8 s.
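The transformation used for the box plots, i.e., expressing each restart's objective function value as the percentage by which it is worse than a reference solution, can be sketched as follows. The function name is hypothetical, and the `maximize` flag is an assumption added here so that the sketch works for either direction of optimization.

```python
import numpy as np

def percent_worse(values, best, maximize=True):
    """Percentage by which each objective value is worse than `best`.

    values   : objective function values of the restarts
    best     : reference value (best solution found, or a solver optimum)
    maximize : True if larger objective values are better
    """
    values = np.asarray(values, dtype=float)
    gap = best - values if maximize else values - best
    return 100.0 * gap / abs(best)
```

Applied to the 30 restarts of one setup, the spread of the returned values is what a box plot per number of function evaluations would display.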
25 Stock no. 13 (80) did not appear (more than twice) in the other approaches, while it was held for six (eight) periods in the ECS-approach.
26 This is not true to the same extent for stock no. 16, since it was also part of the MVO-, the TK-, and the CS-portfolios for some short holding sequences (not shown).
[Figure 2: left panel, empirical cumulative distribution functions of the objective function value for 12,000, 300,000, and 900,000 function evaluations; right panel, box plots of the percentage by which solutions are worse than the best one. Number of function evaluations in thousands (computation time in seconds): 12 (1.8), 30 (3.8), 60 (6.5), 90 (9.4), 150 (14.6), 300 (28.4), 450 (42.1), 600 (56.6), 900 (83.8).]
Fig. 2 The left graph shows three exemplary empirical cumulative distribution functions for 12,000, 300,000, and 900,000 function evaluations. The right graph shows box plots of transformed empirical distribution functions for all numbers of function evaluations. For the box plots each function value is transformed into the percentage by which it is worse than the best solution found.
[Figure 3: left panel, empirical cumulative distribution functions of the objective function value for 12,000, 300,000, and 900,000 function evaluations together with the CPLEX optimum; right panel, box plots of the percentage by which solutions are worse than the CPLEX optimum. Number of function evaluations in thousands (computation time in seconds): 12 (1.0), 30 (1.9), 60 (3.2), 90 (4.8), 150 (7.4), 300 (14.0), 450 (20.7), 600 (27.3), 900 (40.4).]
Fig. 3 The left graph shows three exemplary empirical cumulative distribution functions for 12,000, 300,000, and 900,000 function evaluations as well as the optimal solution obtained by CPLEX. The right graph shows box plots of transformed empirical distribution functions for all numbers of function evaluations. For the box plots each function value is transformed into the percentage by which it is worse than the optimal solution obtained by CPLEX.
Note that a problem like (2) can be formulated as a small/medium-size mixed integer quadratic programming problem (MIQP) when using linearization techniques (see, e.g., Konno and Wijayanayake 2001; Guastaroba et al. 2009). MIQP problems can be solved to optimality with a general purpose solver (e.g., CPLEX). Nevertheless, since the purpose of this section is to show the good performance of the HHA as well as the relative computation times, we solve a simplified Minimum-Variance-Portfolio (MVP) problem. The MVP problem, which takes into account the integer as well as the cardinality constraint (at most $K_{max}$ = 7 stocks) but neglects transaction costs, is also an MIQP problem and therefore suitable for comparing the HHA and CPLEX. The solution obtained by CPLEX within approximately two seconds (again using all available observations in the data set) exhibits an objective function value of $4.5956 \times 10^{-5}$. This (globally) optimal portfolio is compared to the results obtained by the HHA.
Figure 3 clearly shows that the HHA delivers sufficiently precise results for this problem instance if 150,000 or more objective function values are computed (in only 7.4 s/run). In fact, already when only 60,000 function evaluations were computed (in 3.2 s/run), the globally optimal composition was found 29 times. Small differences in the asset weights only caused differences in the objective function value at the eighth digit after the decimal point. In this example with daily data such a difference corresponds to one millionth of a basis point.
In the light of estimation errors within the input parameters of all optimization techniques, the possible benefits of solving to optimality seem to be extremely limited, which is why we prefer to use a heuristic (Gilli and Schumann 2010). Moreover, we want to keep the models flexible for the consideration of other complex real-world constraints.
4 Conclusion
In this work different robust optimization techniques are empirically tested within a
complex optimization problem that emulates a realistic investment environment. The
employed hybrid heuristic algorithm is well capable of tackling the complexity of the
resulting optimization problems. We find that the explicit incorporation of the
uncertainty about the true (but unknown) parameters into the optimization process leads
overall to results superior to those of the MVO-approach. The portfolio compositions are
shown to be more stable and consequently lead to a reduction of the transaction costs.
On average the out-of-sample portfolio risk is lower and accompanied by smaller
deviations, but not necessarily lower returns. This is possible since robust portfolios
exhibit an improved performance in bad-case scenarios without necessarily a worse
performance in good-case scenarios.
Although for the data sample used it remains unclear which approach performs
best, since the TK-approach performed slightly better in terms of risk while the
ECS-approach was superior in terms of returns, we consider the ECS-approach to be
favorable. This is due to the former's shortcomings of not considering the expected returns'
correlations, a possibly singular covariance matrix of returns, and a reduced
diversification. The CS-approach, which only uses an uncertainty set for the expected returns,
seems mostly to adjust the MVO-portfolio's weights rather than to construct
a whole new composition. Therefore, the CS-approach exhibits the disadvantage of
limited effectiveness. Furthermore, the employed estimator for the expected returns'
covariance matrix needs to rely on distributional assumptions. If this covariance matrix
is generated by the MBB-technique rather than by linear transformation, on average
around 8,000 out of 9,604 components are larger. This indicates that the distributional
assumptions are not met, with the consequence of a penalization that is too small.
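The comparison of the two estimators of the expected returns' covariance matrix can be illustrated with a small sketch. It assumes, purely for illustration, that the MBB-technique is a moving block bootstrap of the return rows and that the "linear transformation" is the sample covariance matrix divided by the number of observations; the block length, the number of bootstrap replications, the synthetic data, and all function names are likewise assumptions and are not taken from the paper.

```python
import random


def column_means(rows):
    """Mean of each column of a list of equally long rows."""
    t = len(rows)
    return [sum(col) / t for col in zip(*rows)]


def covariance(rows):
    """Sample covariance matrix of the rows (divisor len(rows) - 1)."""
    t = len(rows)
    means = column_means(rows)
    n = len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (t - 1)
             for j in range(n)] for i in range(n)]


def moving_block_bootstrap(rows, block_len, rng):
    """Resample len(rows) rows by concatenating random blocks of consecutive rows."""
    t = len(rows)
    sample = []
    while len(sample) < t:
        start = rng.randrange(t - block_len + 1)
        sample.extend(rows[start:start + block_len])
    return sample[:t]


def mbb_mean_covariance(rows, block_len, n_boot, seed=0):
    """Covariance of the bootstrapped mean vectors (assumed MBB estimator)."""
    rng = random.Random(seed)
    boot_means = [column_means(moving_block_bootstrap(rows, block_len, rng))
                  for _ in range(n_boot)]
    return covariance(boot_means)


def count_larger(a, b):
    """Number of components of matrix a that exceed the matching component of b."""
    return sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x > y)


# Synthetic i.i.d. returns for two assets (illustrative only, not the DAX100 data).
data_rng = random.Random(1)
returns = [[data_rng.gauss(0.0, 0.01), data_rng.gauss(0.0, 0.02)] for _ in range(120)]

mbb_cov = mbb_mean_covariance(returns, block_len=5, n_boot=200)
scaled_cov = [[c / len(returns) for c in row] for row in covariance(returns)]
n_larger = count_larger(mbb_cov, scaled_cov)
```

Under these assumptions, serially dependent returns would tend to push the bootstrapped components above the scaled ones, which is the direction of the deviation reported in the text.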
The ECS-approach seems to pool some desired characteristics: (1) it offers
uncertainty sets for both the expected return and the covariance matrix in an intuitive way.
The greater the components' deviations are, the greater the uncertainty and with that
the penalization will be. The penalization (2) takes correlations into account and (3)
does not rely on distributional assumptions. The ECS-approach also remains
applicable when different (expected return) estimators are employed. This might become of
major interest if portfolios are based on, e.g., factor models as employed by Roko and
Gilli (2008).
The application of different (expected return) estimators is a possible direction for
future research. Besides factor models, popular techniques from time series
analysis also seem to be promising when incorporated in the robust portfolio optimization setup.
Obviously, future research should also take further data sets into account to generalize
the results found in this work.
Acknowledgements Valuable comments and suggestions from two anonymous referees and the editors
of this special issue are gratefully acknowledged.
References
Bertsimas D, Pachamanova D (2008) Robust multiperiod portfolio management in the presence of transaction costs. Comput Oper Res 35(1):3–17
Black F, Litterman R (1991) Asset allocation: combining investor views with market equilibrium. J Fixed Income 1(2):7–18
Ceria S, Stubbs R (2006) Incorporating estimation errors into portfolio selection: robust portfolio construction. J Asset Manage 7(2):109–127
Chang TJ, Meade N, Beasley J, Sharaiha Y (2000) Heuristics for cardinality constrained portfolio optimisation. Comput Oper Res 27(13):1271–1302
Chopra V, Ziemba W (1993) The effect of errors in means, variances, and covariances on optimal portfolio choice. J Portfolio Manage 19(2):6–11
DeMiguel V, Nogales F (2009) Portfolio selection with robust estimation. Oper Res 57(3):560–577
Dueck G, Scheuer T (1990) Threshold accepting: a general purpose optimization algorithm appearing superior to simulated annealing. J Comput Phys 90(1):161–175
Dueck G, Winker P (1992) New concepts and algorithms for portfolio choice. Appl Stochas Models Data Anal 8(3):159–178
Efron B, Tibshirani R (1993) An introduction to the bootstrap, no. 57 in monographs on statistics & applied probability. Chapman & Hall, New York, NY
Fabozzi F, Kolm P, Pachamanova D, Focardi S (2007) Robust portfolio optimization and management, The Frank J Fabozzi Series. Wiley Finance, Hoboken, NJ
Fang K, Kotz S, Ng K (1990) Symmetric multivariate and related distributions, no. 36 in monographs on statistics & applied probability. Chapman & Hall, London, NY
Fang KT, Winker P (1997) Application of threshold accepting to the evaluation of the discrepancy of a set of points. SIAM J Numer Anal 34(5):2028–2042
Frost P, Savarino J (1988) For better performance: constrain portfolio weights. J Portfolio Manage 15(1):29–34
Genton M, Ronchetti E (2008) Robust prediction of beta, computational methods in financial engineering. Springer, Berlin, pp 147–161
Gilli M, Këllezi E (2002) Portfolio optimization with VaR and expected shortfall. In: Kontoghiorghes E, Rustem B, Siokos S (eds) Computational methods in decision-making, economics and finance, no. 74 in applied optimization. Kluwer Academic Publishers, Dordrecht, pp 165–181
Gilli M, Schumann E (2010) Optimal enough? J Heuristics forthcoming. http://dx.doi.org/10.1007/s10732-010-9138-y
Gilli M, Winker P (2009) Heuristic optimization methods in econometrics. In: Belsley D, Kontoghiorghes E (eds) Handbook of computational econometrics. Wiley, Chichester, pp 81–120
Goldfarb D, Iyengar G (2003) Robust portfolio selection problems. Math Oper Res 28(1):1–38
Guastaroba G, Mansini R, Speranza MG (2009) Models and simulations for portfolio rebalancing. Comput Econ 33(3):237–262
Kirkpatrick S, Gelatt C, Vecchi M (1983) Optimization by simulated annealing. Science 220(4598):671–680
Konno H, Wijayanayake A (2001) Minimal cost index tracking under nonlinear transaction costs and minimal transaction unit constraints. Int J Theor Appl Finan 4(6):939–958
Lauprete G, Samarov A, Welsch R (2002) Robust portfolio optimization. Metrika 55(2):139–149
Maringer D (2005) Portfolio management with heuristic optimization, no. 8 in advances in computational management science. Springer, Dordrecht
Maringer D, Kellerer H (2003) Optimization of cardinality constrained portfolios with a hybrid local search algorithm. OR Spectrum 25(4):481–495
Maringer D, Winker P (2007) The threshold accepting optimization algorithm in economics and statistics. In: Kontoghiorghes E, Gatu C (eds) Optimisation, econometric and financial analysis. Springer, Berlin, pp 107–125
Markowitz H (1952) Portfolio selection. J Finan 7(1):77–91
Maronna R, Martin D, Yohai V (2006) Robust statistics: theory and methods, probability and statistics. Wiley, New York, NY
Michaud R (1989) The Markowitz optimization enigma: is optimized optimal? Finan Anal J 45(1):31–45
Michaud R (1998) Efficient asset management. Harvard Business School Press, Boston, MA
Perret-Gentil C, Victoria-Feser MP (2004) Robust mean-variance portfolio selection. Working paper 173, National Centre of Competence in Research NCCR FINRISK
Roko I, Gilli M (2008) Using economic and financial information for stock selection. Comput Manage Sci 5(4):317–335
Stubbs R, Vance P (2005) Computing return estimation error matrices for robust optimization. Research report 001, Axioma, Inc.
Tütüncü R, Koenig M (2004) Robust asset allocation. Ann Oper Res 132(1–4):157–187
Welsch R, Zhou X (2007) Application of robust statistics to asset allocation models. REVSTAT Stat J 5(1):97–114
Winker P, Lyra M, Sharpe C (2011) Least median of squares estimation by optimization heuristics with an application to the CAPM and multi factor models. Comput Manage Sci. doi:10.1007/s10287-009-0103-x
Zymler S, Rustem B, Kuhn D (2009) Robust portfolio optimization with derivative insurance guarantees. COMISEF working paper 018. http://comisef.eu/?q=working_papers