Professional Documents
Culture Documents
A dissertation presented by
to
Complutense University
Madrid, Spain
December, 2011
Complutense University
Madrid, 2011
Legal notice:
It is illegal to reproduce this book, or parts of it, without the expressed written
permission of the author. This can be requested via e-mail at
lopezdeprado@lbl.gov
The views expressed in this publication are the author’s and do not
necessarily reflect those of Tudor Investment Corporation. No investment
decision or particular course of action is recommended by this book.
Aviso legal:
CONTENTS
PREFACE ....................................................................................................... 13
Motivation ................................................................................................... 13
Acknowledgements ..................................................................................... 13
INTRODUCTION ........................................................................................... 17
High Frequency Strategies .......................................................................... 17
Fields of study ............................................................................................. 18
Objectives.................................................................................................... 22
Methodology ............................................................................................... 23
Organization and format.............................................................................. 24
Seminars ...................................................................................................... 25
Collaborations ............................................................................................. 26
MEASURING FLOW TOXICITY ON HIGH FREQUENCY MARKET
DATA .............................................................................................................. 27
1.1. Abstract .......................................................................................... 27
1.2. Introduction .................................................................................... 27
1.3. The Glosten-Milgrom Model ......................................................... 30
1.4. The Easley-O’Hara Model ............................................................. 33
1.5. Estimation of parameters ............................................................... 38
1.5.1. The Nature of Information and Time ............................................. 38
1.5.2. Volume bucketing .......................................................................... 39
1.5.3. Volume classification ..................................................................... 40
1.5.4. Volume-Synchronized Probability of Informed Trading ............... 43
1.6. Accuracy of the VPIN metric estimate .......................................... 47
1.6.1. The approximation ......................................................................... 47
1.6.2. Monte Carlo validation .................................................................. 49
1.7. The stability of VPIN estimates ..................................................... 52
1.7.1. Stability under different Volume Classification schemes .............. 52
1.7.2. Stability to changes in the transaction record ................................. 55
1.8. Estimating the VPIN metric on Futures ......................................... 56
1.8.1. S&P 500 (CME) ............................................................................. 58
1.8.2. EUR/USD (CME) .......................................................................... 59
1.8.3. T-Note (CBOT) .............................................................................. 61
1.8.4. WTI crude oil (NYMEX) ............................................................... 63
1.8.5. Silver (COMEX) ............................................................................ 65
1.8.6. Corn (CBOT) ................................................................................. 68
1.9. Toxicity contagion ......................................................................... 70
1.9.1. Lead-lag toxicity ............................................................................ 73
1.9.2. Propagation across indices ............................................................. 74
1.9.3. Propagation across asset classes..................................................... 76
7
Contents
8
Advances in High Frequency Strategies
9
Contents
10
Advances in High Frequency Strategies
PREFACE
Motivation
My interest in High Frequency Finance began in early 2006, when I started to
trade strategies with shorter holding periods. I quickly realized that many
standard Microestructural, Financial and Econometric theories could not be
exported into a framework where time and information have a different
relationship. I knew the Low Frequency framework well, from my
perspective of both a practitioner and an academic. The literature devoted to
High Frequency Finance at that time was relatively small, fragmented, and in
many cases developed by stretching Low Frequency models, forced to
perform in an environment for which they had not been conceived.
Thus began the work on my second doctoral thesis, which covers topics with
important practical implications for optimal execution, liquidity provision,
market risk, the risk of market failure, … and of course alpha generation by
High Frequency strategies. Most of the profits harvested by High Frequency
trading nowadays can be attributed to speed. But as our trading speed reaches
the limits of Physical feasibility, being fast is no longer enough. Some critics
of High Frequency trading argue that a speed limit should be imposed on
market participants. We have news for them: That has already been taken care
of.
Acknowledgements
Several models presented here were developed during my term as Visiting
Scholar at Cornell University. My collaboration with Prof. Dr. David Easley,
Chairman of the Economics Department, and Prof. Dr. Maureen O’Hara,
former President of the American Finance Association (AFA), resulted in the
development of the VPIN Flow Toxicity Metric, and three patent applications.
13
Advances in High Frequency Strategies
INTRODUCTION
High frequency strategies are of a very diverse nature. We will follow the
general description proposed by Aldridge (2010), defining high frequency
strategies as those characterized by a brief investment horizon, which may
range from a split of a second to several hours. A main advantage comes from
placing numerous independent bets every day on the same instrument or
portfolio, because as the “Fundamental Law of Active Management”
postulates, a tiny predictive power on a sufficiently large number of
independent bets yields a high Information Ratio (Grinold (1989)). The goal
is to exploit the inefficiencies derived from the market’s microstructure, such
as rigidities, agents’ idiosyncrasy, asymmetric information, etc. As a
consequence of this higher frequency, the identification of opportunities, risk
control, execution and other investment management activities must be
automated. Not all algorithmic trading occurs in high frequency, but all high
frequency requires algorithmic trading. This in turn has made it possible to
interact directly with the exchange’s auction mechanism (or “double auction
order book”).
High frequency traders often operate with proprietary capital, meaning that
investors are also investment managers. Their actions are not derived from
client orders but for their own benefit. Their servers reside in the proximity of
the exchange’s matching engine (“co-location”), with the purpose of
17
Introduction
minimizing the time that passes between the shipping of an order and the
arrival of confirmation of reception by the exchange (“latency”). All of the
above demands a considerable investment in terms of infrastructure and of
course the development of trade secrets in the form of algorithms and
quantitative models.
Fields of study
The introduction to each chapter presents a detailed analysis of the state of the
art regarding the subject discussed. Nevertheless, there exists a set of general
and shared themes referred to in multiple occasions across this study that we
find convenient to introduce now for the sake of clarity.
a) Statistics
Modern markets require real time pricing of products and risk management.
Iati (2009) has estimated that over 70% of the volume of U.S. shares is
transacted by high frequency participants. For the year 2010, TABB has
estimated that number to be 60% in the U.S., and around 40% in Europa.
18
Advances in High Frequency Strategies
61%
56%
52%
38%
35%
29%
26%
21% 21%
9%
5%
1%
Ané and Geman (2000) re-discovered an idea of Clark (1973) that allows for
a partial recovery of the properties assumed by most traditional statistical
models on high frequency datasets. Instead of sampling by regular time
19
Introduction
Chapter I will address the same problem as Ané and Geman (2000). We
present a simple sampling procedure that delivers high frequency time series
nearly normal, plus it allows for a reduction of serial correlation and
heteroskedasticity.
b) Economic Analysis
O’Hara (1995) describes the purpose of market microstructure theory and
explains its motivation:
As for this study, we begin with the basic model of sequential trading devised
by Glosten and Milgrom (1985). Easley, Kiefer, O’Hara and Paperman
(1996) much improved that model by recognizing the existence of
participants with asymmetric information under event uncertainty. The
outcome was the celebrated PIN (“Probability of Informed Trading”) model,
which allows market makers to monitor information asymmetry and thus
avoid adverse selection. Although a few PIN estimation procedures exist for
low frequency data (Easley, Engle, O’Hara and Wu (2008), Easley, Kiefer,
O’Hara and Paperman (1996)), there was no possibility of measuring PIN in
the high frequency domain because of the intractability of those datasets.
c) Financial Economics
The statistical and microstructure models developed in this book address a
number of issues treated in the Financial Economics literature. In particular,
our contributions aim to further the understanding of the following research
topics:
20
Advances in High Frequency Strategies
Chapter VI reviews and improves some of the most used methods for hedging
portfolios, plus it introduces two new methods with superior characteristics.
These methods are applicable in a High Frequency space, but also in Low
Frequency. Chapter VII reviews the scientific literature on Sharpe ratio,
offers its projection on the probabilistic space (Probabilistic Sharpe ratio),
and develops a new methodology which determines the minimum track
record length required to evidence investment skill at a preset confidence
level. High Frequency returns are non-Gaussian, so a new Sharpe Ratio
Efficient Frontier is needed to address this feature. Chapter VIII introduces
the EF3M algorithm, with important applications in the modeling of financial
data. This new estimation procedure is particularly useful in Financial
21
Introduction
Objectives
Cahan et al. (2010) argue that a large portion of the financial literature has
devoted itself to measuring low frequency risks, to some extent ignoring the
specific risks associated with high frequency trading. In particular, those
authors call for research that may uncover the relationships between risks in
both domains.
The sixth objective is to study the optimality of some of the most popular
portfolio hedging methods. Solving the problem of portfolio hedging has
critical applications in high frequency, not only for risk management but also
in trading strategies. This study presents two new methodologies that
overcome some of the caveats found in previous methods: DFO and MMSC.
22
Advances in High Frequency Strategies
Methodology
We will employ an assortment of techniques drawn from mathematical and
statistical analysis to answer questions derived from this confluence of areas
(Finance, Statistics, Economic Analysis). In particular:
23
Introduction
24
Advances in High Frequency Strategies
Seminars
A number of the models included in this work have been presented in
international seminars:
1
Art. 4.3 of UCM’s Directive of 12/02/2010, developing the Royal Decree 1393/2007
of October 29th (B.O.E. 30/10/2007).
25
Chapter I: Measuring Flow Toxicity
The dynamics of the order books are interrelated across multiple products.
Figure 32 illustrates how, in order to decide at what level to place a client’s
order on Eurodollar short futures, Quantitative Brokers’ algorithms analyze 6
different relationships in real time, searching for hidden liquidity (liquidity
that, although is not displayed in that particular book, is implied by the
liquidity present in the related books). Consequently, in order to operate on
72
Advances in High Frequency Strategies
The system is composed of two equations. The first one forecasts the absolute
return over the next volume bucket as a function of the last absolute return
and reading of log VPIN. The second equation forecasts the log VPIN as a
function of the same variables as in the first equation. Evidently, the second
equation is redundant for a unit forecasting horizon k, but as k>1 the second
equation allows us to incorporate in the first one the feedback mechanism.
p1 t 1 1,1 p1 t 1, 2 p 2 t
p 2 t 1 2 ,1 p1 t 2 , 2 p 2 t
Pt
with initial conditions p1 0 , p 2 0 , where p 1 t 1 and
Pt 1
p 2 t Ln VPIN t
.39
p 1 t 1
.
2 ,1 2 , 2
The key role played by the second equation can be illustrated with an
example. Suppose that market makers are being impacted by excessive flow
toxicity. As they widen their trading ranges, volatility increases. Those
market makers who were ‘slow’ in adjusting will suffer losses and vanish,
which will force the ‘fast’ market makers to re-adjust. Some of those ‘fast’
market makers may miss this re-adjustment, being driven away in a second
wave of losses, and so on. At some point, the chain of events may be broken
(e.g., by new market makers making their appearance, attracted by wide
spreads), or lead to an ‘explosive’ state. The feedback dynamics do not affect
volatility over the next bucket, but it adds information over the long run,
explaining the mechanism by which high frequency risks spill over the low
frequency domain. A univariate or single-equation specification fails to
provide an explanation for such spillover.
39
Centering these variables introduces an intercept, if so desired.
95
Chapter II: Toxicity as a source of Volatility
1,1 1, 2
0 1,1 2 , 2 1, 2 2 ,1 0 , a second
2 ,1 2,2
Tr Tr 4
2
determinant. Thus,
.
0 2
1,1 1 1, 2
w1,1
0
1, 2
1
row operations will yield
1 ,1 1 , and so we reduce the system to:
0 1
1, 2
w 1 ,1 w 2 ,1 0
1,1 1
w 2 ,1 1
W .
1 ,1 1 1 ,1 2
1 1
Let’s write p 0 W c , where c is, like before, the column vector that solves
W for the initial conditions p 0 . Assuming that β has all independent
96
Advances in High Frequency Strategies
1
eigenvectors, we know that β is diagonalizable and W W . Then,
p 1 p 0 W W
1
W c W c . Multiplying k times by β will yield
p k p 0 W c ,
k k
or what is the same,
p1 k
w1,1
w1, 2
p k c11 c22
k k
.
p 2 k w 2 ,1 w2,2
w1,1
w1, 2
1
w 1 ,1
w 1, 2
c1 1 c22
k k
P t k 1 , where:
Ln VPIN w 2 ,1 w 2,2
tk
1, 2 1, 2
w 1 ,1 w 1, 2
1 1,1 2 .
w 2 ,1 w 2 , 2 1 ,1
1 1
Tr Tr 4 Tr Tr 4
2 2
1 ; 2 ,
2 2
where Tr is the trace of and its determinant.
P1
1 Ln VPIN 0
w 1,1
P0
c 1 Ln VPIN 0
c2 ; c2 .
w 1, 2 w 1,1
97
Chapter II: Toxicity as a source of Volatility
dp 1 t
1 ,1 p 1 t 1 , 2 p 2 t
dt , with initial conditions p1 0 , p 2 0 where
dp 2 t
2 ,1 p 1 t 2 , 2 p 2 t
dt
Pt
p 1 t 1 and p 2 t Ln VPIN t
.
Pt 1
dp 1 t
dp t dp t
Its matrix representation is p t , where dt ,
dt dt dp t
2
dt
1,1 1, 2
p1 t
, p t
.
2 ,1 2 , 2 p 2 t
40
A non-necessary but sufficient condition is having all different eigenvalues.
98
Advances in High Frequency Strategies
Tr Tr 4
2
0
1, 2 1, 2
Tr Tr 4
2
and 2 , as well as W
1 ,1 2 .
2 1 ,1 1
1 1
p t c1e
1t 2t
c2e , all of which has been previously derived
w 2 ,1 w2,2
c1
1, 2 1, 2
p 0
1
1
c
.
1 1 ,1 2
1 ,1 c 2 p 2 0
1 1
p1 0 p 2 0 w1 ,1
Finally, c1 p 2 0 c 2 , c 2 .
w 1 , 2 w 1 ,1
Note how similar the solution to the system in differences is to the solution of
the system of differential equations. However, this similarity is only in
structure, because β (and therefore also Λ, W and c) will have different values.
dp t
41
For a simple proof, see that p t can be rewritten in terms of its pure
dt
dp t
w1,1
w1,1
and p t e
1t 1t
solutions. For example, 1 e , which are
dt w 2 ,1 w 2 ,1
w
w
one and the same since we know that 1 1,1 1,1 from linear algebra.
w 2 ,1 w 2 ,1
99
Chapter II: Toxicity as a source of Volatility
p t c1e
1t 2t
c2e , we ought to distinguish among three
w 2 ,1 w 2 , 2
alternative outcomes in continuous time:
Stable state: Both eigenvalues are negative. If imaginary eigenvalues
exist, their real part must be negative.
Steady state: If at least one eigenvalue is null and the other is
negative (or their real part, if imaginary).
Explosive state: If any eigenvalue is positive (or their real part, being
imaginary).
Let’s denote τ the bucket at which a fit has occurred. As discussed, we can
compute the k=1,…,50 forecasting errors that follow our fit at bucket τ as:
() = − 1 −
− 1
100
Advances in High Frequency Strategies
∆
= 2
(39)
() 1
()
()
= − ()
ω 8
ω
ω
(40)
()
= 2( )
ω
(41)
'
%&
ω
∆,() 1 − ∆,()
= *
ω +−2
.
1 1 − ∆,()
+ ∆,() /
(+ − 2) +−2
.
∆,() 1 − ∆,()
∆,() (43)
+ * ∆,()
ω +−2
ω
0
3 1 − ∆,()
∆,()
+* 2∆,() / ∆,()
(+ − 2) +−2
ω
.
2
∆,() 1 − ∆,()
+ ∆,() /
+−2
ω +−2
187
Advances in High Frequency Strategies
BIBLIOGRAPHY
267
Bibliography
268
Advances in High Frequency Strategies
269
Bibliography
270
Advances in High Frequency Strategies
271
Bibliography
272
Advances in High Frequency Strategies
Lee, C.M.C. and M.J. Ready (1991): “Inferring trade direction from
intraday data”, The Journal of Finance, 46, 733-746.
Littermann, R. and J. Sheinkman (1991): “Commons Factors
Affecting Bond Returns”, Journal of Fixed Income, June, pp 54-61.
Lo, A. (2002): “The Statistics of Sharpe Ratios”, Financial Analysts
Journal, 36-52.
López de Prado, M., D. Easley and M. O’Hara (2011a): “The
Exchange of Flow Toxicity”, Journal of Trading (forthcoming,
Spring 2012).
López de Prado, M. (2011b): “Exact Fit for a Mixture of 2
Gaussians: The EF3M algorithm”, Cornell University, Johnson
School Research Paper Series (39).
López de Prado, M. (2011c): “Advances in Cointegration and Subset
Correlation Hedging Methods”, Cornell University, Johnson School
Research Paper Series (40).
López de Prado, M., D. Easley and M. O’Hara (2010a): “Flow
Toxicity and Volatility in a High Frequency World”, Review of
Financial Studies (forthcoming).
López de Prado, M., D. Easley and M. O’Hara (2010b): “The
Microstructure of the Flash Crash: Flow toxicity, liquidity crashes
and the Probability of Informed Trading”, Journal of Portfolio
Management (forthcoming, Winter 2011).
López de Prado, M. (2008): “The Sharpe ratio Efficient Frontier”.
RCC at Harvard University, http://ssrn.com/abstract=1821643
López de Prado, M. and C. Rodrigo Illera (2004): “Invertir en hedge
funds: Análisis de su estructura, estrategias y eficiencia”, Ed. Díaz
de Santos, Madrid.
López de Prado, M. and A. Peijan (2004): “Measuring Loss Potential
of Hedge Fund Strategies”, Journal of Alternative Investments,
Vol.7 (1), pp.7-31.
López de Prado, M. (2003): “Die Unterschätzte Risiken von Hedge
Funds", Stocks, n° 29/30, pp.18-30, July.
Lord, R. and A. Pelsser (2007): “Level-slope-curvature: Fact or
artifact?”, Applied Mathematical Finance, Vol. 14, No. 2, May, pp.
105-130.
Low, A. and J. Muthuswamy (1996): “Information flows in high
frequency Exchange Rates”, en C. Dunis (ed.), Forecasting Financial
Markets, Wiley.
Luo, Y., R. Cahan, J. Jussa, Z. Chen and M. Álvarez (2011):
“Minimum Variance: Exposing the ‘magic’”, Global Markets
Quantitative Research, Deutsche Bank.
Maddala, G.S. and I.M. Kim (2004): “Unit roots, cointegration and
structural change”, Cambridge.
Madhavan, A. (2002): “VWAP Strategies”, ITG, Working paper
(Spring).
273
Bibliography
274
Advances in High Frequency Strategies
275
Bibliography
276
Advances in High Frequency Strategies
This dissertation is the author’s second doctoral thesis. The first one dealt
with Hedge Funds’ risk management and portfolio construction, and was
published in 2003.112
Tudor Investment Corporation has been ranked among the top-3 most
profitable hedge funds in history. 113 Tudor’s funds manage in excess of 12
billion dollars, of which about 1/3 is allocated to Systems Trading. Tudor’s
High Frequency group is comprised by leading researchers and developers
coming from the fields of Physics, Mathematics, Engineering, Computer
Science and Economics.
112
Official doctoral record: https://educacion.gob.es/teseo/mostrarRef.do?ref=295773
113
“Hedge fund report card”, Absolute Return Magazine, September 2009.
277
Advances in High Frequency Strategies
ABOUT COMPLUTENSE
By the year 1509 it already had five major Colleges: Medicine, Philology,
Arts and Philosophy, Theology and Canon Law. One of its alumni, Cardinal
Cisneros, attracted many of the world’s foremost linguists and biblical
scholars. The Complutensian Polyglot Bible (1514), one of the greatest
academic works of the Renaissance, was the result of 15 years of
interdisciplinary research. For instance, the Greek typefaces devised for this 6
volume tractatus constituted the template for the Greek fonts used nowadays
(Otter Greek, GFS Complutensian Greek). This publication predates
Erasmus’ Textus Receptus (1516), which later became the basis for Oxford
University’s King James version (1611).
279
About Complutense
280