01 June 2015 Quantitative

Currency Strategy

How to Create a Surprise Index

And, more importantly, what to avoid doing

What we do and why we do it this way

When creating a surprise index, there are two main questions which need to be
considered. First, how do you quantify how surprising an individual data release is?
Second, how can you aggregate the individual surprises into a surprise index?

The methodology we use for our surprise indices is deceptively simple (see box below).
Many of the choices we describe in this piece might appear to be dry and esoteric details;
however, they can have a profound influence on the results.

What is a surprise index?

There are a variety of economic indicators published for major economies. In terms of the absolute
performance of the economy it is the levels of the indicators themselves which are most relevant.
However, from the perspective of markets it is really the degree to which the released value differs from
expectations which is most important. It is this news, or ‘surprise’, component that is being analysed in a
surprise index.

Constructing a surprise index involves two main problems. First, how do you correctly measure the
surprise component of an individual data release? Second, how do you aggregate the individual data
surprises into an index? However, before addressing either of these questions we have a more
fundamental consideration: what is the core purpose of a surprise index?

What are we actually trying to measure?

Before we can sensibly decide on how to create a surprise index, we first need to be clear about what,
precisely, we are trying to measure with the index. We believe that the core purpose of an economic
surprise index is to measure the discrepancy between market sentiment and reality. This core
purpose has a major influence on the way in which we have chosen to construct our surprise indices.

It is worth stressing that, for our purpose, this discrepancy between sentiment and reality is the main useful
information contained within data releases. It is common for market participants to have a rather ad-hoc
approach to keeping track of economic data releases. As a result, their perception of the recent trend of data
surprises is likely to be dominated by the major releases such as payrolls and GDP and, thus, subtle biases in
expectations can persist for some time. A properly-constructed surprise index is able to illuminate the true
signal by aggregating across many different data releases. As we will explain in detail later in this piece, this is
why we specifically do not weight different data releases by market-moving performance.

How do we calculate surprise?

Quantifying the surprise of a data release raises a question: how do you know what was expected?
Many institutions produce forecasts of myriad economic data releases. Whilst each of these forecasts may
individually be quite biased and perhaps of dubious value on their own, there is a ‘wisdom of crowds’
argument which suggests that a central measure of these forecasts will, on average, be a good estimate of
the market expectation. A median value of the disparate estimates rather than a simple average is
generally used for this expectation so as to avoid the result being skewed by any particularly unrealistic
outlying forecasts.
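As a minimal sketch of why the median is preferred, consider a handful of made-up forecasts, one of which is wildly unrealistic. The numbers and the helper name `consensus` are illustrative assumptions, not survey data:

```python
from statistics import mean, median

def consensus(forecasts):
    """Return the survey median, used here as the market expectation."""
    return median(forecasts)

# Hypothetical payroll forecasts in thousands; the last one is an outlier.
forecasts = [190, 200, 205, 210, 215, 900]

print(consensus(forecasts))  # the median is barely affected by the outlier
print(mean(forecasts))       # the simple average is dragged upwards
```

With these inputs the median stays close to the bulk of the forecasts, whereas the simple average is pulled well above every forecast but the outlier.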

Early attempts to measure economic surprise simply counted the number of releases which had been better or
worse than expected. These early definitions have an obvious flaw: clearly a +700k surprise on payrolls one
month is not offset by a -50k surprise one month later. In other words, the degree to which the data has
surprised is important. However, comparing the raw surprises from different releases will not work because the
different economic indicators are measured in very different units. Some are measured in number of people,
some are measured in USD, and others are given as dimensionless quantities such as percentages.
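The payrolls example above can be written out in a few lines. This is only a sketch of the counting approach described in the text; the variable names are our own:

```python
# Payroll surprises in thousands, taken from the example in the text.
surprises_k = [700, -50]

# Early approach: count releases better (+1) or worse (-1) than expected.
count_score = sum(1 if s > 0 else -1 for s in surprises_k if s != 0)

# Taking magnitude into account tells a very different story.
net_surprise = sum(surprises_k)

print(count_score, net_surprise)
```

The counting measure nets to zero even though the two surprises are clearly not of equal size. The net figure, however, is in thousands of jobs and so cannot be added to surprises from indicators measured in other units.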

So that we can meaningfully compare the surprises from different economic indicators we need to
standardize the surprises. There is now wide acceptance of this general methodology; however, there are
differences of opinion as to the specifics of how this should be done.

So, what do we do?

We define the raw surprise to be the difference between the observed value and the Bloomberg survey
median, appropriately signed so that a positive number means ‘better than expected’. We create the
standardized surprises by dividing this raw surprise by the standard deviation of all previously observed
surprises. Of course, what we would really like to use to standardize the raw surprises is the instantaneous
standard deviation; sadly it is not possible to know this. We estimate it by using an ever-growing window;
an alternative choice would be to use a rolling window to calculate it. A rolling window initially seems
quite an appealing procedure since the estimation error must change over time for at least some economic
indicators. However, since sampling errors are huge for small samples we opt for the more parsimonious
choice of using the entire history (see footnote 1). This is in keeping with our views on unnecessary parameter choices in
section ‘Don’t overcomplicate it’.
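The calculation just described can be sketched as follows. Several details are assumptions made for illustration rather than a statement of the production methodology: the function name, the `higher_is_better` flag used to sign the surprise, the use of strictly earlier surprises only, the sample (n-1) standard deviation, and the minimum history of two observations.

```python
from statistics import stdev

def standardized_surprises(actuals, medians, higher_is_better=True, min_history=2):
    """Divide each raw surprise by the std of strictly earlier raw surprises.

    The surprise itself is not de-meaned, and no future data enters the
    estimate: the estimation window simply grows over time.
    """
    sign = 1.0 if higher_is_better else -1.0
    raw = [sign * (a - m) for a, m in zip(actuals, medians)]
    out = []
    for t, s in enumerate(raw):
        history = raw[:t]  # expanding window of past surprises only
        if len(history) < min_history:
            out.append(None)  # not enough history to estimate a scale
            continue
        scale = stdev(history)
        out.append(s / scale if scale > 0 else None)
    return out

# Hypothetical payroll prints versus survey medians (thousands).
actuals = [210, 180, 250, 160]
medians = [200, 200, 220, 200]
print(standardized_surprises(actuals, medians))
```

For an indicator where a lower number is good news, such as the unemployment rate, the same function would be called with `higher_is_better=False`.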

Note that we do not subtract the sample mean when calculating the standardized surprise; we only divide
by the standard deviation. This might appear to be an esoteric detail; however, this is actually a crucial
and deliberate choice. If one were to subtract the sample mean when calculating the standardized
surprises, one could inadvertently turn a negative surprise (‘bad news’) into a positive surprise
(‘good news’). See the discussion of the G7 Industrial Production Surprise Index in the final section.

Which data should we include?

The answer to this question follows naturally from the core purpose of the surprise indices: we should
include as many economic data releases as possible. The limiting factor for inclusion in the index is that
we need the data release to have a long-enough history of forecasts where the median forecast provides a
good estimate of the market expectation.

An associated question is whether it makes sense to produce a surprise index for a specific data release
(or group of similar data releases). In general, the broadest possible measure of economic activity surprise
is preferable; however, there can be circumstances where such sub-indices can make sense. (See the final
section for an example.)

1 N.B. We do not use any future data in the estimation of the standard deviation; we use an estimation window which grows over time.

How should I weight the different data series?

Here again the answer to the question stems from the core purpose of the indices. Since we are trying to
measure the discrepancy between market sentiment and reality we consider the comparison between any
data release and the market expectation immediately prior to the announcement to be equally important.
Therefore, we equally weight the surprises when constructing our index.

At first glance this may appear to be quite a controversial methodology. As market participants it feels
‘obvious’ that a surprise in non-farm payrolls or GDP is ‘more important’ than a surprise in, say, personal
spending. This is indeed the case when one is considering the immediate market-moving impact of the
surprises from these economic indicators. However, all of these releases have the same ability to shine a
light on any discrepancy between expectations and reality (see footnote 2). This is why we do not assign different
weights to different releases. We discuss this in detail in the following section ‘Don’t weight up’.

How should I aggregate the data into an index?

In contrast to the previous questions whose answers followed from the specific task at hand, the answer to
this question is quite general: the correct aggregation method is as simple as possible. Financial research
seems to be infested with the temptation to add specious complications to charts. In contrast, we resist the
siren songs of filters, moving averages, z-scores, and fractals; instead we simply create an index which
aggregates the total of the standardized surprises on the day and cumulates these values over time.
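A minimal sketch of this aggregation is shown below. The input layout (pairs of ISO date string and standardized surprise), the function name and the sample values are assumptions for illustration:

```python
from collections import defaultdict
from itertools import accumulate

def surprise_index(records):
    """Build a cumulative index from (date, standardized_surprise) pairs.

    Surprises are summed per day with equal weight, then cumulated.
    Returns a list of (date, index_level) tuples sorted by date.
    """
    daily = defaultdict(float)
    for date, z in records:
        daily[date] += z  # every release counts equally
    dates = sorted(daily)  # ISO date strings sort chronologically
    levels = accumulate(daily[d] for d in dates)
    return list(zip(dates, levels))

# Hypothetical standardized surprises; two releases fall on the same day.
records = [
    ("2015-05-01", 0.5),
    ("2015-05-01", -1.0),
    ("2015-05-08", 0.25),
    ("2015-05-13", -0.75),
]
print(surprise_index(records))
```

There is no filter, window or smoothing parameter anywhere in the calculation, which is the point of the design.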

This methodology means that the index value at a specific point in time does not mean anything in and of
itself. Rather, it is the direction the index is moving which is important. Specifically, if the surprise index
is moving upwards this indicates that, on average, economic data has been coming in better than
expected; equivalently, the expectations of the market have been too negative.

Structure of the remainder of the piece

In the next section, ‘Don’t weight up’, we discuss the reasons why creating a weighted surprise index is a
bad idea. The following section, ‘Don’t overcomplicate it’, is the rallying cry for minimalism in index
construction. The final section, ‘Example’, uses a specific example to clearly illustrate the importance of
some of our construction methodology choices.

2 Indeed, one could argue that surprises in minor data releases can be even more important since large surprises in important releases (such as
payrolls or GDP) will be noticed by the market and may prompt a revision of expectations whereas surprises in minor releases can be overlooked for
some time.

Don’t weight up
Why we don’t weight the releases
As explained in the previous section, we deliberately do not weight the constituent data surprises
according to the importance of the release. The rationale for this has already been explained. However, to
dispel any doubt, in this section we create such a weighted surprise index in order to demonstrate why it
is a bad idea.

How do we define ‘importance’ exactly?

As FX researchers it might seem natural to define the importance of a data series by its impact on
currency markets. However, the relationship between economic data and currencies is poor. Sometimes
better-than-expected US data is good for the USD, whereas sometimes it is bad. As a result, the overall
relationship ends up being quite messy and unreliable. In contrast, the relationship between data surprises
and yields is consistent: better than expected activity data are reliably associated with higher yields.

To determine the weights we look at the relationship between the standardized surprises for an economic
indicator and the percentage change in 2-year US swap rates over the day. We fit a linear regression to
these data points and note the beta of 2-year rates to the standardized surprises. If this beta is negative, we
set the weight of this indicator to be zero. For those releases where the beta is positive, regardless of
whether the beta has a significantly non-zero t-stat, we define the weight for the release to be proportional
to the beta, such that the weights sum to one. These weights are shown in chart 1.
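The weighting rule can be sketched in a few lines. The regression here is a plain least-squares slope with an intercept, which is our assumption about the fit; the function names and the beta values are hypothetical and are not the estimates behind chart 1:

```python
def ols_beta(x, y):
    """Slope of a least-squares line of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def importance_weights(betas):
    """Set negative betas to zero, then scale the rest to sum to one."""
    clipped = {name: max(b, 0.0) for name, b in betas.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("no release has a positive beta")
    return {name: b / total for name, b in clipped.items()}

# x: standardized surprises, y: daily % change in 2-year swap rates (made up).
print(ols_beta([-1, 0, 1, 2], [-2, 0, 2, 4]))

# Hypothetical betas for four releases.
betas = {"Payrolls": 3.0, "GDP": 1.0, "Personal Spending": 1.0, "Trade Balance": -0.5}
print(importance_weights(betas))
```

Note that no significance test is applied: any positive beta earns a weight, however noisy the estimate.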

1. Some releases are far more strongly market-moving than others

[Chart 1: bar chart of index weight by release; both axes run from 0% to 30%. Releases shown (in extraction order): ISM Non-Manufacturing, Trade Balance, Initial Jobless Claims, Consumer Credit, Advanced GDP, Existing Home Sales, Durables Ex Transportation, Consumer Confidence, Philadelphia Fed., ISM Manufacturing, Retail Sales Less Autos, Construction Spending, Durable Goods Orders, Personal Income, Chicago Purchasing Manager, New Home Sales, Housing Starts, Average Weekly Hours, Personal Spending, Industrial Production, Unemployment Rate, Change in Nonfarm Payrolls.]


Source: HSBC, Bloomberg

It is clear that these weights are quite unbalanced. There are several data releases which only get a tiny
weight in this new index, whereas non-farm payrolls has almost 30% weight. The problem with this
weighting procedure becomes clear when we consider chart 2.

2. A weighted index tells you little you didn’t already know

[Chart 2: line chart, 2001 to 2015. Series: Weighted Surprise Index; Just Payrolls and GDP Surprise Index. Both axes run from -12 to 2.]

Source: HSBC, Bloomberg

The red line shows the weighted surprise index and the black line shows a surprise index created just
from the top 2 weighted data releases (payrolls and GDP, weighted in the same proportion as in the red
line). The two lines are incredibly similar – highlighting the needlessness of building an index in this
manner. If you disagree with us and believe that a surprise index should be constructed in this way, you
may as well save yourself the trouble of building a fully weighted index and simply construct a simple
index from a much smaller universe of ‘important’ data releases. Furthermore, we would argue quite
strongly that doing this would miss the important information contained within the surprise data. It is
precisely the lack of attention paid to the less high-profile releases which makes them so useful in
surprise indices: their low importance makes it less likely that people will adjust their expectations in
response to the surprises, making it more likely that the bias you have identified will persist.

In chart 3 we compare recent values of both our regular, unweighted index (black line) and the weighted
surprise index (red line). The dizzying fall in the unweighted surprise index highlights the fact that the
market has been far too optimistic on US activity recently. This story only becomes clear much more
slowly with the weighted index. So calculating a surprise index where the surprises are weighted by
market-importance is not only a waste of time, it is also likely to obscure the true story.

3. The unweighted index responds more quickly

[Chart 3: line chart, Mar-14 to May-15. Series: Weighted Surprise Index (left axis, -11.5 to -8.5); Unweighted Surprise Index (right axis, -40 to 10).]

Source: HSBC, Bloomberg

Don’t overcomplicate it
The beauty of simplicity
Simplicity is a much-underrated property in financial market research. Human nature being what it is,
there is an almost irresistible tendency to think that if a simple measurement is good then a more
complicated measurement must be even better. This seductive line of reasoning, however, is often bogus.

The main danger with introducing unnecessary complication into a surprise index is that you end up
introducing time structure into the index which is not present in the underlying surprises. This raises the
possibility that someone will use the index data to draw conclusions which are not valid. In this section we
cover a couple of the usual suspects in detail. Rather than just explaining the reasons why these approaches are
misguided, we actually implement these ideas to demonstrate their unintended consequences.

Z-Scores of indices are bad

A simply-constructed surprise index contains all the information you need, but it does not produce ‘nice’
bounded values. Since our index aggregates individual standardized surprises, the current value is not
important; rather, the information comes from how the index has changed over some time-period you are
interested in. Specifically, a positive trend indicates a run of ‘good news’ and vice-versa.

4. The z-scored index gives the illusion of mean-reversion

[Chart 4: line chart, 2001 to 2015. Series: US Activity Surprise (left axis, -50 to 50); Z-Scored Version (right axis, -5 to 4).]

Source: HSBC, Bloomberg

It is an understandable impulse to want to transform the raw index so that the information you want to
know can be gleaned from just knowing the current index value. Specifically, people want to see an index
which oscillates around zero such that positive values mean ‘good news’ and negative values mean ‘bad
news’. The obvious choice for this process is to z-score the index (see footnote 3). In chart 4 we compare a z-scored
version of our US activity surprise index (in black) to the untransformed index (in red). On first
comparison this appears to be a useful procedure: whereas our untransformed index moves in an
unconstrained fashion, this z-scored index oscillates around zero and rarely moves beyond +/- 3.

The flaw in this z-scoring procedure is that it has introduced structure into the time series which is not
present in the underlying surprise data. To really highlight the problem, we will focus in on a shorter time
period around the early part of the financial crisis.

5. US economic activity surprises during the financial crisis

[Chart 5: line chart, Jan-08 to Nov-09. Series: US Activity Surprise. Both axes run from -50 to 20.]
Source: HSBC, Bloomberg

The untransformed US Activity Surprise Index tells a fascinating story in chart 5. From September 2008
onwards the index fell dramatically as US data came in worse than expected. Towards the end of 2008,
however, there was clearly a re-rating of expectations by the market as the index began to fall less
vertiginously. The important point here though is that despite the market revising expectations down, they
were still not negative enough: the US Activity Surprise Index continued to fall for a further three months.
Finally, in March 2009 expectations had been revised down so far that US data, despite still being quite terrible
in an absolute sense, managed to significantly outperform the apocalyptic expectations.

This instructive narrative is completely missing from z-scored versions of the index. In chart 6 we compare
z-scored versions of the index using three different window parameters (6-month, 1-year, and
2-year). The time-series structure we introduce through our arbitrary choice of window completely overwhelms
the true signal at one of the most critical moments in financial history. Since these are
z-scored indices, remember that it is the index value rather than the direction of movement which is important.
These three indices disagree significantly as to the point at which the surprise data turns around. The index
calculated using a 6-month window becomes positive in late April 2009, the 1-year window in July, and the
2-year window only significantly breaks through zero in December. These differences are particularly important
given the dramatic nature of financial markets and the economy over that time period.
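The window dependence is easy to reproduce on a toy series. The sketch below z-scores a made-up index (a long fall followed by the start of a recovery) over two trailing windows. The index values, the window lengths and the choice to include the current observation in the window are all assumptions for illustration; this is not the data behind chart 6.

```python
from statistics import mean, stdev

def rolling_zscore(levels, window):
    """Z-score each index level against the trailing `window` observations."""
    out = []
    for t in range(len(levels)):
        if t + 1 < window:
            out.append(None)  # window not yet full
            continue
        sample = levels[t + 1 - window : t + 1]
        sd = stdev(sample)
        out.append((levels[t] - mean(sample)) / sd if sd > 0 else None)
    return out

# Hypothetical index: a long fall followed by the start of a recovery.
levels = [0, -5, -10, -15, -20, -25, -30, -28, -26, -24]

short = rolling_zscore(levels, 3)
long_ = rolling_zscore(levels, 8)
print(short[-1], long_[-1])
```

On the final date the short-window z-score is positive while the long-window z-score is still negative, even though both are computed from exactly the same index.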

3 To create a z-score, the current index value is standardized by subtracting the sample mean and dividing the result by the sample standard deviation. Note that
this requires an arbitrary choice of window over which to calculate the means and variances. Furthermore, since the underlying surprise index values are most
definitely not mean-stationary, one has to be dubious as to whether the ‘sample mean’ actually means anything useful in this case.

6. Different (z-score) windows show different pictures of the economy

[Chart 6: line chart, Jan-08 to Nov-09. Series: Z-score (6-month window); Z-score (1-year window); Z-score (2-year window). Both axes run from -4 to 4.]

Source: HSBC, Bloomberg

To further illustrate the degree to which z-scored indices miss the mark, we compare our US Activity
Surprise Index to the S&P 500 in chart 7. The final overly-aggressive mark down in expectations which
our US Activity Surprise Index captures is, not surprisingly, coincident with the wholesale capitulation
which marked the nadir of the equity market. This is the point at which data started beating expectations –
critical information which is obscured in the z-scored indices.

7. The HSBC US Activity Surprise Index captures the change in sentiment

[Chart 7: line chart, Jan-08 to Nov-09. Series: SPX Index (left axis, 600 to 1500); HSBC US Activity Surprise Index (right axis, -50 to 20).]

Source: HSBC, Bloomberg

The exact choice of window parameter used for the z-score will unavoidably introduce structure into your
index. This structure is an artefact of the construction methodology and, unless the underlying surprise
data really does have the same natural time-frame as your choice of window parameter, is likely to end up
obscuring the signal you are trying to measure. For this reason we publish the unadulterated,
untransformed index; anyone who believes they know the true natural timeframe over which data
surprises mean-revert is able to take our data and transform it as they see fit.

Moving windows are bad

We have just seen how the arbitrary choice of window length used to z-score the index introduced phantom
structure which overwhelmed the signal we were trying to measure. It should come as no surprise that defining
the surprise index to be the total surprise over an arbitrary choice of window suffers from the same problem. In
chart 8 we compare our unadulterated US Activity Surprise Index to one calculated as the total standardized
surprise over a 3-month (65-day) window (see footnote 4). As with the z-scored indices discussed earlier, the signal here is difficult to
separate out from the structure introduced by our window choice.

8. Window choice introduces structure which can dominate the results

[Chart 8: line chart, Jan-08 to Jan-10. Series: Normal Surprise Index; Surprise over 65-day window. Both axes run from -50 to 50.]

Source: HSBC, Bloomberg

The fact that the choice of window length is arbitrary is a big problem. If we calculated the index in
this way and you disagreed with our choice of window then there would be no way for you to back this
out. To take a market-based analogy, if you were interested in analysing the performance of EUR-USD,
the dataset you would like to begin with would be the raw price series.

You might for some reason believe that the natural timeframe of EUR-USD was, say, 1-week. Someone
else might instead believe that the ‘correct’ choice was 3-months. A third person might instead choose to
analyse the data in a discretionary manner. From the raw price series, all these options are easy. In
contrast, if your market data provider had (perversely) decided to provide you only with the rolling 1-year
return in EUR-USD, performing your chosen analysis becomes very challenging.
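The same asymmetry applies to surprise indices. Because the untransformed index is a running total, the total surprise over any window is simply the difference between two index levels, so a reader can apply whichever window they like after the fact. A minimal sketch follows, assuming one index observation per business day; the index values and function name are made up:

```python
def windowed_surprise(levels, window):
    """Total surprise over the trailing `window` observations.

    Computed as a difference of two cumulative index levels, so any
    window length can be applied to the raw index after the fact.
    """
    return [
        levels[t] - levels[t - window] if t >= window else None
        for t in range(len(levels))
    ]

levels = [0, -5, -10, -15, -20, -25, -30, -28, -26, -24]  # hypothetical index

w3 = windowed_surprise(levels, 3)
w6 = windowed_surprise(levels, 6)
print(w3[-1], w6[-1])
```

Going the other way is not generally possible: given only one of the windowed series, neither the raw index nor a differently windowed series can be recovered without extra information. In this toy example the two windows also disagree on the sign of the latest reading.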

In the case of a price series, the situation just described is patently absurd. However, this is exactly the
situation you are being put in when a research institution provides you with
an index constructed with one of the myriad complications we are railing against in this section.

We do not believe that we should force a parameter choice on our readers if it can be avoided. Indeed, there
may well be no natural time-frame over which expectations are revised. It is quite likely that it is events that
cause the re-rating of expectations, whether those events be political, economic, or market-driven. If this is
true, the point at which the discrepancy between market expectations and reality is corrected is best analysed
in a discretionary manner and the surprise indices do not have a natural timeframe.
4 Clearly, defining the index as the average surprise over a moving window would suffer from the same problem.

EWMA are also bad

Rather than using a simple average calculated over a moving window, you might prefer to use Exponentially
Weighted Moving Averages (EWMA). EWMA have the nice property that they weight more recent data more
highly; this can be useful in some situations. Unfortunately, using EWMAs does not avoid the problems with
moving averages: you still need to specify an arbitrary choice of parameter, and the process introduces
structure into your time series which may well obscure the very information you are trying to identify. In effect,
EWMAs behave very much like a simple moving average with a short window.
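To make the parameter dependence concrete, the sketch below applies a textbook EWMA recursion with two different smoothing factors to the same made-up run of daily surprises. The function name, the seeding with the first observation and the values of `alpha` are illustrative assumptions:

```python
def ewma(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha.

    The first observation seeds the average; a larger alpha puts more
    weight on recent data.
    """
    out, level = [], None
    for v in values:
        level = v if level is None else alpha * v + (1 - alpha) * level
        out.append(level)
    return out

# Hypothetical daily total surprises: four good days, then one bad day.
daily = [1.0, 1.0, 1.0, 1.0, -1.0]

fast = ewma(daily, 0.5)
slow = ewma(daily, 0.125)
print(fast[-1], slow[-1])
```

After the bad day the fast average has dropped to zero while the slow average has barely moved. Which of the two pictures is ‘right’ depends entirely on the arbitrary choice of `alpha`.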

Minimalism and the art of index construction

The core message from this section is that even very simple adjustments to the data construction will
introduce structure to your index which can overwhelm the very signal you are trying to measure.
Therefore, it should come as no surprise that we most definitely avoid further, more complicated
data transformations.

The universe of possible, and superficially plausible, data transformations is limited only by your
imagination. In this section we showed how just one simple window parameter was able to introduce
phantom structure strong enough to dominate the true signal. By expanding the parameter space you
consider you will dramatically increase the probability of finding a data transformation which spuriously
appears to improve the historical ‘performance’ of the index. The chances of this improvement continuing
out of sample are obviously quite slim.

Industrial Production – long-term bias in market expectations
9. The triumph of hope over experience
[Chart 9: line chart, 2001 to 2015. Series: G7 Industrial Production Surprise. Both axes run from -300 to 0.]

Source: HSBC, Bloomberg

The previous sections have gone into reasonable depth about dry technical details. In this section we wrap
up the piece with an example which highlights many of the points we have made and illustrates the
benefit of our index construction methodology.

The wisdom of crowds?

As well as the more typical surprise indices which are designed to identify bias about a particular
economy, we also produce indices which are designed to identify situations where the market’s view on a
particular piece of economic data is biased. In chart 9 we show the G7 Industrial Production Surprise
Index. Here we aggregate the equally-weighted standardized surprises of Industrial Production from the
following countries: USA, Japan, UK, Germany, France, and Italy.

The chart tells a striking story: from 2001 the index has been steadily decreasing. This tells us that the
market has been consistently over-optimistic about industrial production in major economies for well over
a decade. This is highly surprising and raises serious doubts about the wisdom of crowds, at least in this
case. Normally one would expect a very long period of data underperforming expectations to lead to the
market revising expectations down.

This example is a useful illustration of many of the points we have made in this piece:

A focused sub-index
In the first section of this document we mentioned that it can make sense to create sub-indices which
focus on a specific data release or section of the economy. This is exactly such a situation: there has not
been an equivalent bias in the economy-wide surprise indices for any of the constituent countries so we
have found something with this sub-index which would otherwise have been overlooked.

Equally-weighting when constructing the indices

If the constituent surprises were weighted by some measure of the economic importance of the country
(perhaps GDP-weighted) then one would worry that the trend observed in the index was being dominated
by the results from the ‘most-important’ country. With our equally-weighted indices we can be confident
that this is a widespread bias in the expectation for industrial production in major economies.

Not subtracting the sample mean

Earlier in this piece we stated that, far from being an esoteric detail, not subtracting the sample mean when creating
standardized surprises was a deliberate decision (see footnote 5). This industrial production surprise index is a perfect
example of why we create the indices in this way. If we were to remove the sample mean then this long-
term bias would be missed entirely. Indeed, if we had removed the sample mean when creating our
standardized surprises, there would be many periods when the index would be rising simply because the
degree of bias was smaller than the long-term average. Clearly, interpreting these periods as being ‘better
than expected’ would be wildly inappropriate. The sign of a surprise should not be changed: better than
expected data is always good news and worse than expected data is always bad news.
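The effect on the index can be illustrated with a toy series in which every release is worse than expected but the bias shrinks towards the end. The surprise values are made up, and de-meaning with the mean of strictly earlier surprises is just one possible way of removing the sample mean, chosen here for illustration:

```python
from itertools import accumulate
from statistics import mean

# Hypothetical surprises (already scaled): always negative, bias shrinking late on.
surprises = [-2.0, -2.0, -2.0, -2.0, -1.0, -1.0]

def demean_expanding(xs):
    """Subtract the mean of strictly earlier surprises (0 when there are none)."""
    return [x - (mean(xs[:t]) if t else 0.0) for t, x in enumerate(xs)]

index_plain = list(accumulate(surprises))
index_demeaned = list(accumulate(demean_expanding(surprises)))

print(index_plain)
print(index_demeaned)
```

The plain index falls at every step, as it should. The de-meaned index is flat through the middle of the sample and then rises over the last two releases, despite both of them being worse than expected.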

Keeping it simple
On a related note, this situation also highlights why we keep the index construction methodology as simple as
possible. Whilst unlikely to be as distorting as removing the sample mean from the constituent surprises, the
time-structure introduced by any unnecessary complications is only going to obscure the big story here.

The ability of this estimation bias to survive for so long and in such a stable manner is probably the result
of a combination of two factors: one structural and one behavioural.

The structural rationale for the error is that it is likely that very few people are paying close attention to
the entirety of the data being estimated here. Within an institution it is likely that different people are
responsible for creating the industrial production forecasts for different countries. It is therefore highly
unlikely that the people responsible for the forecasts are paying particularly close attention to the
aggregate long-term trend of surprises in industrial production across all these economies.

The behavioural rationale is that this bias has occurred over a period during which manufacturing in
major economies has steadily lost ground to emerging markets. It is possible that the market has had a
tendency to underestimate the strength and duration of this process.

5 By not removing the sample mean when creating standardized surprises we are, in effect, asserting that the long-term average surprise should be
zero; by default we assume that if forecasters are behaving rationally they would not, in aggregate, be biased. This is a sensible default position and,
ultimately, one which is vindicated by the majority of the evidence: the long-term bias observed in the industrial production surprise index is notable
by its rarity.

Surprise indices can be a useful tool for macro trading. Sadly, many are constructed with well-meaning,
but ultimately harmful, methodology. In this piece we have highlighted and justified the various choices
we have made when constructing our surprise indices.

Many of the choices we have made follow naturally from clear thinking about what we are trying to
measure with our surprise indices. Some of these methodological choices are deceptively simple; as we
have shown in this piece, the implications of some of these subtle details can be quite profound. In
particular, not weighting the releases according to their market-moving influence is of critical importance.

In addition to the choices which are, in effect, forced upon us by the precise nature of the task at hand, we
have also detailed many choices which we have deliberately eschewed so as to avoid introducing artefacts
into the resulting index. We believe that this approach is widely applicable – minimalism is a desirable
feature of index construction. In the aphoristic words usually attributed to Einstein: ‘Everything should be
made as simple as possible, but no simpler’.

