Advanced Econometrics I
Chapters 7 and 8
Francisco Blasques
These lecture notes contain the material covered in the master course
Advanced Econometrics I. Further study material can be found in the
lecture slides and the many references cited throughout the text.
Contents

7 Model Selection and Pseudo-True Parameters
  7.1 Least squares and the weighted $L_2$-norm
  7.2 Maximum likelihood and the Kullback-Leibler divergence
  7.3 Model selection
  7.4 Inference under Model Misspecification
  7.5 Exercises
In this chapter, we will try to answer probably the most essential question of all: what exactly are we estimating?

Until now, we have always defined the parameter of interest $\theta_0$ simply as being the identifiably unique maximizer of the limit criterion function $Q_\infty$. In Chapter 5 we have shown that, under appropriate conditions, extremum estimators $\hat\theta_T$ will converge to the point $\theta_0$, which is the unique maximizer of $Q_\infty$ over the parameter space. In Chapter 6, we have further shown that the distribution of the extremum estimator $\hat\theta_T$ is asymptotically Gaussian and centered at $\theta_0$.

The definition of $\theta_0$ as the unique maximizer of $Q_\infty$ might seem rather weak and uninteresting, but it is not. On the contrary, it is worth repeating: $\theta_0$ is the unique maximizer of $Q_\infty$. In most cases, being the unique maximizer of $Q_\infty$ is a very meaningful statement in itself. For example, in maximum likelihood estimation, this means that $\theta_0$ is really the most likely parameter value given an infinite amount of information (i.e. an infinitely large sample). In least squares estimation, $\theta_0$ is the parameter value that gives the best fit as judged by the sum of squared residuals from an infinite number of observations. In method of moments estimation, $\theta_0$ is the parameter value that best matches the moments of the data as judged by an infinite sample. This is, in itself, something important!
Definition 1 (Pseudo-true parameter) Given an extremum estimator
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} Q_T(x_T, \theta),$$
the pseudo-true parameter $\theta_0$ is the unique element of $\Theta$ that maximizes the limit criterion $Q_\infty$ over $\Theta$,
$$\theta_0 \in \arg\max_{\theta \in \Theta} Q_\infty(\theta).$$
7.1 Least squares and the weighted $L_2$-norm

Consider a time-series $\{x_t\}_{t \in \mathbb{Z}}$ generated by $x_t = \phi_0(x_{t-1}) + \epsilon_t$, and the least squares estimator
$$\hat\theta_T \in \arg\min_{\theta \in \Theta} \frac{1}{T} \sum_{t=2}^{T} \big( x_t - \phi(x_{t-1}, \theta) \big)^2 .$$
As it turns out, the limit of the quantity that $\hat\theta_T$ is minimizing above is simply a transformation¹ of the $L_2$-norm that measures the distance between the true unknown function $\phi_0(x)$ and the modeled regression function $\phi(x, \theta)$.
If the model is well specified, then this distance is precisely minimized at the true parameter $\theta_0$. In particular, $\theta_0$ is the only value for which the distance is exactly zero,
$$\int \big( \phi_0(x) - \phi(x, \theta_0) \big)^2 \, dP_0(x) = \int \big( \phi(x, \theta_0) - \phi(x, \theta_0) \big)^2 \, dP_0(x) = 0.$$
If the model is mis-specified, then $\theta_0$ is, by definition, the unique element of $\Theta$ that minimizes the $L_2$-norm distance between the true function $\phi_0$ and the model function $\phi(\cdot, \theta)$. The uniqueness of $\theta_0$ can easily be obtained in several settings.² Figure 1 shows examples of linear AR(1) approximations to nonlinear data generating processes. The figure on the left shows the best approximation that a mis-specified linear AR(1) model can provide to a logistic AR data generating process with Gaussian innovations. The figure in the middle shows the best $L_2$ approximation that a linear AR(1) model can provide to a Logistic SESTAR data generating process. Finally, the figure on the right shows the best $L_2$ approximation that a linear AR(1) model can provide to an Exponential SESTAR data generating process.

¹The symbol $\propto$ should be read as "proportional to". For example, $f(z) \propto g(z)$ means that $f(z)$ is proportional to $g(z)$; i.e. there exists some constant $c > 0$ such that $f(z) = c\, g(z)$ for every $z$. Two functions that are proportional have the same arg max.

²For example, if $\phi_0$ is continuous and $\phi(\cdot, \theta)$ is a polynomial function, then there exists a unique $\theta_0$ that minimizes this distance. Many other results of this type exist.
Figure 1: Left: best linear approximation to logistic function in L2 norm. Center: best linear
approximation to logistic SESTAR function in L2 norm. Right: best linear approximation to
exponential SESTAR function in L2 norm.
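The best $L_2$ approximations in Figure 1 can also be computed numerically. The sketch below is in Python (the notes use MATLAB elsewhere), and the DGP $x_t = \tanh(2 x_{t-1}) + \epsilon_t$ is a hypothetical stand-in for the nonlinear processes above: it fits the misspecified linear AR(1) model by least squares and verifies that the estimated slope minimizes the sample $L_2$ criterion.

```python
import math
import random

random.seed(1)

# Hypothetical nonlinear DGP (a stand-in for the logistic AR processes above):
# x_t = tanh(2 * x_{t-1}) + eps_t, with eps_t ~ N(0, 0.25)
T = 20000
x = [0.0]
for _ in range(T):
    x.append(math.tanh(2.0 * x[-1]) + random.gauss(0.0, 0.5))

# Least squares fit of the misspecified linear AR(1) model x_t = beta * x_{t-1} + e_t
beta_hat = (sum(x[t - 1] * x[t] for t in range(1, T + 1))
            / sum(x[t - 1] ** 2 for t in range(1, T + 1)))

def ssr(beta):
    """Sum of squared residuals of the linear AR(1) model."""
    return sum((x[t] - beta * x[t - 1]) ** 2 for t in range(1, T + 1))

# beta_hat approximates the pseudo-true parameter: nearby slopes fit strictly worse
print(beta_hat, ssr(beta_hat) < min(ssr(beta_hat - 0.05), ssr(beta_hat + 0.05)))
```

In large samples, `beta_hat` approaches the pseudo-true slope, i.e. the best linear approximation to the nonlinear regression function in the weighted $L_2$ sense.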
7.2 Maximum likelihood and the Kullback-Leibler divergence

Consider now the maximum likelihood estimator
$$\hat\theta_T \in \arg\max_{\theta \in \Theta} \frac{1}{T} \sum_{t=2}^{T} \log f(x_t, x_{t-1}, \theta),$$
where $\log f(x_t, x_{t-1}, \theta)$ denotes the log conditional density of $x_t$ given $x_{t-1}$. Under appropriate regularity conditions, we know that the limit criterion is then given by
$$L_\infty(\theta) = E \log f(x_t, x_{t-1}, \theta).$$
As a result, $\theta_0$ is, by definition, the unique maximizer of $L_\infty$,
$$\theta_0 \in \arg\max_{\theta \in \Theta} E \log f(x_t, x_{t-1}, \theta).$$
Interestingly enough, this means that $\theta_0$ is the unique minimizer of the following quantity,
$$\theta_0 \in \arg\min_{\theta \in \Theta} \; E \log f_0(x_t, x_{t-1}) - E \log f(x_t, x_{t-1}, \theta),$$
where $f_0(x_t, x_{t-1})$ is the true unknown conditional density of $x_t$ given $x_{t-1}$. This quantity is quite important: it is the Kullback-Leibler (KL) distance between the conditional density $f(x_t, x_{t-1}, \theta)$ implied by the model and the true conditional density $f_0(x_t, x_{t-1})$. This is crucial, because it shows that the point $\theta_0$ that maximizes the limit likelihood function $Q_\infty$ is also the point that provides the best approximation to the true conditional density $f_0(x_t, x_{t-1})$ as judged by the Kullback-Leibler distance. In other words, $\theta_0$ satisfies
$$\theta_0 = \arg\min_{\theta \in \Theta} \; KL\big( f_0(x_t, x_{t-1}) \,,\, f(x_t, x_{t-1}, \theta) \big).$$
Figure 2: Left: best approximation to true density of DGP is unique. Right: best approximation
to true density of DGP is not unique.
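The claim that the ML pseudo-true parameter is the KL-closest member of the model family can be illustrated numerically. In the Python sketch below (the mixture DGP and all numbers are hypothetical, not from the notes), the model is a misspecified $N(\mu, 1)$ family; a grid search over $\mu$ for the maximizer of the sample average log-likelihood recovers (approximately) the true mean, which is the KL-minimizing Gaussian location.

```python
import math
import random

random.seed(2)

# Hypothetical fat-tailed DGP with mean 0.5: a scale mixture of normals
data = [random.gauss(0.5, 1.0) if random.random() < 0.9 else random.gauss(0.5, 5.0)
        for _ in range(10000)]

def avg_loglik(mu):
    """Sample average log-likelihood of the misspecified N(mu, 1) model."""
    c = -0.5 * math.log(2.0 * math.pi)
    return sum(c - 0.5 * (xi - mu) ** 2 for xi in data) / len(data)

# Grid search for the pseudo-true mu: the maximizer of the limit log-likelihood,
# equivalently the KL-closest member of the N(mu, 1) family to the true density
grid = [i / 100.0 for i in range(-100, 201)]
mu_star = max(grid, key=avg_loglik)
print(mu_star)  # close to the true mean 0.5
```

Even though no $N(\mu, 1)$ density equals the mixture, the pseudo-true $\mu$ is well defined and unique here, as in the left panel of Figure 2.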
7.3 Model selection

Until now we have worked with a wide range of linear and nonlinear models with different dynamic properties, capable of describing different features of the data. How do we know which model is best for describing a certain time-series? The answer to this question depends crucially on the purpose of the model. There exists no globally satisfactory answer!
If a given model is designed to forecast, then it should be judged on its ability to
produce accurate forecasts. If a model is designed to describe well certain moments,
then it should be judged on its ability to approximate those moments well. If it is
designed to explain a certain dynamic behavior of a time-series, then the model should
be considered good if it is indeed capable of delivering the desired result. Having said
this, it is important to note that a model that approximates well the true distribution
of the data is also a model that, in general, will (i) produce accurate forecasts; (ii)
describe well the moments of the data; (iii) approximate well the dynamic features of
the data; etc. As such, all of the above objectives are, in one way or another, related
to approximating the true probability measure P0 . Below, we review a fundamental
and very general theory of model selection that builds on the theory of extremum
estimation covered in Chapters 5 and 6. This general method of model selection
attempts to find the model that best approximates the data by means of optimizing
a penalized estimation criterion function.
Looking back, it should be clear that Section 7.2 already suggested a method of
model selection. Namely, if the parameters of two competing models are estimated
by maximum likelihood, then it is reasonable to select the model that achieves the
highest log likelihood value. Indeed, by construction, this model provides the best
approximation to the DGP in KL distance. If the parameters of two competing
models are estimated by the least-squares method, then we should select the model
that achieves the lowest sum-of-squared residuals since this model provides the best
approximation to the DGP in $L_2$ distance. There is, however, one detail that must be taken into account: the likelihood can always be improved by increasing the number of parameters in the model. The Akaike Information Criterion (AIC) therefore adds a penalty to the negative of the log likelihood value $L_T(x_T, \hat\theta_T)$ in order to account for this fact. Note that the best model is the one with the smallest AIC!
The AIC, introduced by H. Akaike in 1973 and 1974, gives rise to a truly general
model selection technique. Unfortunately, the theoretical foundations of the AIC are
poorly understood by the majority of practitioners. Misguided by simulation results
that do not reflect the theoretical context of each model selection technique, many
practitioners unfortunately abandon the AIC in favor of other criteria that are valid
under much more restrictive settings. Unlike a host of other criteria, the AIC can
be used to compare non-nested, non-congruent, misspecified models in very general
settings. In order to achieve great asymptotic generality, the AIC penalty should
however be allowed to grow with sample size. Following Sin and White (1996), the
following modified AIC can be used to consistently select models in large samples,
under very general conditions.
Definition 3 (Modified Information Criterion) Given two models (Model 1 and Model 2), with parameters $\theta_1 \in \Theta_1 \subseteq \mathbb{R}^p$ and $\theta_2 \in \Theta_2 \subseteq \mathbb{R}^q$, $p \geq q$, and with log-likelihoods $L_T^1(x_T, \hat\theta_T^1)$ (Model 1) and $L_T^2(x_T, \hat\theta_T^2)$ (Model 2), the modified AIC is given by
$$\mathrm{MAIC} = L_T^1(x_T, \hat\theta_T^1) - L_T^2(x_T, \hat\theta_T^2) - c\,(p - q)\, T^{\frac{1}{2}} \log(\log(T)),$$
where $c$ is a strictly positive scalar.
Naturally, a positive MAIC constitutes evidence in favor of Model 1, and a negative MAIC constitutes evidence in favor of Model 2. It is worth noting that the penalty of the MAIC rises with sample size at a rate faster than $T^{\frac{1}{2}}$, but slower than $T$. It is also important to highlight that the MAIC is asymptotically consistent; i.e. the MAIC selects the best model (in KL divergence) with probability converging to one as $T \to \infty$. The only additional complication of the MAIC compared to the AIC is that it depends on the unspecified constant $c > 0$. Values of $c \approx 0.1$ may be acceptable as a rule-of-thumb. In particular, for $c \approx 0.1$, we obtain a penalty $c\,(p - q)\,T^{\frac{1}{2}}\log(\log(T)) \approx (p - q)$ for $T \approx 250$. As a result, we obtain the AIC selection rule precisely for sample sizes where the AIC performs reasonably well. In general however, the selection of $c > 0$ should be guided, in any given setting, by further theoretical or simulation based results.
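Definition 3 is straightforward to implement. The Python sketch below (the function name `maic` is our own; the log-likelihood values are those of models C and B from Exercise 4 below, used purely for illustration) computes the MAIC for two models.

```python
import math

def maic(loglik1, loglik2, p, q, T, c=0.1):
    """Modified AIC of Definition 3 (requires p >= q).
    Positive values favour Model 1; the penalty grows at rate sqrt(T)*log(log(T))."""
    return loglik1 - loglik2 - c * (p - q) * math.sqrt(T) * math.log(math.log(T))

# Example: Model 1 has 5 parameters and log-likelihood -1278.7,
# Model 2 has 4 parameters and log-likelihood -1283.1, with T = 250 and c = 0.1
m = maic(-1278.7, -1283.1, p=5, q=4, T=250)
print(round(m, 2))  # positive, so the evidence favours Model 1
```

Note how, at $T = 250$ and $c = 0.1$, the penalty on the extra parameter is of the same order as the AIC penalty, while for larger $T$ it grows without bound, which is what delivers consistent selection.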
7.4 Inference under Model Misspecification

This final section of this chapter is devoted to the estimation of the asymptotic variance of extremum estimators. In particular, we end with a warning: econometric software packages typically report standard errors that are valid only under correct specification. Recall from Chapter 6 that, under appropriate conditions,
$$\sqrt{T}\,\big(\hat\theta_T - \theta_0\big) \xrightarrow{d} N\big(0 \,,\, A^{-1} B A^{-\top}\big) \quad \text{as } T \to \infty,$$
where $A$ denotes the limit second derivative of the criterion function and $B$ the asymptotic variance of its first derivative.
We also noted in Chapter 6 that this implies that the estimator $\hat\theta_T$ has an approximate distribution given by
$$\hat\theta_T \overset{approx}{\sim} N\big(\theta_0 \,,\, A^{-1} B A^{-\top}/T\big).$$
This approximate distribution can be used to conduct inference. However, in order for this result to be useful in practice, we must estimate the unknown $A$ and $B$. Estimates of $A$ and $B$ are generally easy to obtain. Theorem 3 tells us that $B$ is the asymptotic variance of the standardized derivative of the criterion function,
$$\frac{1}{\sqrt{T}} \sum_{t=2}^{T} \nabla q(x_t, x_{t-1}, \theta_0) \xrightarrow{d} N(0, B) \quad \text{as } T \to \infty.$$
The central limit theorem for stationary and ergodic (SE) martingale difference sequences in Chapter 4 tells us that if $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is a martingale difference sequence, then $B$ is simply the variance of $\nabla q(x_t, x_{t-1}, \theta_0)$. Recall that being a martingale difference sequence means essentially that $\{\nabla q(x_t, x_{t-1}, \theta_0)\}$ is white noise, i.e. that it is uncorrelated with mean zero and some finite variance. As such, for any given null hypothesis $H_0: \theta_0 = \bar\theta$, we can estimate
$$B = \mathrm{Var}\big( \nabla q(x_t, x_{t-1}, \bar\theta) \big) = E\, \nabla q(x_t, x_{t-1}, \bar\theta)\, \nabla q(x_t, x_{t-1}, \bar\theta)^\top$$
by its sample counterpart
$$\hat B_T = \frac{1}{T} \sum_{t=2}^{T} \nabla q(x_t, x_{t-1}, \bar\theta)\, \nabla q(x_t, x_{t-1}, \bar\theta)^\top .$$
Luckily, when the model is correctly specified, $\{\nabla q(x_t, x_{t-1}, \bar\theta)\}$ is always uncorrelated under the null hypothesis, and hence, estimation offers no problems. Furthermore, under correct specification, the information-matrix equality $A = B$ holds true, so that the asymptotic variance $A^{-1} B A^{-\top}$ reduces to $B^{-1}$, and hence, we can estimate the asymptotic distribution of the estimator using
$$\hat\theta_T \overset{approx}{\sim} N\big(\bar\theta \,,\, \hat B_T^{-1}/T\big) \quad \text{under } H_0: \theta_0 = \bar\theta.$$
Since $A = E\, \nabla^2 q(x_t, x_{t-1}, \theta_0)$, then for any given null hypothesis $H_0: \theta_0 = \bar\theta$, we can also estimate the asymptotic variance using the alternative estimator
$$\hat A_T^{-1} = \bigg( \frac{1}{T-1} \sum_{t=2}^{T} \nabla^2 q(x_t, x_{t-1}, \bar\theta) \bigg)^{-1} .$$
Unfortunately, if the model is mis-specified, then $\{\nabla q(x_t, x_{t-1}, \bar\theta)\}$ is generally correlated and the equality $A = B$ does not hold. As a result, we must use a robust variance estimator $\hat B_T$ for $B$ that takes into account the autocorrelation in $\{\nabla q(x_t, x_{t-1}, \bar\theta)\}$. Furthermore, we must separately estimate $A$ using $\hat A_T$ above. This yields an estimate of the asymptotic distribution of $\hat\theta_T$ given by
$$N\big(\bar\theta \,,\, \hat A_T^{-1} \hat B_T \hat A_T^{-\top}/T\big).$$
Remark: Software packages typically give estimates of the asymptotic variance that are based on the assumption of correct specification! They do not use robust variance estimators for $B$ and they assume the equality $A = B$. As a serious econometrician, you surely recognize the great limitations of this approach!
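To make the warning concrete, the Python sketch below (with hypothetical simulated data; all names and numbers are our own) compares, for a scalar least squares slope, the classical variance estimate that assumes $A = B$ with the robust sandwich estimate $\hat A_T^{-1} \hat B_T \hat A_T^{-1}/T$. Under heteroskedastic errors the two disagree.

```python
import random

random.seed(3)

# Hypothetical DGP: y_t = theta0 * x_t + u_t with heteroskedastic errors
theta0, T = 2.0, 5000
xs = [random.gauss(0.0, 1.0) for _ in range(T)]
ys = [theta0 * xt + random.gauss(0.0, 1.0 + abs(xt)) for xt in xs]

# Least squares estimate of the scalar slope; criterion q = (y - theta*x)^2 / 2,
# so the score is dq = -(y - theta*x)*x and the second derivative is d2q = x^2
theta_hat = sum(xt * yt for xt, yt in zip(xs, ys)) / sum(xt * xt for xt in xs)
resid = [yt - theta_hat * xt for xt, yt in zip(xs, ys)]

A_hat = sum(xt * xt for xt in xs) / T                        # sample mean of d2q
B_hat = sum((u * xt) ** 2 for xt, u in zip(xs, resid)) / T   # sample variance of dq

var_classical = (sum(u * u for u in resid) / T) / (A_hat * T)  # assumes A = B
var_sandwich = B_hat / (A_hat ** 2 * T)                        # robust A^-1 B A^-1 / T
print(var_classical, var_sandwich)  # the robust variance is larger here
```

Because the error variance rises with $|x_t|$, the score is "louder" exactly where the Hessian term is large, so the classical formula understates the true sampling variance while the sandwich does not.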
7.5 Exercises
2. Let $\{x_t\}_{t \in \mathbb{Z}}$ be a strictly stationary and ergodic time-series satisfying $E|x_t|^4 < \infty$, given by
$$x_t = \phi_0(x_{t-1}) + \epsilon_t \quad t \in \mathbb{Z}.$$
Suppose that you estimate the parameters $\theta_0$ of the following model,
$$x_t = g(x_{t-1}, \theta)\, x_{t-1} + \epsilon_t \quad t \in \mathbb{Z},$$
by least squares,
$$\hat\theta_T \in \arg\min_{\theta \in \Theta} \sum_{t=2}^{T} \big( x_t - g(x_{t-1}, \theta)\, x_{t-1} \big)^2 .$$
Suppose that there exists a unique parameter vector $\theta_0$ that maximizes the limit criterion function. Show that $\theta_0$ minimizes an $L_p$-norm distance between $\phi_0(x_{t-1})$ and $g(x_{t-1}, \theta)\, x_{t-1}$.
3. The following table shows the estimation results for a sequence of nested models estimated by the least squares method. Model A nests model B, model B nests model C, and model C nests model D. Find 2 mistakes in this table.

Model   nr of parameters   R2     Adjusted R2
A       7                  0.94   0.88
B       5                  0.77   0.79
C       3                  0.63   0.54
D       2                  0.65   0.41
4. The following table shows the ML estimation results for four alternative (non-nested) models. Which model would you select?

Model   nr of parameters   Log likelihood   AIC
A       4                  -1285.3          2578.6
B       4                  -1283.1          2574.2
C       5                  -1278.7          2567.6
D       9                  -1279.4          2573.8
5. Answer again the question above with the additional information that the sample size is $T = 100$. What if $T = 250$? $T = 500$? $T = 10000$? As $T \to \infty$?
8.1 Probabilistic analysis

The probabilistic analysis of linear dynamic models is often simple and analytically tractable. Consider for example the linear AR(1) model
$$x_t = \alpha + \beta x_{t-1} + \epsilon_t \quad t \in \mathbb{Z}\,, \qquad \{\epsilon_t\} \sim NID(0, \sigma^2),$$
estimated on quarterly GDP growth data, with estimated innovations $\{\hat\epsilon_t\} \sim NID(0, 0.752)$. Given that the last observed value of the quarterly GDP growth rate was $x_T = -0.37\%$, in the first quarter of 2014, what is the probability that the growth rate becomes positive in the next quarter? Well, conditional on the postulated model and parameter estimates, the probability that the economy leaves the recession in the second quarter of 2014 is given by $\hat P_T(x_{T+1} > 0 \mid x_T) \approx 0.69$. Indeed, conditional on the model and estimated parameters, we have
$$x_{T+1} \mid x_T \sim N(0.435 \,,\, 0.752),$$
hence the probability of observing a positive growth rate at time $T+1$ is actually quite reasonable! The unconditional probability of positive growth is easily obtained as being $\hat P_T(x_t > 0) \approx 0.87$ since $x_t \sim N(1.01, 0.913)$ for every $t$.
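The conditional probability follows directly from the Gaussian distribution stated above. A quick check (in Python; the notes use MATLAB elsewhere), treating 0.752 as the conditional variance:

```python
from statistics import NormalDist

# Conditional distribution implied by the estimated AR(1):
# x_{T+1} | x_T ~ N(0.435, 0.752), where 0.752 is the variance
cond = NormalDist(mu=0.435, sigma=0.752 ** 0.5)

# Probability that growth is positive next quarter
p = 1.0 - cond.cdf(0.0)
print(round(p, 2))  # 0.69
```

The same two lines with the unconditional distribution $N(1.01, 0.913)$ give the unconditional probability of positive growth.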
In nonlinear dynamic models, it may sometimes be difficult to derive these probabilities analytically. Consider for example the NLAR model
$$x_t = f(x_{t-1}, \epsilon_t, \theta) \quad t \in \mathbb{Z}\,, \qquad \{\epsilon_t\} \sim NID(0, \sigma^2). \qquad (1)$$
Despite the fact that the innovations are assumed to be iid Gaussian, the distribution of $x_{T+1}$ given $x_T$ may be difficult to ascertain due to the nonlinear function $f$. Luckily, approximate distributions can be easily obtained through Monte Carlo simulations. In particular, the probability $\hat P_T(x_{T+1} > c \mid x_T)$ can be approximated by drawing $N$ innovations $\{\tilde\epsilon_{T+1}^i\}_{i=1}^{N}$, obtaining $N$ simulated values $\{\tilde x_{T+1}^i\}_{i=1}^{N}$ using (1), all conditional on the same observed $x_T$, and finally calculating
$$\hat P_T(x_{T+1} > c \mid x_T) \approx \frac{1}{N} \sum_{i=1}^{N} I(\tilde x_{T+1}^i > c).$$
Consider, for example, the model $x_t = \tanh(0.9\, x_{t-1} + \epsilon_t)$ with $\{\epsilon_t\} \sim NID(0, 0.01)$. In this specific example, we calculate the probability $\hat P_T(x_{T+1} > 0.4 \mid x_T)$, with the number of simulations set to $N = 10000$, and where the last observed sample value is $x_T = 0.2$.

N = 10000;                              (Set number of simulations)
x_T = 0.2;                              (Set value of x_T)
eps_T1 = 0.1*randn(1,N);                (Generate N values of eps_{T+1})
x_T1 = tanh(0.9*x_T + eps_T1);          (Calculate N values of x_{T+1})
P = (1/N)*sum(x_T1 > 0.4);              (Calculate P(x_{T+1} > 0.4 | x_T))

The same logic extends to multiple steps ahead. For example, the following snippet calculates $\hat P_T(x_{T+3} > 0.4 \mid x_T)$:

h = 3;                                  (Set steps-ahead)
N = 10000;                              (Set number of simulations)
x_T = 0.2;                              (Set value of x_T)
eps = 0.1*randn(h,N);                   (Generate h x N innovations)
x(1,:) = x_T*ones(1,N);
for t=1:h
  x(t+1,:) = tanh(0.9*x(t,:)+eps(t,:));   (Calculate N values of x_{T+t} recursively)
end
P = (1/N)*sum(x(h+1,:) > 0.4);          (Calculate P(x_{T+3} > 0.4 | x_T))
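The same Monte Carlo approximation can be written in a few lines of Python (a sketch mirroring the h-step MATLAB snippet above):

```python
import math
import random

random.seed(4)

N, h = 10000, 3      # number of simulations and steps-ahead
x_T = 0.2            # last observed sample value

count = 0
for _ in range(N):
    x = x_T
    for _ in range(h):
        # iterate the model x_t = tanh(0.9 * x_{t-1} + eps_t), eps_t ~ N(0, 0.01)
        x = math.tanh(0.9 * x + random.gauss(0.0, 0.1))
    count += x > 0.4
p_hat = count / N
print(p_hat)  # Monte Carlo estimate of P(x_{T+3} > 0.4 | x_T)
```

Because each path is simulated from the same observed $x_T$, the estimate is conditional on the model, the estimated parameters, and the last observation, exactly as in the text.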
In certain cases, the nonlinear nature of the dynamic equation does not complicate the probabilistic analysis of the model. For example, in nonlinear dynamic models with additive innovations,
$$x_t = f(x_{t-1}, \theta) + \epsilon_t,$$
it is easy to calculate conditional probabilities of the type $\hat P_T(x_{T+1} > c \mid x_T)$ as long as the distribution of the innovations is known. For example, if the innovations in the model above are iid Gaussian $N(0, \sigma^2)$, then it follows immediately that $x_{T+1} \mid x_T \sim N(f(x_T, \theta), \sigma^2)$. Note however that for multiple steps-ahead probabilistic statements, Monte Carlo simulations are again required even for nonlinear models with additive innovations.
A similar reasoning applies to time-varying parameter models. Consider, for example, the observation-driven local-level model,
$$x_t = \alpha_t + \epsilon_t \,, \qquad \alpha_{t+1} = \omega + \beta(x_t - \alpha_t) + \lambda \alpha_t.$$
Conditional on the model at hand, probabilistic statements about $x_{T+1}$ given $x_T$ can easily be made, since $\alpha_{T+1}$ is given when we condition on the observed sample $x_T$. Suppose again that the innovations are iid Gaussian $N(0, \sigma^2)$. Then $x_{T+1} \mid x_T \sim N(\alpha_{T+1}, \sigma^2)$. Again, Monte Carlo simulations may be required for multiple steps-ahead probabilistic statements, especially when the updating equation for the time-varying parameter is nonlinear.
In finance, the Value-at-Risk (VaR) is a popular risk measure that is often derived from the probabilistic analysis of volatility models. Specifically, for a given portfolio and a pre-specified probability $\alpha$, the daily $\alpha$-VaR is the minimum amount the investor stands to lose with probability $\alpha$ over a period of one day. For example, if a portfolio has a daily 10%-VaR of 1 million euros, then there is a 10% probability that the value of the portfolio will fall by more than 1 million euros in one day.

The VaR is often also stated in terms of percentage loss. For example, if a portfolio has a daily 5%-VaR of 17%, then there is a 5% probability that the value of the portfolio will fall by more than 17% of its value in one day. Mathematically, given a portfolio value $p_t$ at time $t$, and a random return $x_t = (p_t - p_{t-1})/p_{t-1}$ on the portfolio, the 5%-VaR in percentage loss is defined as the value $c$ that satisfies
$$P(x_t \leq -c) = 0.05.$$
Clearly, the VaR expressed in percentage loss can immediately be turned into the VaR in monetary loss by multiplying $c$ by the value of the portfolio at that time.
Please take a moment to notice that the true VaR is not known exactly, since the true distribution of the sequence $\{x_t\}_{t \in \mathbb{Z}}$ is unknown. Indeed, any statements involving the probabilistic distribution of $\{x_t\}_{t \in \mathbb{Z}}$ are statements about the unknown. Probabilities about $\{x_t\}_{t \in \mathbb{Z}}$ can only be estimated, and those estimates typically depend on the model adopted by the researcher and parameter estimates obtained from the data. Typically, the VaR is interpreted as if the model were correctly specified and the parameter estimates corresponded to the true parameter. This is a practice that simplifies the presentation of the estimated VaR for a public that is not specialized in econometrics. Luckily however, we do not have to assume correct specification or correct parameters. Model uncertainty and parameter uncertainty can be acknowledged! We just have to recognize that our VaR estimates are effectively conditional on the model and the estimated parameters. In other words, we just have to recognize that the estimated VaR obtained from setting $\hat P_T(x_t \leq -c) = 0.05$ is an approximation to the true VaR that sets $P_0(x_t \leq -c) = 0.05$. Below we give a useful definition of estimated VaR.

Definition 4 (Estimated VaR) Given a model $P := \{P(\theta), \theta \in \Theta\}$, a sample of data $x_T := (x_1, ..., x_T)$, and a parameter estimate $\hat\theta_T$, the estimated $\alpha$-VaR in percentage loss is the value $\hat c$ that satisfies $\hat P_T(x_t \leq -\hat c) = \alpha$.
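A minimal sketch of the estimated-VaR computation, assuming a hypothetical Gaussian return model $x_t \sim N(\mu, \sigma^2)$ with illustrative values of $\mu$ and $\sigma$ (not from the notes):

```python
from statistics import NormalDist

# Hypothetical estimated return model (illustrative values): x_t ~ N(mu, sigma^2)
mu, sigma = 0.0005, 0.01
alpha = 0.05

# Estimated alpha-VaR in percentage loss: the c that solves P_hat(x_t <= -c) = alpha
c = -NormalDist(mu, sigma).inv_cdf(alpha)
print(round(100.0 * c, 2))  # VaR as a percentage of portfolio value: 1.59
```

For non-Gaussian or time-varying volatility models, the same quantile can instead be obtained by Monte Carlo simulation, exactly as for the conditional probabilities above; either way the result remains conditional on the model and the estimated parameters.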
8.2 Forecasting
Model uncertainty is virtually always ignored when forecasting and producing confidence bounds. This is usually done by imposing axioms of correct specification.
Typically, researchers ignore model uncertainty simply because model uncertainty is
too difficult to integrate in the design of forecast bounds. In certain cases however,
model uncertainty is less problematic. Sometimes it can even be safely disregarded!
This occurs when the statistical model is sufficiently general to contain the data
generating process; see e.g. Grenander (1992) and Chen (2007) for a review of the
sieve estimation of semi-nonparametric models with an infinite-dimensional parameter space of unbounded complexity (so-called infinite entropy). In any case, even in simple parametric models, it is important to note that we do not have to assume correct specification in order to set aside the complicated issue of model uncertainty. Instead, we just have to recognize that our forecasts are conditional on the model at hand!
Parameter uncertainty is also rarely incorporated in producing forecasts and their
respective confidence bounds. This occurs because parameter uncertainty is also
difficult to incorporate in forecasting, at least analytically. There exist simulation-based methods that allow us to incorporate parameter uncertainty. These, however, lie outside the scope of this text.
At the end of the day, innovation uncertainty is typically the only ingredient that is
taken into account when producing point forecasts and deriving confidence bounds.
In essence, the researcher recognizes that future innovations are unknown and goes
about producing forecasts with confidence bounds that are effectively conditional on
the model and the estimated parameters. Luckily, taking innovation uncertainty into
account is often enough to produce reasonable confidence bounds. The reason for this
is that the distribution of out-of-sample forecast errors is typically approximated by
the distribution of in-sample residuals. Hence, poor models and poor parameter estimates that lead to large residuals also lead to large out-of-sample forecast bounds. In
some sense, these bounds already incorporate some model uncertainty and parameter
uncertainty.
With these considerations in mind, we can formulate the following useful definition
of point forecast.
Definition 5 (Point forecast) Given a model $P := \{P(\theta), \theta \in \Theta\}$, a sample of data $x_T := (x_1, ..., x_T)$, and a parameter estimate $\hat\theta_T$, the point forecast for $x_{T+h}$ is the conditional expectation $\hat x_{T+h} = \hat E_T(x_{T+h} \mid x_T)$.

When forecasting continuous random variables, it should be immediately clear that the probability of any point forecast $\hat x_{T+h}$ being correct is exactly equal to zero. In other words, $\hat x_{T+h}$ satisfies $\hat x_{T+h} \neq x_{T+h}$ with probability one. In some sense, point forecasts are meaningless if they are given without confidence bounds.
Consider again the model $x_t = \tanh(0.9\, x_{t-1} + \epsilon_t)$ with $\{\epsilon_t\} \sim NID(0, 0.01)$. The snippet of MATLAB code below produces point forecasts of $x_{T+h}$ for $h = 1, ..., 10$, with 90% confidence bounds. The number of simulations is set to $N = 10000$, and the last observed sample value is $x_T = 0.2$.

h = 10;                                 (Set steps-ahead)
N = 10000;                              (Set number of simulations)
x_T = 0.2;                              (Set value of x_T)
eps = 0.1*randn(h,N);                   (Generate h x N innovations)
x(1,:) = x_T*ones(1,N);
for t=1:h
  x(t+1,:) = tanh(0.9*x(t,:)+eps(t,:));     (Simulate N paths recursively)
end
x_hat(1) = x_T;
upper_bound(1) = x_T;
lower_bound(1) = x_T;
for t=1:h
  x_hat(t+1) = mean(x(t+1,:));              (Calculate point forecasts (conditional mean))
  upper_bound(t+1) = prctile(x(t+1,:),95);  (Calculate bound: 95th percentile)
  lower_bound(t+1) = prctile(x(t+1,:),5);   (Calculate bound: 5th percentile)
end
plot(x_hat,'k')
hold on
plot(upper_bound,'r')
hold on
plot(lower_bound,'r')
Figure 3: Point forecast and 90% confidence bounds produced by the code snippet above.

Monte Carlo methods are equally useful in models with a time-varying parameter. Consider a model driven by fat-tailed innovations $\{\epsilon_t\} \sim NIT(\cdot)$, with a time-varying parameter updated as $\alpha_t = \phi(\alpha_{t-1}, x_{t-1}, \theta)$. Given a sample of data $x_T$ and parameter estimates, we can certainly draw multiple simulated values $\{\tilde\alpha_{T+j}^i\}_{i=1,j=1}^{N,h}$ conditional on $x_T$. With these simulated values we can naturally calculate Monte Carlo approximations of the conditional mean and confidence bounds for $\alpha_{T+h}$ implied by the model under $\hat\theta_T$. Similarly, we also obtain the conditional mean and confidence bounds for $x_{T+h}$.
This reasoning applies naturally to other time-varying parameter models. Consider the fat-tailed nonlinear volatility model
$$x_t = \sigma_t \epsilon_t \,, \qquad \{\epsilon_t\} \sim NIT(\cdot) \,, \qquad \sigma_t^2 = \phi(\sigma_{t-1}^2, x_{t-1}, \theta).$$
8.3 Impulse response functions

Impulse response functions are instruments that allow us to study the dynamic behavior of time-series in response to a random unanticipated shock. In practice, they allow us to analyze different "what if" scenarios. In macroeconometrics, we could ask: how many months does it take for aggregate consumption to recover from a negative 3% shock? In financial econometrics, one may be interested in knowing how volatility reacts to a negative return of -10%. In economics, unanticipated shocks can come from a number of sources: foreign demand shocks, oil price shocks, natural catastrophes, exchange rate shocks, etc. In a policy analysis context, one may be interested in studying the effect of unannounced government expenditure shocks, tax changes, money supply shocks, interest rate shifts, etc. Of course, it is important to keep in mind that the parameters that describe the dynamics of the process may be affected by policy changes. Indeed, the famous Lucas critique applies not only to reduced form statistical models, but also to the so-called structural models.⁴
Remark 1 (Lucas critique) Parameters estimated from historical data reflect, among other things, the policies of the past. As a result, these parameters are not appropriate to describe the dynamic properties of a time-series after a policy change. Different institutional policies may give rise to different dynamics, and hence, different parameters. This must be recognized when performing policy analysis.
As before, we proceed carefully by recognizing that whatever analysis we make is conditional on the adopted model and estimated parameters. Our focus on the conditionality on the model and the estimated parameters may seem unnecessarily repetitive
to you. This could not be further from the truth! Understanding and recognizing
the limitations of the tools at our disposal is crucial for a competent and professional
econometric analysis of the data.
The Impulse Response Function (IRF) is essentially the expected path of $\{x_t\}$ after a shock of a certain size at time $t = s$. Indeed, you may recall from your introductory econometrics courses the following definition of IRF.

Definition 7 (Impulse Response Function) Given a model $P := \{P(\theta), \theta \in \Theta\}$ and a parameter estimate $\hat\theta_T$, the Impulse Response Function (IRF) with origin $\bar x$, generated by a shock (or impulse) $\delta$ at time $t = s$, is a sequence of points $\{\tilde x_t\}$ satisfying:
$$\tilde x_t = \bar x \quad \forall\ t < s,$$
$$\tilde x_t = \bar x + \delta \quad \text{at } t = s,$$
$$\tilde x_t = \hat E_T(x_t \mid \tilde x_s, \tilde x_{s-1}, ...) \quad \forall\ t > s.$$

Regardless of the model being linear or nonlinear, the IRF describes the expected path of $\{x_t\}$ following a shock of magnitude $\delta$ at time $t = s$, starting from a fixed level $\bar x$. The figure below plots an IRF with origin $\bar x$ that coincides with the unconditional mean of the process.
In introductory econometrics courses you have derived the IRFs of linear dynamic models. As you may remember, these IRFs are often easy to derive by hand. Consider, for example, the linear AR(1) model,
$$x_t = \beta_1 x_{t-1} + \epsilon_t.$$
⁴The latter class of models is typically also affected because, although those models are called structural, they are not truly structural.
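For this AR(1), the IRF can be computed by hand: with origin at the unconditional mean $\bar x = 0$, the expected path after an impulse $\delta$ at time $s$ is simply $\tilde x_{s+j} = \beta_1^j\, \delta$. A short check (Python; the values $\beta_1 = 0.8$ and $\delta = 1$ are illustrative):

```python
# IRF of the linear AR(1) x_t = beta1 * x_{t-1} + eps_t, with origin at the
# unconditional mean 0 and an impulse delta at time s (illustrative values)
beta1, delta, horizon = 0.8, 1.0, 5

# After the impulse the innovations have mean zero, so the expected path
# follows the deterministic recursion x~_{t+j+1} = beta1 * x~_{t+j}
irf = [delta]
for _ in range(horizon):
    irf.append(beta1 * irf[-1])

print(irf)  # matches the closed form beta1**j * delta
```

For nonlinear models no such closed form is available in general, which is why the Monte Carlo approach of the snippet below is needed.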
(Figure: an IRF with origin $\bar x$ and impulse $\delta$, showing the expected path before and after the impulse.)
Consider again the model $x_t = \tanh(0.9\, x_{t-1} + \epsilon_t)$ with $\{\epsilon_t\} \sim NID(0, 0.01)$. The snippet of MATLAB code below produces the IRF generated by a shock $\delta = -1$ at time $t = s = 3$, with origin $x_0 = 0.2$, for $h = 7$ periods after the impulse. The number of simulations is set to $N = 10000$.
s = 3;                                  (Set time of impulse)
x0 = 0.2;                               (Set origin)
e = -1;                                 (Set size of impulse)
h = 7;                                  (Set horizon after impulse)
N = 10000;                              (Set number of simulations)
eps = 0.1*randn(s+h,N);                 (Generate (s+h) x N innovations)
eps(1:s-1,:) = 0;
x(1:s-1,:) = x0*ones(s-1,N);            (Before the impulse: x = x0)
x(s,:) = (x0+e)*ones(1,N);              (At the impulse: x = x0 + e)
for t=s+1:s+h
  x(t,:) = tanh(0.9*x(t-1,:)+eps(t,:));
end
for t=1:s+h
  x_tilde(t) = mean(x(t,:));
  upper_bound(t) = prctile(x(t,:),95);
  lower_bound(t) = prctile(x(t,:),5);
end
plot(upper_bound,'r')
hold on
plot(lower_bound,'r')
hold on
plot(x_tilde,'k')
(Figure: IRF produced by the code snippet above, plotted for $t = s-2, ..., s+7$.)
Does the path that the IRF describes depend on the selected model and estimated parameters? What does the IRF really mean if the model is misspecified? If two models produce different IRFs, which one is best? In the future, I hope you use all your knowledge about parameter estimation, model specification, and model comparison to give careful and well-founded answers to these questions.

Good Luck!