The University of Western Australia
School of Mathematics & Statistics
STAT7450 (4S5)
AR(1). Here ρ1 = φ, so the MME of φ is simply φ̂ = r1, the lag-1 sample correlation coefficient.

AR(p). Here sample correlations are substituted into the Yule-Walker equations, and solving them for the AR coefficients gives their MME's. For example, if p = 2 we have
ρ1 = φ1 + ρ1φ2   and   ρ2 = ρ1φ1 + φ2,

giving estimates

φ̂1 = r1(1 − r2)/(1 − r1²)   and   φ̂2 = (r2 − r1²)/(1 − r1²).
MA(1). Here ρ1 = −θ/(1 + θ²), so equating r1 to this and solving the resulting quadratic gives

θ̂ = (−1 ± √(1 − 4r1²)) / (2r1).

But which root should be chosen? You should choose the root with |θ̂| < 1, to ensure that the estimated model is invertible; note that a real root exists only when −1/2 ≤ r1 ≤ 1/2.

ARMA(1,1). Here we have φ = ρ2/ρ1, so φ̂ = r2/r1. Now substitute back into (1) with k = 1 and solve it for θ.
Table 1 shows comparisons between chosen values of parameters of some low-order models, and the MME's obtained from simulated series of intermediate length using the tabulated parameter values. The method of moments has worked quite well for each of the AR models, but it failed for the MA(1) model, and while it produced estimates for the ARMA(1,1) model, they are not very close to the true values.
For an AR(p) the innovation variance is estimated by

v̂A = (1 − φ̂1r1 − · · · − φ̂prp)S²,

where S² is the sample variance of the series. For MA(q), the relation γ0 = (1 + θ1² + · · · + θq²)vA gives the estimate

v̂A = S²/(1 + θ̂1² + · · · + θ̂q²),

provided you have admissible estimates for all of the θ's.
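As a minimal numerical sketch of these substitutions, assuming numpy (the helper names below are purely illustrative), the moment estimates for an AR(2) and an MA(1), together with the corresponding innovation-variance estimates, can be computed from the sample autocorrelations as follows.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (with r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    c0 = np.dot(xc, xc) / len(x)                     # sample variance S^2
    return np.array([np.dot(xc[k:], xc[:-k]) / (len(x) * c0)
                     for k in range(1, max_lag + 1)])

def mme_ar2(x):
    """Moment estimates for an AR(2) via the Yule-Walker substitutions."""
    r1, r2 = sample_acf(x, 2)
    phi1 = r1 * (1 - r2) / (1 - r1**2)
    phi2 = (r2 - r1**2) / (1 - r1**2)
    vA = (1 - phi1 * r1 - phi2 * r2) * np.var(x)     # (1 - phi1*r1 - phi2*r2) * S^2
    return phi1, phi2, vA

def mme_ma1(x):
    """Moment estimate for an MA(1): solve r1 = -theta/(1 + theta^2), keep |theta| < 1."""
    r1 = sample_acf(x, 1)[0]
    if r1 == 0:
        return 0.0, np.var(x)
    if abs(r1) >= 0.5:
        raise ValueError("no real, invertible moment estimate exists")
    roots = (-1 + np.array([1.0, -1.0]) * np.sqrt(1 - 4 * r1**2)) / (2 * r1)
    theta = roots[np.abs(roots) < 1][0]              # the invertible root
    vA = np.var(x) / (1 + theta**2)                  # S^2 / (1 + theta^2)
    return theta, vA
```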
Model        h
AR(1)        59
AR(1)        119
AR(1)        119
AR(2)        119
MA(1)        120
ARMA(1,1)    99

Table 1: True parameters and their MME's for some simulated models; * indicates a complex-valued estimate.

Turning now to least-squares estimation, suppose first that the model is a pure AR(p) centred at µ,

ϵt − µ = At + Σ_{j=1}^{p} φj(ϵt−j − µ).
The conditional least-squares estimates minimize

S∗(⃗φ, µ) := Σ_{t=p+1}^{h} [ ϵt − µ − Σ_{j=1}^{p} φj(ϵt−j − µ) ]².
The estimating equation ∂S∗(⃗φ, µ)/∂µ = 0 gives

(1 − Σ_{j=1}^{p} φj) µ ≈ h⁻¹ Σ_{t=p+1}^{h} ( ϵt − Σ_{j=1}^{p} φjϵt−j ),

and since

h⁻¹ Σ_{t=p+1}^{h} Σ_{j=1}^{p} φjϵt−j = h⁻¹ Σ_{j=1}^{p} φj(ϵp+1−j + · · · + ϵh−j) ≈ ϵ̄ Σ_{j=1}^{p} φj,

it follows that µ̂ ≈ ϵ̄, the sample mean. Substituting ϵ̄ for µ, the estimating equation ∂S∗(⃗φ, ϵ̄)/∂φk = 0 becomes

Σ_{t=p+1}^{h} [ ϵt − ϵ̄ − Σ_{j=1}^{p} φj(ϵt−j − ϵ̄) ](ϵt−k − ϵ̄) = 0,   k = 1, . . . , p,

and dividing by h and replacing the resulting sums by sample autocovariances gives, approximately,

rk = Σ_{j=1}^{p} φj r|k−j|,   k = 1, . . . , p;
remember that r0 = 1. Now this linear system is just the empirical version of the Yule-Walker equations, so you can see that the exact LSE’s are approximated by the MME’s of the last section.
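For instance, when p = 1 the single estimating equation is Σ_{t=2}^{h} [ϵt − ϵ̄ − φ(ϵt−1 − ϵ̄)](ϵt−1 − ϵ̄) = 0, whose exact solution is φ̂ = Σ_{t=2}^{h} (ϵt − ϵ̄)(ϵt−1 − ϵ̄) / Σ_{t=2}^{h} (ϵt−1 − ϵ̄)²; this differs from the moment estimate r1 only in the denominator, where r1 sums the squared deviations over all h observations.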
Exact LSE’s can be obtained using standard regression methods. To do
this express the model as
ϵt = µ0 + Σ_{j=1}^{p} φjϵt−j + At.
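A minimal regression sketch, assuming numpy (the function ar_ols and the simulated example are illustrative only): the design matrix has an intercept column for µ0 and the p lagged values as regressors, and for a moderately long series the resulting estimates should be close to the Yule-Walker/moment estimates above.

```python
import numpy as np

def ar_ols(x, p):
    """Exact conditional LSE's for an AR(p): regress eps_t on eps_{t-1}, ..., eps_{t-p}."""
    x = np.asarray(x, dtype=float)
    h = len(x)
    # Design matrix: an intercept column for mu_0 plus the p lagged values.
    X = np.column_stack([np.ones(h - p)] +
                        [x[p - j:h - j] for j in range(1, p + 1)])
    y = x[p:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta[0], beta[1:], np.mean(resid**2)      # mu_0, (phi_1, ..., phi_p), vA

# Illustration on a simulated AR(2); compare with the moment estimates for the same series.
rng = np.random.default_rng(1)
x = np.zeros(300)
a = rng.normal(size=300)
for t in range(2, 300):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + a[t]
print(ar_ols(x, p=2))
```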
When the model has a moving-average part, write its defining equation as (for an MA(q))

At = ϵt + θ1At−1 + · · · + θqAt−q,
and generate ‘observed’ values of the innovations At, as follows. Start by arbitrarily setting A0 = · · · = A1−q = 0 (or other convenient value); these are innovations at past (unobservable) times; note that 1 − q ≤ 0. Next iterate forward in time to generate
A1 = ϵ1,   A2 = ϵ2 + θ1A1,   and so on up to Ah = ϵh + θ1Ah−1 + · · · + θqAh−q.
The general ARMA case is tackled using arbitrary starting values, for example ⃗A∗ = ⃗0 and ⃗ϵ∗ = ⃗ϵ (starting innovations set to zero and starting ϵ-values taken as observed). Thus the minimization is done conditionally on the starting values. All this still involves the unobserved A1, . . . , Ah, but they are computed step by step using the defining difference equation as the minimization proceeds. I will return to this after looking at the third approach.
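The following sketch, assuming numpy (the function names and the grid search are my own illustrative choices), carries out this conditional minimization for the simplest case, an MA(1) with zero mean: the innovations are rebuilt recursively for each trial θ, and θ̂ is the minimizer of the conditional sum of squares over invertible values.

```python
import numpy as np

def conditional_ss_ma1(theta, eps):
    """Conditional sum of squares for an MA(1): A_t = eps_t + theta*A_{t-1}, with A_0 = 0."""
    a, ss = 0.0, 0.0
    for e in eps:
        a = e + theta * a        # the defining difference equation, solved for A_t
        ss += a * a
    return ss

def cls_ma1(eps, ngrid=399):
    """Grid-search conditional LSE of theta over the invertible range (-1, 1)."""
    grid = np.linspace(-0.995, 0.995, ngrid)
    ss = np.array([conditional_ss_ma1(th, eps) for th in grid])
    return grid[ss.argmin()], ss.min() / len(eps)    # theta_hat, vA_hat
```

In practice the grid search would be replaced by a standard numerical optimizer, and for an ARMA model the recursion would also carry the autoregressive terms and the chosen starting ϵ-values; the point of the sketch is the step-by-step reconstruction of the At's inside the criterion.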
5.4. Maximum likelihood. You should know that this is a widely used method of estimation for parametric models, and that it has optimal properties in many cases. Using this approach for time series models requires more assumptions about the innovations than previously, usually that they are independent with a normal distribution. But even then, obtaining the likelihood function of the ϵ's may not be easy.
Under these assumptions the joint density of A2, . . . , Ah is

(2πvA)^{−(h−1)/2} exp[ −(2vA)⁻¹ Σ_{t=2}^{h} at² ].   (3)
The AR(1) model centred at µ is

ϵt − µ = At + φ(ϵt−1 − µ),   (4)
and ϵ1 involves the unobservable ϵ0. Use (4) to compute the conditional joint law of ϵ2, . . . , ϵh given ϵ1; the Jacobian of the transformation represented by (4) is unity. Thus substitution into (3) gives the joint conditional density
(2πvA)^{−(h−1)/2} exp[ −(2vA)⁻¹ Σ_{t=2}^{h} {ϵt − µ − φ(ϵt−1 − µ)}² ].

Multiplying this by the density of ϵ1, which by stationarity is normal with mean µ and variance vA/(1 − φ²), gives the full likelihood

L(φ, µ, vA) = (2πvA)^{−h/2}(1 − φ²)^{1/2} exp[ −S(φ, µ)/2vA ],
and maximizing over vA gives v̂A = S(φ̂, µ̂)/h. Observe that S(φ, µ) = S∗(φ, µ) + (1 − φ²)(ϵ1 − µ)².
Often, but not always, the second term on the right-hand side is small in comparison to the first, in which case maximization leads to the same set of estimating equations as does conditional least-squares.
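As a sketch of how this likelihood can be used numerically, assuming numpy (the function names, the grid, and the profiling of µ and vA are my own choices for illustration), the exact Gaussian log-likelihood of an AR(1) can be profiled over φ:

```python
import numpy as np

def ar1_exact_loglik(phi, eps):
    """Exact Gaussian log-likelihood for an AR(1), with mu set to the sample mean
    and vA profiled out as S(phi, mu)/h."""
    eps = np.asarray(eps, dtype=float)
    h, d = len(eps), eps - eps.mean()
    s_star = np.sum((d[1:] - phi * d[:-1])**2)       # conditional sum of squares S*
    s = s_star + (1 - phi**2) * d[0]**2              # S = S* + (1 - phi^2)(eps_1 - mu)^2
    vA = s / h
    return -0.5 * h * np.log(2 * np.pi * vA) + 0.5 * np.log(1 - phi**2) - s / (2 * vA)

def mle_ar1(eps, ngrid=1999):
    """Maximize the profiled log-likelihood over a grid of stationary phi values."""
    grid = np.linspace(-0.999, 0.999, ngrid)
    ll = [ar1_exact_loglik(p, eps) for p in grid]
    return grid[int(np.argmax(ll))]
```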
You can avoid arbitrarily choosing starting values by minimizing

S = S(⃗φ, ⃗θ, µ) := Σ_t [At]²,

where [At] denotes the expected value of At given the observed series, and the sum extends back beyond the start of the observation period; the extra terms are supplied by backcasting.
Example 4.1. Consider the ARMA(1,1) model

ϵt = φϵt−1 + θ0 + At − θAt−1.   (5)
Since it is stationary it should be equivalent to the backward analogue

ϵt = φϵt+1 + θ0 + Ãt − θÃt+1,

in which the Ãt form another white-noise sequence with the same variance; iterating this backward form through the data produces backcasts of the pre-sample ϵ's and A's.
These backcast values of the A's and ϵ's are used to minimize S, presumably producing new trial values of the parameters at each iteration. A typical computer algorithm computes, e.g., 100 − max(p, q) backcasts and issues warnings if they have not settled down.
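A minimal sketch of the two-pass backcasting idea for the simplest case, a zero-mean MA(1), assuming numpy (the function name and the restriction to MA(1) are my own simplifications):

```python
import numpy as np

def backcast_ss_ma1(theta, eps):
    """Unconditional sum of squares for a zero-mean MA(1), eps_t = A_t - theta*A_{t-1},
    with the one needed pre-sample value eps_0 supplied by backcasting."""
    eps = np.asarray(eps, dtype=float)
    # Backward pass: the backward innovations satisfy eps_t = e_t - theta*e_{t+1};
    # start from [e_{h+1}] = 0 and recurse [e_t] = eps_t + theta*[e_{t+1}] down to t = 1.
    e = 0.0
    for t in range(len(eps) - 1, -1, -1):
        e = eps[t] + theta * e
    eps0 = -theta * e            # backcast [eps_0] = -theta*[e_1]; earlier eps's backcast to 0
    # Forward pass: [A_0] = [eps_0] (since [A_{-1}] = 0), then A_t = eps_t + theta*A_{t-1}.
    a = eps0
    ss = a * a
    for x in eps:
        a = x + theta * a
        ss += a * a
    return ss
```

Minimizing this criterion over θ (by a grid or a standard optimizer) gives the unconditional least-squares estimate; in the general ARMA case the same two-pass idea is applied with the full difference equation, and more than one backcast value is needed.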
5.5. Properties of estimators. As the series length h → ∞, least-squares and maximum likelihood estimators are consistent and asymptotically normal, with variances shrinking at the classical h⁻¹ rate and with correlations between estimators that need not vanish in the limit. The following results hold for common low-order models.
For the ARMA(1,1) model, for example,

Corr(φ̂, θ̂) ≈ √[(1 − φ²)(1 − θ²)] / (1 − φθ).

4. If φ ≈ θ in an ARMA(1,1) model then the standard errors of the estimates will be large.

5. Estimates can be highly correlated even when h is large.
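For instance, taking φ = 0.9 and θ = 0.8 purely for illustration gives Corr(φ̂, θ̂) ≈ √(0.19 × 0.36)/0.28 ≈ 0.93, so the two estimates are nearly collinear, which is the situation described in points 4 and 5.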
Thus MME’s should be avoided for models with MA components, though sometimes they are used to initiate the iterative computation of LS or ML estimates.
5.6. The ‘deaths’ series. Recall the U.S. accidental deaths data analysed in §1.6. Figure 5.1 shows the standardized residuals after fitting a quadratic plus seasonal means model. Note that conclusions about model identification for residuals are the same for standardized or raw residuals. The time scale has been chosen to mark off successive twelve month periods to emphasize the fact that there seems to be almost no residual seasonal variation. This conclusion is supported by the ACF plot which shows little evidence of monthly seasonal variation.
φ̂1 = −0.2547    φ̂2 = 0.5159    θ̂1 = −0.6800    θ̂2 = 0.2805    v̂A = 52793

Table 2: Estimated parameters for ARMA(2,2).
The ACF does show, however, that a white-noise model cannot account for the data. The PACF plot suggests that an AR(1) with fairly small φ may be appropriate. On the other hand, there are some fairly large values at high lags, e.g. φ̂77 = −0.25; only φ̂77 lies outside the usual ±2/√h band, and then only just outside. Fitting an AR(1) gives φ̂ = 0.3966 with t-ratio 3.6 and innovation variance v̂A = 53,066. Exactly the same estimates of φ and its t-ratio occur if the standardized residuals are used, but standardizing 'hides' the innovation variance. One possible way of deciding whether this is a good outcome is to inspect simulations of the fitted model, comparing them with the given series to assess if they could be realizations of the same generating process.
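A minimal sketch of that simulation check, assuming numpy (the function name and the series length h = 96 are placeholders of mine, since the length of the residual series is not restated here): simulate the fitted AR(1) with φ = 0.3966 and vA = 53,066 a few times and compare the paths with the observed residual series.

```python
import numpy as np

def simulate_ar1(phi, vA, h, n_paths=4, seed=0):
    """Simulate a few realizations of a zero-mean AR(1) with the fitted parameters."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(vA)
    paths = np.zeros((n_paths, h))
    for i in range(n_paths):
        a = rng.normal(scale=sd, size=h)
        for t in range(1, h):
            paths[i, t] = phi * paths[i, t - 1] + a[t]
    return paths

# Fitted values for the deaths residuals; h = 96 is only a placeholder length.
sims = simulate_ar1(phi=0.3966, vA=53066.0, h=96)
print(sims.std(axis=1))   # compare spread and appearance with the observed residuals
```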
An ARMA(2,2) fit gives the estimates listed in Table 2. The t-ratios suggest setting φ1 = θ2 = 0. Note that v̂A varies little with the chosen model. Differencing the data gives series which oscillate very rapidly, and with increasing amplitude as d grows. The ACF plots become more difficult to interpret, though an ARIMA(0,2,1) could be
reasonable. However, fitting MA models to the second-differenced data gives sums of squares 50% larger than for the stationary models. Taking account of the fact that there is little indication of non-stationarity in Figure 1, we conclude that a stationary model is preferable to an ARIMA model.