LEAST-SQUARES PREDICTION
FORMULAS FOR TIME-SERIES
by Rick Martinelli, Haiku
Laboratories June 2008
Copyright ©, 2008
Updated
Mar 2010
The
purpose of this memo is to derive some least-squares formulas to
be used to predict financial market values.
The problem addressed here may be stated as follows: Given n ordered
pairs of numeric data
{(x(k),
y(k)) | k = 1,
,n},
where
n > 1, find an expression for the least-squares estimate y*(n+1) of y(n+1) as
a linear combination of the previous n data values y(1), y(2),
, y(n). Here the y(k)
represent market data values and the x(k) represent time values increasing with
k. Since large amounts of market data
are reported at regular time intervals, the formulas presented here are derived
for equal-interval data, commonly known as time-series. In this case, the usual least-squares
formulas are much simplified by assuming x(k) =
k.
It is
assumed that the data is composed of a smooth trend line that has been
corrupted with measurement noise, and that short segments of the trend line can
be well-modeled by a polynomial. In what
follows, explicit formulas for linear, quadratic and cubic polynomial models
are derived for small values of n. In
the last section polynomial models of higher degree are considered and a
general formula is derived for any degree p > 0, and data length n = p + 1.
For a
fixed data length n, assume the simple linear model y(k)
= Ax(k) + B for all k, where A and B are to be estimated from the data, and the
prediction y*(n+1) = Ax(n+1) + B. Unless
other wise indicated, in what follows all sums are from k=1 to n. With this notation the normal equations
for this model are
![]()
.
Changing to
matrix form and making the substitution x(k) = k for
equal-interval data yields
.
The solution
to this equation is

Using the
identities
n(n+1)/2, and
n(n+1)(2n+1)/6,
The
solution becomes
.
![]()
Substituting
A and B into the equation for y*(n+1) gives
![]()
This is
the desired form of the estimate.
Substituting values for n>1 yields formulas for the class of linear least-squares predictors on n points. (Note the formula fails for n=0,1.) Results are
summarized in Table 1, where coefficients are ordered from smallest to largest
k; for example when n=4
y*(5) = -1/2 y(1) + 0 y(2) + 1/2 y(3) + y(4).
|
n |
Coefficients |
|
2 |
-1, 2 |
|
3 |
-2/3, 1/3,
4/3 |
|
4 |
-1/2,
0, 1/2, 1 |
|
5 |
-2/5,
-1/10, 1/5, 1/2, 4/5 |
|
6 |
-1/3,
-2/15, 1/15, 4/15, 7/15, 2/3 |
|
7 |
-2/7,
-1/7, 0, 1/7, 2/7, 3/7, 4/7 |
Table 1. Coefficients
for linear least-squares predictors on n points.
In the
linear case we used a polynomial model of degree p=1. The quadratic case uses a degree p=2
polynomial,
y(k) = A x(k)2 + B x(k) +
C,
where
A, B and C are to be estimated from the data, and
y*(n+1) = A x(n+1)2 + B
x(n+1) + C.
for
n>2. The normal equations for this
model are
![]()
![]()
.
where
all sums are 1,
,n. Changing to matrix
form and making the substitution x(k) = k yields

To solve this
equation we need the additional identities
and
,
Substituting,
the matrix becomes
.
Solving
for A, B and C, and substituting into y*(n+1) gives
![]()
![]()
This is
the predictor equation in the quadratic case.
Substituting for n>2 and evaluating provides formulas for the class of quadratic least-squares predictors
on n points. (Note the formula fails
for n=0,1,2.)
Results are summarized in Table 2 where coefficients are ordered as in
Table 1.
|
n |
Coefficients |
|
3 |
1, -3,
3 |
|
4 |
3/4,
-5/4, -3/4, 9/4 |
|
5 |
3/5,
-3/5, -4/5, 0, 9/5 |
|
6 |
1/2,
-3/10, -3/5, -2/5, 3/10, 3/2 |
|
7 |
3/7,
-1/7, -3/7, -3/7, -1/7, 3/7, 9/7 |
|
8 |
3/8, -3/56,
-17/56, -3/8, -15/56, 1/56, 27/56, 9/8 |
Table 2. Coefficients
for quadratic least-squares predictors on n points.
The cubic
case uses a degree p=3 polynomial:
y(k) = A x(k)3 + B x(k)2
+ C x(k) + D,
where
A, B, C and D are to be estimated from the data, and
y*(n+1) = A x(n+1)3 + B
x(n+1)2 + C x(n+1) + D.
The
normal equations for this model are
![]()
![]()
![]()
.
where
all sums are 1,
,n. As in the previous
cases, these equations may be solved, this time with the help of the identities
![]()
and
![]()
The
solution is
![]()
This is the
predictor equation in the cubic case.
Substituting for n>3 and evaluating provides formulas for the class of cubic least-squares predictors on n
points. (Note the formula fails for
n=0,1,2,3.)
Results are summarized in Table 3 where coefficients are ordered as in
Table 1.
|
n |
Coefficients |
|
4 |
-1, 4,
-6, 4 |
|
5 |
-4/5,
11/5, -4/5, -14/5, 16/5 |
|
6 |
-2/3,
4/3, 1/3, -4/3, -4/3, 8/3 |
Table 3. Coefficients
for cubic least-squares predictors on n points.
From the
Tables we see that, for each n, the set of coefficients sums to 1, as it must
(to see why this must hold, apply the formulas to y(k)
= constant). Note also that, when n = p
+ 1 the formulas have integer coefficients.
To see why, suppose the y(k) all lie on a straight
line. Then their first differences y(k+1) y(k) are all equal, and their second differences
are all zero:
(y(k+2)
y(k+1)) (y(k+1) y(k)) = y(k+2) 2y(k+1) + y(k) =
0.
Note that
the last three coefficients 1,-2,1 are the binomial
coefficients in (a b)2.
Similarly, if the y(k) all lie on a parabola,
then the third differences are all zero:
y(k+3) 3y(k+2) + 3y(k+1) - y(k) =
0,
and
the coefficients 1,-3,3,1 are the binomial coefficients in (a b)3. In general: any polynomial model of degree p
> 0 on equally-spaced data points will have binomial coefficients in its
least-squares prediction formula when n = p + 1. We start with a theorem on consecutive values
of a sampled polynomial.
THEOREM 1
Suppose
the time-series y(k) is a noise-free sampling from a
polynomial y(t) of degree p > 0, where
, and assume the samples are taken at equal time
intervals. Then every p+2 consecutive
samples satisfy
.
where
k > p+1 and
is a binomial
coefficient.
PROOF
The
time-series y(k) may be written
,
where
the ai are real constants, and tk = kD, where D
is the time interval and k = 0, ±1, ±2,
. By scaling the time axis if necessary, we may
take D
= 1, so that tk - j = tk j, where
. Then using the Binomial Theorem
we can write
.
Therefore
![]()
And,
rearranging the order of the summations,
![]()
where
.
We need
to show that h(m) = 0 for m = 0,1,
,p, which implies
S=0. For this we define the polynomial H
of degree p+1 by
.
Note that H(1) = h(0).
Substituting x=1 shows that H(1) = 0 =
h(0). To find h(1)
we take the derivative of H which is
![]()
Here H(1) = h(1) and substituting x=1 again we have
. Similarly, the
second derivative of H is
![]()
and
substituting x=1 again we have
, and so h(2) = 0. By repeated
differentiation of H(x) we find, for m = 3,
,p,
H(m)(1) = 0 = h(m) + c1h(m-1)
+
+ cm-1h(1)
where
the cj are integers. Hence, h(m) = 0, m = 0,1,
,p, and the theorem is proved.
The
equation S=0 in the statement of Theorem 1 can be used as a predictor for y(k). Re-writing the
equation we have
,
and
when the values y(k-j) are known (or estimated), we may take y(k)* = y(k). This is called a linear predictor of length p.
THEOREM 2
The
linear predictor y(k)* in Theorem 1 is actually a
least-squares predictor.
PROOF
Under the
same assumptions as in Theorem 1, suppose y(k)LS is the
least-squares predictor of y(k), k > p+1, where the model is a polynomial of
degree p, and the prediction is based on the previous p+1 known measurements
{y(k-1),
,y(k-p-1)}. Because y(t) is a noise-free polynomial of degree p, the definition
of least-squares requires y(k)LS = y(k), otherwise there could be an
estimate with smaller error, for example, the one provided by y(k)*, which
actually equals y(k) by theorem 1. Hence
we must have y(k)LS = y(k)*. Since this holds for all k > p+1, this
proves the theorem.