LEAST-SQUARES PREDICTION FORMULAS FOR TIME-SERIES

by Rick Martinelli, Haiku Laboratories June 2008 

Copyright ©, 2008

Updated Mar 2010

 

Introduction

Linear Model

Quadratic Model

Cubic Model

Higher Degree Models

 

INTRODUCTION

The purpose of this memo is to derive some least-squares formulas to be used to predict financial market values.  The problem addressed here may be stated as follows: Given n ordered pairs of numeric data

 

{(x(k), y(k)) | k = 1,…,n},

 

where n > 1, find an expression for the least-squares estimate y*(n+1) of y(n+1) as a linear combination of the previous n data values y(1), y(2),…, y(n).  Here the y(k) represent market data values and the x(k) represent time values increasing with k.  Since large amounts of market data are reported at regular time intervals, the formulas presented here are derived for equal-interval data, commonly known as time-series.  In this case, the usual least-squares formulas are much simplified by assuming x(k) = k. 

 

It is assumed that the data is composed of a “smooth” trend line that has been corrupted with measurement noise, and that short segments of the trend line can be well-modeled by a polynomial.  In what follows, explicit formulas for linear, quadratic and cubic polynomial models are derived for small values of n.  In the last section polynomial models of higher degree are considered and a general formula is derived for any degree p > 0, and data length n = p + 1.

 

LINEAR MODEL

For a fixed data length n, assume the simple linear model y(k) = Ax(k) + B for all k, where A and B are to be estimated from the data, and the prediction y*(n+1) = Ax(n+1) + B.  Unless other wise indicated, in what follows all sums are from k=1 to n.  With this notation the normal equations for this model are

 

.

Changing to matrix form and making the substitution x(k) = k for equal-interval data yields 

.

 

The solution to this equation is

 

 

 

 

Using the identities 

 

n(n+1)/2,  and   n(n+1)(2n+1)/6,

 

The solution becomes

.

 

 

Substituting A and B into the equation for y*(n+1) gives

 

 

This is the desired form of the estimate.  Substituting values for n>1 yields formulas for the class of linear least-squares predictors on n points.  (Note the formula fails for n=0,1.)  Results are summarized in Table 1, where coefficients are ordered from smallest to largest k; for example when n=4

 

y*(5) =  -1/2 y(1) + 0 y(2) + 1/2 y(3) + y(4).

 

 

n

Coefficients

2

-1, 2

3

-2/3, 1/3, 4/3

4

-1/2, 0, 1/2, 1

5

-2/5, -1/10, 1/5, 1/2, 4/5

6

-1/3, -2/15, 1/15, 4/15, 7/15, 2/3

7

-2/7, -1/7, 0, 1/7, 2/7, 3/7, 4/7

Table 1. Coefficients for linear least-squares predictors on n points.

 

 

QUADRATIC MODEL

In the linear case we used a polynomial model of degree p=1.  The quadratic case uses a degree p=2 polynomial,

 

y(k) = A x(k)2 + B x(k) + C,

 

where A, B and C are to be estimated from the data, and

 

y*(n+1) = A x(n+1)2 + B x(n+1) + C.

 

for n>2.  The normal equations for this model are

 

.

where all sums are 1,…,n.  Changing to matrix form and making the substitution x(k) = k yields 

 

 

To solve this equation we need the additional identities

 

  and  ,

 

Substituting, the matrix becomes

 

.

 

Solving for A, B and C, and substituting into y*(n+1) gives

 

 

This is the predictor equation in the quadratic case.  Substituting for n>2 and evaluating provides formulas for the class of quadratic least-squares predictors on n points.  (Note the formula fails for n=0,1,2.)  Results are summarized in Table 2 where coefficients are ordered as in Table 1.

 

n

Coefficients

3

1, -3, 3

4

3/4, -5/4, -3/4, 9/4

5

3/5, -3/5, -4/5, 0, 9/5

6

1/2, -3/10, -3/5, -2/5, 3/10, 3/2

7

3/7, -1/7, -3/7, -3/7, -1/7, 3/7, 9/7

8

3/8, -3/56, -17/56, -3/8, -15/56, 1/56, 27/56, 9/8

Table 2. Coefficients for quadratic least-squares predictors on n points.

 

CUBIC MODEL

The cubic case uses a degree p=3 polynomial:

 

y(k) = A x(k)3 + B x(k)2 + C x(k) + D,

 

where A, B, C and D are to be estimated from the data, and

 

y*(n+1) = A x(n+1)3 + B x(n+1)2 + C x(n+1) + D.

 

The normal equations for this model are

 

.

where all sums are 1,…,n.  As in the previous cases, these equations may be solved, this time with the help of the identities

 

and

 

The solution is

 

 

This is the predictor equation in the cubic case.  Substituting for n>3 and evaluating provides formulas for the class of cubic least-squares predictors on n points.  (Note the formula fails for n=0,1,2,3.)  Results are summarized in Table 3 where coefficients are ordered as in Table 1.

 

n

Coefficients

4

-1, 4, -6, 4

5

-4/5, 11/5, -4/5, -14/5, 16/5

6

-2/3, 4/3, 1/3, -4/3, -4/3, 8/3

Table 3. Coefficients for cubic least-squares predictors on n points.

 

HIGHER DEGREE  MODELS

From the Tables we see that, for each n, the set of coefficients sums to 1, as it must (to see why this must hold, apply the formulas to y(k) = constant).  Note also that, when n = p + 1 the formulas have integer coefficients.  To see why, suppose the y(k) all lie on a straight line.  Then their first differences y(k+1) – y(k) are all equal, and their second differences are all zero:

 

(y(k+2) – y(k+1)) – (y(k+1) – y(k)) = y(k+2) – 2y(k+1) + y(k) = 0.

 

Note that the last three coefficients 1,-2,1 are the binomial coefficients in (a – b)2.  Similarly, if the y(k) all lie on a parabola, then the third differences are all zero:

 

y(k+3) – 3y(k+2) + 3y(k+1) - y(k) = 0,

 

and the coefficients 1,-3,3,1 are the binomial coefficients in (a – b)3.  In general: any polynomial model of degree p > 0 on equally-spaced data points will have binomial coefficients in its least-squares prediction formula when n = p + 1.  We start with a theorem on consecutive values of a sampled polynomial.

 

THEOREM 1

Suppose the time-series y(k) is a noise-free sampling from a polynomial y(t) of degree p > 0, where , and assume the samples are taken at equal time intervals.  Then every p+2 consecutive samples satisfy

 

.

where k > p+1 and  is a binomial coefficient. 

 

PROOF

The time-series y(k) may be written

 

,

 

where the ai are real constants, and tk = kD, where D is the time interval and k = 0, ±1, ±2,….  By scaling the time axis if necessary, we may take D = 1, so that tk - j = tk – j, where .  Then using the Binomial Theorem we can write

 

.

Therefore

 

 

And, rearranging the order of the summations,

 

where

.

We need to show that h(m) = 0 for m = 0,1,…,p, which implies S=0.  For this we define the polynomial H of degree p+1 by

.

Note that H(1) = h(0).  Substituting x=1 shows that H(1) = 0 = h(0).  To find h(1) we take the derivative of H which is

 

 

Here H’(1) = h(1) and substituting x=1 again we have .  Similarly, the second derivative of H is

 

 

and substituting x=1 again we have , and so h(2) = 0.  By repeated differentiation of H(x) we find, for m = 3,…,p,

 

H(m)(1) = 0 = h(m) + c1h(m-1) + … + cm-1h(1)

 

where the cj are integers.  Hence, h(m) = 0, m = 0,1,…,p, and the theorem is proved. ‚ 

 

The equation S=0 in the statement of Theorem 1 can be used as a predictor for y(k).  Re-writing the equation we have

 

,

 

and when the values y(k-j) are known (or estimated), we may take y(k)* = y(k).  This is called a linear predictor of length p.

 

THEOREM 2

The linear predictor y(k)* in Theorem 1 is actually a least-squares predictor.

 

PROOF

Under the same assumptions as in Theorem 1, suppose y(k)LS is the least-squares predictor of y(k), k > p+1, where the model is a polynomial of degree p, and the prediction is based on the previous p+1 known measurements {y(k-1),…,y(k-p-1)}.  Because y(t) is a noise-free polynomial of degree p, the definition of least-squares requires y(k)LS = y(k), otherwise there could be an estimate with smaller error, for example, the one provided by y(k)*, which actually equals y(k) by theorem 1.  Hence we must have y(k)LS = y(k)*.  Since this holds for all k > p+1, this proves the theorem. ‚