Imputation of Missing Values for BL (P,0,P,P) Models with Normally Distributed Innovations
Poti Abaja Owili
Mathematics and Computer Science Department, Laikipia University, Nyahururu, Kenya
Email address:
To cite this article:
Poti Abaja Owili. Imputation of Missing Values for BL (P,0,P,P) Models with Normally Distributed Innovations.Science Journal of Applied Mathematics and Statistics.Vol.3, No. 6, 2015, pp. 234-242. doi: 10.11648/j.sjams.20150306.12
Abstract: This study derived estimates of missing values for bilinear time series models BL (p, 0, p, p) with normally distributed innovations by minimizing the h-steps-ahead dispersion error. For comparison purposes, missing value estimates based on artificial neural network (ANN) and exponential smoothing (EXP) techniques were also obtained. Simulated data was used in the study. 100 samples of size 500 each were generated for bilinear time series models BL (1, 0, 1, 1) using the R-statistical software. In each sample, artificial missing observations were created at data positions 48, 293 and 496 and estimated using these methods. The performance criteria used to ascertain the efficiency of these estimates were the mean absolute deviation (MAD) and mean squared error (MSE). The study found that optimal linear estimates were the most efficient estimates for estimating missing values for BL (p, 0, p, p). The study recommends OLE estimates for estimating missing values for bilinear time series data with normally distributed innovations.
Keywords: Optimal Linear Interpolation, Simulation, MSE, Innovations, ANN, Exponential Smoothing
1. Introduction
A time series is data recorded sequentially over a specified time period. There are cases where some observations that were supposed to be collected are not obtained and this result in missing values. Being unable to account for missing observation may result in a severe misrepresentation of the phenomenon under study. Further, it can cause havoc in the estimation and forecasting of linear and nonlinear time series as in [3]. This problem can be overcome through missing value imputation.
Imputation of missing values has been done for several linear time series models. For non-linear time series models, imputation has been done for ARMA models with stable errors as in [24]. For other nonlinear models, such as bilinear time series models, there is no evidence to show that imputation of missing values has been explicitly done. Therefore this study derived estimates of missing values for the bilinear time series models with normally distributed innovations. The missing values were derived using optimal linear interpolation techniques based on minimizing the h-steps-ahead dispersion error. Other techniques for estimating missing values that were used included the non-parametric methods of artificial neural network as in [4], [31] and exponential smoothing.
Interest in this study was also on the quality of the imputed values at the level of the individual, an issue that has received relatively little attention as in [5]. The basic idea of an imputation approach, in general, is to substitute a plausible value for a missing observation and to carry out the desired analysis on the completed data as in [22]. Here, imputation can be considered to be an estimation or interpolation technique.
The imputation of the missing value technique developed may be adopted by data analysts to improve on time series modeling.
2. Literature Review
Most of the real-life time series encountered in practice are neither Gaussian nor linear in nature and are adequately described by nonlinear models. One of the most important nonlinear models used in practice is the bilinear time series models. The nonlinearity of bilinear models can be approached in two ways. The first approach is to create a model that consist of a blend of non-Gaussian and nonlinearity which has been widely discussed as in [31] where he considers the existence of bilinear models with infinite variance innovations. The other approach is to introduce nonlinearity in the model but assume that the distribution of the innovation sequence is Gaussian as in [36]. Properties of these models have been extensively studied in the literature. This is the bilinear model of interest in the study.
2.1. Bilinear Models
A discrete time series process is said to be a bilinear time series model of order BL (p, q , P, Q) if it satisfies the difference equation
where q, and are constants while is a purely random process which is normally distributed and =1. For the bilinear time series model BL(p,o,p,p), we have
Bilinear time series models may have sudden burst of large negative and positive values that vary in form and amplitude depending on the model parameters and thus it may be plausible for modeling nonlinear processes as in [25]. A bilinear model is a member of the general class of nonlinear time series models called ‘State dependent models’ formed by adding the bilinear term to the conventional ARMA model as in [30].
It is a parsimonious and powerful nonlinear time series model. Researchers have achieved forecast improvement with simple nonlinear time series models. Reference [21] used a bilinear time series model to forecast Spanish monetary data and reported a near 10% improvement in one-step-ahead mean square forecast error over several ARMA alternatives. Reference [9] also reported a forecast improvement with bilinear models in forecasting stock prices. The statistical properties of such models have been analyzed in detail as in [10), [11], [25], and [17] etc., while an economic application is presented as in [14].
The first step in identification of bilinear time series model is to determine whether a given data is nonlinear or not. Once the data is found to be nonlinear, it is important to fit an appropriate time series model to the data. Reference [39] pointed out that the second order properties of BL (p, p, 0, 1) are similar to those of ARMA (p, 1) and hence it is necessary to study higher order cumulants to distinguish them from nonlinear models. The technique of identification of a given nonlinear model can be extended to more general bilinear models provided there are difference equations for higher order moments and cumulants as in [24].
For some super diagonal and diagonal bilinear time series, the third order moments do not vanish and the pattern of nonzero moments can be used to discriminate between the bilinear models and white noise and also between different bilinear models. Looking at the table of third order moments, one can easily distinguish bilinear models from pure ARMA or mixed ARMA models.
Third order moments may also be useful in detecting non-normality in the distribution of the innovation sequence. References [10] and [37] have shown that in most cases, second order autocorrelation will be zero for these models which makes it difficult to distinguish them from complete white noise.
Reference [24] showed that with a large bilinear coefficient, a bilinear model can have sudden large amplitude bursts and is suitable for some kind of seismological data such as earthquakes and underground nuclear explosions. The variant of the bilinear process is time dependent. This feature enables bilinear process to be used also for financial data as in [21]. Empirical studies have been done on estimating missing values for different time series data. Reference [26] used interpolation and mean imputation techniques to replace simulated missing values from annual hourly monitoring air pollution data.
Reference [29] developed alternative techniques suitable for a limited set of stable case with index α∈ (1, 2]. This was later extended to the ARMA stable process with index α∈ (0,2] as in [24]. He developed an algorithm applicable to general linear and nonlinear processes by using the state space formulation and applied it in the estimation of missing values.
2.2. Missing Value Imputation for Nonlinear Time Series Models
Reference [35] derived a recursive estimation procedure based on optimal estimating function and applied it to estimate model parameters to the case where there are missing observations as well as handle time-varying parameters for a given nonlinear multi-parameter model. More specifically, to estimate missing observations, [3] formulated a nonlinear time series model which encompasses several standard nonlinear models of time series as special cases. It also offers two methods for estimating missing observations based on prediction algorithm and fixed point smoothing algorithm as well as estimating functions equations. Recursive estimation of missing observations in an autoregressive conditional heteroscedasticity (ARCH) model and the estimation of missing observations in a linear time series model are shown as special cases. However, little was done on bilinear time series models.
Reference [28] investigated influence of missing values on the prediction of a stationary time series process by applying Kaman filter fixed point smoothing algorithm. He developed simple bounds for the prediction error variance and asymptotic behavior for short and long memory process.
The work on estimation of missing values has also been extended to vector time series. A classic example is the studies done as in [20] who worked on estimation of missing values in possibly partially non stationary vector time series. He extended the method as in [19] for estimating missing values and evaluating the corresponding function in scalar time series. The series is assumed to be generated by a possibly partially non-stationary and non-invertible vector autoregressive moving average process. No pattern of missing data is assumed. Future and past values are special cases of missing data that can be estimated in the same way. The method does not use the Kalman filter iterations and hence avoids initialization problems. The estimation of missing values is provided by the normal equations of an appropriate regression problem.
2.3. Nonparametric Methods for Estimating Missing Values
Nonparametric methods have also been proposed for missing data. Reference [26] considered kernel estimation of a multivariate density for data with incomplete observations. When the parameter of interest is the mean of a response variable which is subject to missingness, the kernel conditional mean estimator to impute the missing values is proposed as in [5]. Reference [13] studied the estimation of average treatment effects using non-parametrically estimated propensity scores. Time series smoothers estimate the level of a time series at time t as its conditional expectation given present, past and future observations, with the smoothed value depending on the estimated time series model as in [16].
Reference [25] derived a recursive form of the exponentially smoothed estimated for a nonlinear model with irregularly observed data and discussed its asymptotic properties. They made numerical comparison between the resulting estimates and other smoothed estimates. Reference [1] have used neural networks and genetic algorithms to approximate missing data in a database. A genetic algorithm is used to estimate the missing value by optimizing an objective function. Many approaches have been developed to recover missing values, such as k-nearest neighbor as in [38], Bayesian PCA (BPCA) as in [27], least square imputation as in [12], local least squares imputation (LLSimpute) as in [15] and least absolute deviation imputation (LADimpute) as in [5].
It can be seen from the literature that there are a variety of methods used for estimating missing values for different time series models. What is however lacking in the literature is an explicit method for estimating missing values for bilinear time series models. The study therefore purposed to address this gap. It estimated missing values for bilinear time series which have different probability distributions.
2.4. Estimation of Missing Values Using Linear Interpolation Method
Suppose we have one valuemissing out of a set of an arbitrarily large number of n possible observations generated from a time series process. Let the subspace be the allowable space of estimators of based on the observed values i.e., =sp where n, the sample size, is assumed large. The projection of onto (denoted) such that the dispersion error of the estimate (written disp) is a minimum would simply be a minimum dispersion linear interpolator. Direct computation of the projection onto is complicated since the subspaces =spand are not independent of each other. We thus consider evaluating the projection onto two disjoint subspaces of. To achieve this, we express as a direct sum of the subspaces and another subspace, say, such that. A possible subspace is, where is based on the values. The existence of the subspaces and is shown in the following lemma.
Suppose is a nondeterministic stationary process defined on the probability space . Then the subspaces and defined in the norm of the are such that.
Proof:
Suppose, then can be represented as
where . Clearly the two components on the right hand side of the equality are disjoint and independent and hence the result. The best linear estimator of can be evaluated as the projection onto the subspaces and such that disp) is minimized. i.e.,
.
But = where the coefficients’ are estimated such that the dispersion error is minimized. The resulting error of the estimate is evaluated as
Now squaring both sides and taking expectations, we obtain the dispersion error as
(1)
By minimizing the dispersion with respect to the coefficients the optimal linear estimate is
(2)
3. Methodology
Three methodologies were used in this study, each corresponding with the estimation method used. These methods included optimal linear interpolation, artificial neural network and exponential smoothing. However, the time series data used and performance measures applied were the same for all the methods.
For optimal linear interpolation, the estimates of the missing values for bilinear time series models with normal innovations were derived by minimizing the dispersion error. The estimates of missing values using non parametric methods of ANN and exponential smoothing were also obtained.
3.1. Data Collection
Data was obtained through simulation using computer codes written in R-software. These codes are shown in the Appendix. The time series were simulated from different simple bilinear models which have normal. The seed in the R program code was changed to obtain a new sample for the general bilinear model BL (1,0,1,1). For each program code, a set of 100 samples of size 500 were generated. These were to be used in the analysis.
3.2. Missing Data Points and Data Analysis
Three data points 48, 293 and 496 were selected at random from a sample of size 500. Observations at these points were removed to create a ‘missing value(s)’ at these points to be estimated. Data analysis was done using statistical and computer software which included Excel, TSM and R and Matlab7.
3.3. Performance MeasuresThe MAD (Mean Absolute Deviation) and MSE (Mean Squared Error) were used. These were obtained as follows
MAD=∑│e_{t}│/n (3)
and
MSE=∑│e_{t}│^{2}/n (4)
4. Results
4.1. Derivation of the Optimal Linear Estimates of Missing Values
Estimates of missing values for pure bilinear time series models whose innovations have a Gaussian innovation were derived by minimizing the h-steps-ahead dispersion error. Two assumptions were made. The first one was that that the series are stationary and thus their roots lie within the unit circle. Secondly, the higher powers (of orders greater than two or products of coefficients of orders greater than two) of the coefficients are approximately negligible.
4.2. Bilinear Time Series BL (1,0,1,1) with Normally Distributed Innovations
The stationary bilinear time series model BL(1,0,1,1) with normally distributed innovations order BL(1,0,1,1) is given by
(5)
where
Theorem 4.1
The best linear estimate for the bilinear time series model with normal errors, BL (1, 0, 1, 1), is given by
Proof
Performing recursive substitution of (5), the stationary BL (1,0,1,1) can be expresses as
The h-steps ahead forecast is given by
and the h-steps ahead forecast error is given by
(6)
Substituting (6) in (1) we have
(7)
Simplifying each of the terms on the rhs of (7), we obtain
=
Hence equation (8) becomes
(8)
Differentiating (8) with respect to the coefficient, we get
Therefore we have
The best linear estimate of that minimizes the error dispersion error of the estimate is thus given as
4.3. Estimating Missing Values for BL (p,0,p,p) with Normal Innovation
The bilinear time series model of BL (p,0,p,p) is given by
(9)
The missing value estimate is based on the following theorem 4.2.
Theorem 4.2
The best linear estimate for one missing value x_{m} for the general bilinear time series model BL (p, 0, p ,p) is given by
Proof
Through recursive substitution of (9), the stationary bilinear time series model BL (p, 0,p,p) is expressed as
The h-steps ahead forecast is given by
or
and the forecast is
The forecast error is
(10)
Substituting in (10) in (1), we obtain
(11)
The terms on the right hand side of (11) are simplified as follows
==
=
Third term
+
(12)
differentiating (12) with respect to and setting to zero we obtain
Solving for we get
The optimal linear estimate is therefore given by
4.4. Simulation Results
In this section, the results of the estimates obtained from the optimal linear estimate, artificial neural networks and exponential smoothing are presented. Simulation results are given in table1. These graphs are characterized by sharp outburst as clearly evident in BL (1,0,1,1).Sharp outburst is one of the characteristics of nonlinearity in bilinear models.
MISSING | MAD | MSE | ||||
POSITION | OLE | ANN | EXP | OLE | ANN | EXP |
48.000 | 0.793 | 1.135 | 0.982 | 1.043 | 2.621 | 1.542 |
293.000 | 0.760 | 0.870 | 0.812 | 0.906 | 1.603 | 1.079 |
496.000 | 0.803 | 0.863 | 0.933 | 0.976 | 1.215 | 1.369 |
Total | 2.356 | 2.869 | 2.726 | 2.925 | 5.439 | 3.990 |
Mean | 0.785 | 0.956 | 0.909 | 0.975 | 1.813 | 1.330 |
From table 1, it is clear that the OLE estimates of missing values were the most efficient (mean MAD=0.785 for the different missing data point positions. This was followed by EXP smoothing estimates (mean MAD=0.908818). The estimates based on ANN were the least efficient.
5. Conclusion
In this study we have derived the estimates of missing values for bilinear time series models BL(p,0,p,p) whose innovations are normally distributed by minimizing the dispersion error. The study found the optimal linear estimate of the missing value is a function of both the data observations before and after the missing value position. Further , the study also found that optimal linear estimates were the most efficient estimates for the bilinear model BL(1,0,1,1)with normally distributed errors. The study recommends that for bilinear time series data with normal innovations which have missing values, OLE estimates be used in estimating the missing values.
A more elaborate research should be done to compare the efficiency of several imputation methods such as K-NN, Kalman filter and estimating functions, genetic algorithms, besides the three used in this study.
References