Spline Regression in the Estimation of the Finite Population Total
Joseph Kipyegon Cheruiyot
Department of Computer and Statistics, Moi University, Eldoret, Kenya
To cite this article:
Joseph Kipyegon Cheruiyot. Spline Regression in the Estimation of the Finite Population Total. Science Journal of Applied Mathematics and Statistics. Vol. 3, No. 5, 2015, pp. 214-224. doi: 10.11648/j.sjams.20150305.11
Abstract: This study sought to estimate the finite population total using a spline regression function. It compared the spline regression estimator with the Sample Mean estimator and with design-based and model-based estimators. To measure the performance of each estimator, the study considered the average bias, the efficiency as measured by the mean square error, and the robustness as measured by the rate of change of efficiency. Five populations were used: three simulated according to linear homoscedastic, quadratic homoscedastic and linear heteroscedastic models, and two natural populations. The performance of the five estimators was studied under the five populations. The study found that the Sample Mean (SM), Horvitz-Thompson (HT) and Ratio (R) estimators are not robust, while the Nadaraya-Watson (NW) and Periodic Spline (PS) estimators are robust when linearity and homoscedasticity of the population structure are violated.
Keywords: Homoscedasticity, Population, Sample, Spline Regression, Robustness, Smoothing, Estimator
1. Introduction
1.1. Introduction
There are two generally accepted options for studying the characteristics of a finite population. The first option is a census, a study in which every unit of the population is examined. Using a census to study a population is time consuming, expensive, often impossible and, strangely enough, often inaccurate. The other option is to study the characteristics of a population by examining a part of it. The theory of survey sampling, as developed during the past several decades, provides various reasonable scientific tools for drawing samples and making valid inferences about the population parameters of interest.
1.1.1. Census Versus Sampling Method
Although the census method has advantages, the cost, effort and time required to conduct a census may be enormous unless the population is very small. In such a case we resort to sampling, which involves examining only a part of the population. Although a census gives more reliable data, sampling is more appropriate when:
i. The cost of conducting census would be prohibitive.
ii. The population is large, such that it would be impossible to conduct a census.
iii. The study involves destruction of elementary units under study, such that it would be appropriate to conduct sample testing.
iv. Quick results are required, such that it would be appropriate to conduct sample survey rather than carrying out a complete count.
1.1.2. Basic Ideas of Sampling and Estimation
In the basic sampling setup, the population consists of a known finite number N of units, such as people or plots. With each unit is associated a value of a variable of interest, sometimes referred to as the y-value of that unit. The y-value of each unit in the population is an unknown quantity. However, the units in the population are identifiable and may be labeled with the numbers 1, 2, ..., N. A sample of the units in the population is selected and observed. The data collected consist of the y-value for each unit in the sample together with the unit's label. The procedure by which the sample units are selected from the population is called the sampling design. With most of the well-known sampling designs, the design is determined by assigning to each possible sample s the probability p(s) of selecting that sample. For example, under the simple random sampling design, every possible sample of the given size has the same probability p(s) of being selected.
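The idea of a design that assigns a probability p(s) to each possible sample can be illustrated with a minimal Python sketch of simple random sampling without replacement. The sizes N = 76 and n = 40 are the ones used later in the study; the function names and the seed are illustrative assumptions, not part of the original work.

```python
import math
import random

def srswor(N, n, rng):
    """Draw n distinct unit labels from 1..N by simple random sampling without replacement."""
    return sorted(rng.sample(range(1, N + 1), n))

def sample_probability(N, n):
    """Under SRSWOR every possible sample of size n has the same probability 1 / C(N, n)."""
    return 1.0 / math.comb(N, n)

rng = random.Random(0)        # fixed seed, an arbitrary choice for reproducibility
sample = srswor(76, 40, rng)  # labels of the sampled units
p_s = sample_probability(76, 40)
inclusion = 40 / 76           # first-order inclusion probability n / N of any single unit
```

Because p(s) is the same for every sample, each unit has the same inclusion probability n/N under this design.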
1.2. Estimation Approaches
To estimate the finite population total $T$ in a survey, where
$T = \sum_{i=1}^{N} Y_i$,
we need the survey variables $Y_i$ ($i = 1, 2, \ldots, N$) and the design (auxiliary) variables $x_i$ ($i = 1, 2, \ldots, N$). The following approaches to the estimation of the finite population total are considered in this study.
1.2.1. Design-Based Approach
This is also known as the classical approach. In this approach, the variables of interest of the target population are viewed as fixed quantities. The design introduces selection probabilities that determine the properties of the estimators, which are used to obtain expected values, variances, biases and so on. The samples are generated by the sampling design p(s) with the population values held fixed. Repetition of the sample-drawing procedure forms the basis of the randomization framework. The approach assumes that models have no relevance to the inferential framework. In experimental design, randomization is employed to protect the experimenter against subjective biases. Scott and Smith (1975) extended results of Blackwell. According to Fisher, randomization was relevant before the data were collected but not in the analysis of the data, a view shared by most statisticians in the experimental sciences. Randomization is therefore an insurance against selection bias.
1.2.2. Model – Based (Prediction) Approach
Royall (1976) introduces the concept of the super population thus: "The finite population should itself be regarded as a random sample from some infinite population". Hence the finite population is assumed to be generated as a random sample from a super population. The variables of interest are viewed as random variables, and the properties of estimators depend on the joint distribution of these random variables. A sample is selected from the finite population using a known sampling scheme. Observations are then made on the sample values and used to make predictions about the non-sample values. In this case the model connects a variable of interest Y with a set of auxiliary variables X (Cox, 1995). However, the choice of a model and its robustness to misspecification is the major issue. Small deviations from a chosen model may lead to serious errors in inference. Sometimes the models become mathematically complex while still not being suitably realistic (Thompson, 1992). For example, a model that assumes independence of the variable being studied ignores the tendency in many populations for nearby or related units to be correlated.
1.2.3. Non-Parametric Approach
The parametric method of estimation is used when it is assumed that the data are drawn or generated from one of the known parametric families of distributions. In many cases, however, the experimenter does not know the form of the underlying distribution and needs statistical techniques which are applicable regardless of that form. These techniques are referred to as nonparametric or distribution-free methods. They apply to very wide families of distributions rather than only to families specified by a particular functional form, and they do not require the various assumptions about the distribution of the population from which the sample was obtained. The main idea behind this class of models is that the effect of an explanatory (design) variable on the dependent variable of interest is not modeled as a parametric, usually linear, function but is kept flexible. The only assumption needed is that the effects of the explanatory variables are smooth, i.e. differentiable, functions. The functional shape is then estimated from the data using either Kernel-based methods or Spline-based methods.
Kernel Based Method.
The Kernel estimator is expressed in terms of a Kernel function $K$ which satisfies the condition
$\int_{-\infty}^{\infty} K(x)\,dx = 1$ (1)
Usually, but not always, $K$ will be a symmetric probability density function, the normal density for instance. According to Silverman (1986), the Kernel estimator of the density function with Kernel $K$ is defined by
$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$ (2)
where $h$ is the bandwidth and $X_1, \ldots, X_n$ are the observations. The Kernel estimator is thus a sum of 'bumps' placed at the observations. Each individual bump is created by $n^{-1}h^{-1}K\{(x - X_i)/h\}$, and the estimate is the resultant hump obtained by adding them up.
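The 'sum of bumps' description of the Kernel density estimator can be sketched in a few lines of Python. The Gaussian kernel is the one named in the text; the data values, bandwidth and integration grid below are hypothetical and chosen only for illustration.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: a symmetric kernel that integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, h, kernel=gaussian_kernel):
    """Kernel density estimate at x: the sum of n bumps, each scaled by 1 / (n * h)."""
    n = len(data)
    return sum(kernel((x - xi) / h) for xi in data) / (n * h)

data = [1.0, 1.5, 2.0, 4.0]   # hypothetical observations
h = 0.5                       # hypothetical bandwidth

# Riemann-sum check that the estimate integrates to approximately one.
step = 0.01
grid = [-5.0 + step * k for k in range(1501)]   # grid covering [-5, 10]
area = sum(kde(x, data, h) for x in grid) * step
```

The estimate is largest where the observations cluster (around 1.5 here) and small in the gap between 2.0 and 4.0; a smaller h would make the individual bumps narrower and the estimate rougher.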
Spline Based Method.
The name 'Spline function' was given by I. J. Schoenberg (1946) to the piecewise polynomial functions known as univariate polynomial splines, because of their resemblance to the curves obtained by draftsmen using a mechanical spline: a thin flexible rod with a groove and a set of weights, called 'ducks', used to position the rod at the points through which it was desired to draw a smooth interpolating curve. The basic idea dates back at least to Whittaker (1923). More recent papers on the subject include Wahba (1975), Smith (1979) and Silverman (1985), among others. For Kernel regression estimation, a weighting scheme due to Nadaraya (1964) and Watson (1964) has been associated with random design, and a convolution-type weighting scheme with fixed design. Based on mean square error, none of the estimators is uniformly optimal in either design. The multitude of nonparametric regression estimators is an issue of considerable practical and theoretical importance. A wide class of estimators studied by Jennen-Steinmetz and Gasser (1988) included fixed-width Kernel estimators, smoothing splines and nearest-neighbour estimators as particular cases. No estimator is uniformly best in terms of integrated mean squared error, but the Kernel estimator turns out to be minimax optimal. Since nonparametric methods are usually intended to be applicable to a broad variety of situations, the minimax property is an important safeguard. Two definitions of Kernel weights enjoy particular popularity: the Nadaraya-Watson type (Nadaraya 1964, Watson 1964) and the convolution-type estimator (Priestley and Chao 1972, Gasser and Müller 1979). The Nadaraya-Watson method is intuitively motivated as an estimator of a conditional expectation, which suggests a context where the independent variable is random. Hence this method seems suited to a situation of randomly selected design points, whose distribution is determined by the design density.
A spline function is a piecewise-defined function with certain smoothness conditions. The most commonly used form is the cubic spline. There are two sorts of splines: ordinary splines and B-splines. The two spline functions have the same general structure regarding the piecewise-defined function, such as
(3)
and the smoothing conditions. The difference is that ordinary splines go through all the data points exactly, whereas B-splines do not necessarily fit the data exactly. For ordinary splines, the curve has to go through all the points, hence the equation has to be satisfied for all the points: the spline function has to yield the value $y_i$ for $x_i$. The smoothing conditions too have to be fulfilled. B-splines are piecewise-defined functions, usually polynomial, with the same smoothness conditions as ordinary splines. They are, however, not forced through the data points exactly; the function simply has to come close to the data points.
In the estimation of the finite population total, the challenge is to identify an estimator that is efficient when the population structure is not known. This study compares the spline regression estimator with four known estimators: the nonparametric Nadaraya-Watson estimator, the Sample Mean estimator, the design-based Horvitz-Thompson estimator and the model-based Ratio estimator. The aim is to obtain an estimator which is robust to the violation of both linearity and homoscedasticity of the population structure.
2. Methodology
2.1. Non-Parametric Estimation of the Population Total Using Kernels
In this section, the Nadaraya-Watson Kernel estimator is considered. It is assumed that the auxiliary information is available for the entire population and that the auxiliary variable X and the study variable Y are related in a more general way.
Consider the model
$y_i = m(x_i) + e_i$ (4)
where $m$ is the mean function and $e_i$ is a random error term. The functional form of $m(x_i)$ is unknown but is assumed to be smooth and continuous.
Let $w_i(x)$, $i = 1, 2, \ldots, n$, be the weight function, known as the Kernel function. The Kernel $k$ is a continuous, bounded and symmetric function which integrates to one, that is,
$\int k(u)\,du = 1$
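Taking $k_h(u) = h^{-1}k(u/h)$ to be the Kernel with bandwidth $h$, the weight sequence for Kernel smoothers given by Nadaraya (1964) and Watson (1964) is
$w_i(x) = \frac{k_h(x - x_i)}{n^{-1}\sum_{j=1}^{n} k_h(x - x_j)}$ (5)
The Nadaraya-Watson estimator of $m(x)$ in (4) is
$\hat{m}(x) = n^{-1}\sum_{i=1}^{n} w_i(x)\,y_i$ (6)
Substituting (5) in (6) we have
$\hat{m}(x) = \frac{\sum_{i=1}^{n} k_h(x - x_i)\,y_i}{\sum_{j=1}^{n} k_h(x - x_j)}$ (7)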
The shape of the Kernel weights is determined by $K$, a symmetric probability density function that satisfies the condition in equation (1). The size of the weights is governed by the bandwidth $h$: the smaller it is, the more concentrated the weights are around $x$. The nonparametric regression based estimator $T_{np}$ for the population total $T$ is given by
$T_{np} = \sum_{i \in s} y_i + \sum_{j \notin s} \hat{m}(x_j)$ (8)
where $s$ is the set of sampled units and $\hat{m}(x_j)$ is the Nadaraya-Watson estimator given in (7). Hence, by substituting (7) in (8), the Nadaraya-Watson estimator of the population total becomes
$\hat{T}_{NW} = \sum_{i \in s} y_i + \sum_{j \notin s} \frac{\sum_{i \in s} k_h(x_j - x_i)\,y_i}{\sum_{i \in s} k_h(x_j - x_i)}$ (9)
where $\hat{T}_{NW}$ represents the Nadaraya-Watson estimator of the population total.
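The construction of the Nadaraya-Watson estimator of the population total can be sketched in Python as follows: the observed y-values of the sampled units are summed, and a kernel-weighted prediction is added for each non-sampled unit. This is a minimal sketch assuming a Gaussian kernel; the function names, the toy population and the bandwidth are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_mean(x, xs, ys, h):
    """Nadaraya-Watson estimate of m(x) from the sampled pairs (xs, ys)."""
    weights = [gaussian_kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        # All kernel weights underflowed; fall back to the sample mean (an assumption).
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total

def nw_total(x_pop, sample_idx, y_sample, h):
    """Sum of observed sample y-values plus NW predictions for the non-sampled units."""
    in_sample = set(sample_idx)
    xs = [x_pop[i] for i in sample_idx]
    observed = sum(y_sample)
    predicted = sum(nw_mean(x_pop[j], xs, y_sample, h)
                    for j in range(len(x_pop)) if j not in in_sample)
    return observed + predicted

# Toy check: when every sampled y equals 2, each prediction is 2 and the total is 2 * N.
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_total = nw_total(x_pop, [0, 2, 4], [2.0, 2.0, 2.0], h=1.0)
```

The auxiliary variable is needed for every unit in the population, while y is needed only for the sampled units, which matches the assumption stated at the start of this section.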
2.2. Properties of the Nadaraya-Watson Kernel Estimator of the Population Total
The Nadaraya-Watson Kernel regression estimator is given as in (8) and (7).
In order to find a standard measure of estimation error, the Mean Square error (MSE), the study looked at the conditional mean and variance of under the model .
and
So that
Thus
where X_{p} is the population vector of Xvalues.
But
where is the standard NadarayaWatson estimator of the density.
Hence
Since under the model, we have;
.
Next, we look at the conditional error variance;
Since under the model, we have
where .
2.3. Spline Regression Estimator of the Population Total
Wahba (1975) has shown that the Kernel smoothing estimator is closely related to the smoothing spline estimator when the latter is represented approximately as a linear function of the data values $y_i$. Hence there exists a weight function $F(z, x_i)$ such that
$\hat{m}(z) = n^{-1}\sum_{i=1}^{n} F(z, x_i)\,y_i$
where the function F(z,x) is defined as
F(z,x) = (10)
Hence we have
=
Substituting in
=
We get the smoothing spline estimator of the population Total as
(11)
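where $K(u)$ is defined as
$K(u) = 0.5\exp(-|u|/\sqrt{2})\,\sin(|u|/\sqrt{2} + \pi/4)$, with $\sqrt{2} \approx 1.41$,
and the function $K(u)$ has the following properties:
$\int K(u)\,du = 1, \quad \int u^2 K(u)\,du = 0, \quad \int u^4 K(u)\,du = -24$ (12)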
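The moment properties of the spline kernel K(u) can be checked numerically. The following Python sketch approximates the zeroth, second and fourth moments with a midpoint rule, using the exact constant √2 that the text rounds to 1.41; the integration range and step count are arbitrary choices made for this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)   # the text rounds this constant to 1.41

def spline_kernel(u):
    """K(u) = 0.5 * exp(-|u|/sqrt(2)) * sin(|u|/sqrt(2) + pi/4)."""
    a = abs(u) / SQRT2
    return 0.5 * math.exp(-a) * math.sin(a + math.pi / 4.0)

def moment(p, half_width=50.0, steps=200_000):
    """Midpoint-rule approximation of the integral of u**p * K(u) over [-half_width, half_width]."""
    du = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        u = -half_width + (k + 0.5) * du
        total += (u ** p) * spline_kernel(u)
    return total * du

m0, m2, m4 = moment(0), moment(2), moment(4)
```

Under these assumptions the check gives m0 close to 1, m2 close to 0 and m4 close to -24, and K takes negative values (for example at u = 4). A vanishing second moment together with a non-zero fourth moment is what characterises a kernel of order 4.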
We can see that the properties of the function K(.) above are similar to those given for the Kernel function but can take negative values as well. Hence the smoothing spline estimator corresponds approximately to a Kernel type estimator of order 4. Eubank (1988) has shown that if the function m(.) is assumed to be periodic then corresponds to a spline estimator with a fixed bandwidth parameter h and weights F(z,x) = hw(u/h) where h = and w(u) =
the estimator corresponding to the periodic spline is
where F(.) is as defined in (10),hence giving
(13)
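Since $n^{-1}F(z, x)$ does not sum to one, we divide the weights by their sum and denote the modified weights by $F_R(z, x_i)$; then
$F_R(z, x_i) = \frac{F(z, x_i)}{\sum_{j=1}^{n} F(z, x_j)}$ (14)
Therefore the periodic spline estimator of the function $m(x)$ is given by
$\hat{m}(z) = \sum_{i=1}^{n} F_R(z, x_i)\,y_i$ (15)
Substituting (15) in (8), we have the Periodic Spline estimator of the population total as
$\hat{T}_{PS} = \sum_{i \in s} y_i + \sum_{j \notin s}\sum_{i \in s} F_R(x_j, x_i)\,y_i$ (16)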
hence
Let ,
Then
Next, we consider the conditional error variance;
let hence .
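The Periodic Spline estimator of the population total described in this section can be sketched in Python along the same lines as a kernel estimator, with the spline kernel K(u) in place of a density kernel and the weights divided by their sum. The sketch assumes that the weight function has the form F(z, x) = K((z - x)/h)/h, which is not confirmed by the text; the function names and toy data are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)   # the text rounds this constant to 1.41

def spline_kernel(u):
    """Spline kernel K(u); unlike a density kernel it can take negative values."""
    a = abs(u) / SQRT2
    return 0.5 * math.exp(-a) * math.sin(a + math.pi / 4.0)

def ps_mean(z, xs, ys, h):
    """Estimate m(z) with normalised weights F_R(z, x_i) = F(z, x_i) / sum_j F(z, x_j)."""
    # The factor 1/h in the assumed F(z, x) cancels in the normalisation.
    raw = [spline_kernel((z - xi) / h) for xi in xs]
    norm = sum(raw)
    if abs(norm) < 1e-12:
        # Weights cancel or vanish; fall back to the sample mean (an assumption).
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(raw, ys)) / norm

def ps_total(x_pop, sample_idx, y_sample, h):
    """Observed sample y-values plus spline-kernel predictions for the non-sampled units."""
    in_sample = set(sample_idx)
    xs = [x_pop[i] for i in sample_idx]
    predicted = sum(ps_mean(x_pop[j], xs, y_sample, h)
                    for j in range(len(x_pop)) if j not in in_sample)
    return sum(y_sample) + predicted

# Toy check: constant sampled y-values give a total of 2 * N.
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_total = ps_total(x_pop, [0, 2, 4], [2.0, 2.0, 2.0], h=1.0)
```

Because the spline kernel can be negative, the sum of the raw weights may be close to zero for some points; the guard in `ps_mean` is one simple way to handle that case and is not taken from the original study.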
3. Empirical Results and Discussion
To compare the performance of the five estimators (Horvitz-Thompson, Ratio, Sample Mean, Nadaraya-Watson Kernel, and the Periodic Spline as the spline regression estimator) and so identify a robust estimator, the study simulated three populations based on the following models: a linear homoscedastic model, a quadratic homoscedastic model and a linear heteroscedastic model. The study also used two real populations. The criteria for comparing these estimators are the average bias, the mean square error, and the rate of change of efficiency as a measure of robustness.
3.1. The Choice of the Kernel and Bandwidth
This study used the Gaussian Kernel in the Nadaraya-Watson estimator of the population total, which is defined as
$K(u) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{u^2}{2}\right)$
where $u = (x - x_i)/h$. Assume that the Kernel function $K$ satisfies the condition given in equation (1). An optimal bandwidth for the Nadaraya-Watson smoother was chosen within the interval $[\sigma/4,\ 3\sigma/2]$, where $\sigma$ is the standard deviation of the $x_i$ (Silverman, 1986). The bandwidth used was therefore chosen to be the centre point of this interval, $h = 7\sigma/8$. The Kernel function used in the periodic spline is $K(u) = 0.5\exp(-|u|/1.41)\,\sin(|u|/1.41 + \pi/4)$ (Wahba, 1975).
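The bandwidth rule can be illustrated with a short Python sketch. The text gives the bandwidth as the centre point 7/8 of an interval expressed in units of a standard deviation; the sketch assumes that the interval runs from one quarter to one and a half standard deviations of the auxiliary variable, and that the population form of the standard deviation is used. Both are assumptions, as are the function name and the example values.

```python
import statistics

def bandwidth_from_sd(xs, lower=0.25, upper=1.5):
    """Centre point of the interval [lower * sigma, upper * sigma], with sigma the SD of xs."""
    sigma = statistics.pstdev(xs)   # population standard deviation (an assumption)
    return 0.5 * (lower + upper) * sigma

x_values = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical auxiliary values
h = bandwidth_from_sd(x_values)        # equals 7/8 of the standard deviation
```

Choosing the midpoint is a fixed rule of thumb rather than a data-driven selection such as cross-validation, so the resulting bandwidth is not guaranteed to be optimal for any particular population.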
3.2. Description of the Study Population and Estimators
The artificial population was simulated in the following manner.
a) In artificial population I, 76 data points were generated according to the model;
where , and = 0.5.
b) In artificial population II, we again generated 76 data points according to the model
.
c) In artificial population III, once more 76 data points were generated according to the model
. Where, in b and c are the same as in population I
d) The real population IV was obtained from the Kenya National Bureau of Statistics (KNBS) for the population census carried out in Kenya in 2009. In this population, the auxiliary variable Xi is the number of households in the ith District and the study variable Yi is the total population by District, except for Nairobi province where Divisions are used instead of Districts, where i = 1, 2, ..., 76. Our variable of interest Y is the population total.
e) In population V, the variable X describes the shares a customer already possessed (Acquired) and the variable Y the shares applied for (Booked) at a stock exchange brokerage firm, both expressed in Kshs. Again, 76 data points were selected from this population. The average bias and Mean Square Error of the population total were computed for each of the following five estimators: Sample Mean, Horvitz-Thompson, Ratio, Nadaraya-Watson and Periodic Spline.
Below is a summary of the formulae used in computing their respective population total.
The following are scatter diagrams showing the distributions of the five populations mentioned above.
Population IV appears to be linear with a heteroscedastic variance structure.
In population V, the acquired shares and booked shares appear to be uncorrelated.
3.3. Description of the Computation Procedure
For each artificial population of size 76, samples of size n = 40 were generated by simple random sampling without replacement (SRSWOR); 30 replicate samples were selected and the estimates computed. Similarly, for each real population of size 76, samples of size 40 were replicated 30 times by SRSWOR and the estimators of the population total computed. For the Horvitz-Thompson estimator, the sample units are selected with unequal probabilities. To select a sample with unequal probabilities with Horvitz-Thompson weights, let $\pi_i$ be the probability of unit $i$ being included in the sample, such that
where . Hence the estimate of the population total is obtained as. For each of the population, we compute the true population total .
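Define $\hat{T}_r$ as the population total estimator, where $r$ = SM, R, HT, NW and PS. Then
$\bar{T}_r = \frac{1}{R}\sum_{i=1}^{R} \hat{T}_{ir}$
where $\hat{T}_{ir}$ is the population total estimate from the $i$th sample for the $r$th estimator and $R$ is the number of sample replicates. Hence the bias of each estimator of the population total was computed as $\bar{T}_r - T$. Thus the average bias of each estimator, for both the real and artificial population totals, is
$\mathrm{Bias}_k(\hat{T}_r) = \bar{T}_{r,k} - T_k$
where $k = 1, 2, 3, 4, 5$ indexes the populations.
We define the mean square error to be
$\mathrm{MSE}(\hat{T}_r) = \mathrm{Var}(\hat{T}_r) + [\mathrm{Bias}(\hat{T}_r)]^2$
where $\mathrm{Var}(\hat{T}_r)$ is the unconditional variance of the estimator over the 30 replicates for the artificial and natural populations. Therefore, the Mean Square Error in the estimation of both the artificial and natural populations is given by
$\mathrm{MSE}_k(\hat{T}_r) = \mathrm{Var}_k(\hat{T}_r) + [\mathrm{Bias}_k(\hat{T}_r)]^2, \quad k = 1, 2, 3, 4, 5$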
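The replicate-based computation of bias, variance and mean square error can be sketched in Python as follows. The sketch uses the Sample Mean estimator only, with N = 76, n = 40 and 30 replicates as in the study. The population model, its coefficients, the seed, and the use of the divisor R in the variance are assumptions made for illustration and are not taken from the original work.

```python
import random
import statistics

def sample_mean_total(y_sample, N):
    """Sample Mean estimator of the population total: N times the sample mean."""
    return N * sum(y_sample) / len(y_sample)

def replicate_summary(y_pop, n, R, rng):
    """Draw R SRSWOR samples of size n and summarise the estimator over the replicates."""
    N = len(y_pop)
    true_total = sum(y_pop)
    estimates = []
    for _ in range(R):
        idx = rng.sample(range(N), n)   # one SRSWOR sample of unit indices
        estimates.append(sample_mean_total([y_pop[i] for i in idx], N))
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_total
    var = statistics.pvariance(estimates)   # divisor R (an assumption)
    return {"estimates": estimates, "true_total": true_total,
            "bias": bias, "var": var, "mse": var + bias ** 2}

rng = random.Random(1)   # arbitrary seed for reproducibility
x_pop = [rng.uniform(0.0, 1.0) for _ in range(76)]
y_pop = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in x_pop]   # hypothetical linear model
summary = replicate_summary(y_pop, n=40, R=30, rng=rng)
```

With the divisor R, the sum of the variance and the squared bias equals the average squared deviation of the replicate estimates from the true total, which is a convenient consistency check on the computation.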
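The Relative Change in Efficiency (RCE) for each estimator was given by
$\mathrm{RCE}_j(\hat{T}_r) = \frac{\mathrm{MSE}_{j+1}(\hat{T}_r) - \mathrm{MSE}_1(\hat{T}_r)}{\mathrm{MSE}_1(\hat{T}_r)}$
where $j = 1, 2, 3, 4$ and $\mathrm{MSE}_k$ denotes the mean square error in population $k$.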
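The Relative Change in Efficiency can be recomputed from the MSE values reported in the results tables. The short Python sketch below takes the MSE in population I as the baseline and measures the relative change in each of the other populations, a form that appears consistent with the reported RCE values; the dictionary layout and function name are illustrative assumptions.

```python
# MSE values as reported in the results tables for populations I to V.
mse = {
    "I":   {"SM": 20.43321,  "HT": 63.61897,  "Ratio": 58.5112,   "NW": 21.50266,  "PS": 25.23209},
    "II":  {"SM": 312.0477,  "HT": 1973.156,  "Ratio": 2133.018,  "NW": 49.20572,  "PS": 56.13020},
    "III": {"SM": 28.79823,  "HT": 1479.414,  "Ratio": 1213.3417, "NW": 28.47704,  "PS": 47.38526},
    "IV":  {"SM": 31.052832, "HT": 644.76673, "Ratio": 564.7361,  "NW": 112.0899,  "PS": 195.1441},
    "V":   {"SM": 675.4452,  "HT": 1449.9089, "Ratio": 1687.2246, "NW": 199.48789, "PS": 103.9280},
}

def rce(mse_table, estimator, population, baseline="I"):
    """Relative change in MSE of an estimator between the baseline population and another one."""
    base = mse_table[baseline][estimator]
    return (mse_table[population][estimator] - base) / base

rce_1_sm = rce(mse, "SM", "II")   # RCE I for the Sample Mean estimator
rce_4_nw = rce(mse, "NW", "V")    # RCE IV for the Nadaraya-Watson estimator
```

A small RCE means the estimator's MSE changes little when the population structure departs from the linear homoscedastic case, which is how robustness is judged in this study.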
3.4. Results and Interpretations
The results of this study are summarized in Tables 1 to 5. For each population, the performance of each estimator is analyzed using the average bias and the mean square error. The average bias indicates how close an estimator is to the true value, while the MSE is used to assess the efficiency of an estimator. An estimator is said to be more efficient than another if its MSE is smaller, i.e. if MSE(T_{1}) < MSE(T_{2}), where T_{1} and T_{2} are estimators, then T_{1} is said to be more efficient than T_{2}.
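| Estimator | Formula |
|---|---|
| Sample Mean (SM) | $\hat{T}_{SM} = N\bar{y}$ |
| Horvitz-Thompson (HT) | $\hat{T}_{HT} = \sum_{i \in s} y_i/\pi_i$ |
| Ratio (R) | $\hat{T}_{R} = (\bar{y}/\bar{x})\sum_{i=1}^{N} x_i$ |
| Nadaraya-Watson (NW) | $\hat{T}_{NW} = \sum_{i \in s} y_i + \sum_{j \notin s}\hat{m}(x_j)$ |
| Periodic Spline (PS) | $\hat{T}_{PS} = \sum_{i \in s} y_i + \sum_{j \notin s}\sum_{i \in s} F_R(x_j, x_i)\,y_i$ |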
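The three classical estimators in the summary of formulae can be sketched in Python in their standard textbook forms. These forms are assumed here rather than taken from the original table, and the function names and toy data are hypothetical.

```python
def sm_total(y_sample, N):
    """Sample Mean estimator: N times the sample mean of y."""
    return N * sum(y_sample) / len(y_sample)

def ht_total(y_sample, pi_sample):
    """Horvitz-Thompson estimator: sampled y-values weighted by inverse inclusion probabilities."""
    return sum(y / p for y, p in zip(y_sample, pi_sample))

def ratio_total(y_sample, x_sample, x_pop_total):
    """Ratio estimator: sample ratio of y to x, scaled by the known population total of x."""
    return sum(y_sample) / sum(x_sample) * x_pop_total

# Toy data: a population of N = 5 units with y exactly proportional to x.
x_pop = [1.0, 2.0, 3.0, 4.0, 5.0]
x_s = [1.0, 3.0, 4.0]
y_s = [2.0, 6.0, 8.0]
N, n = len(x_pop), len(x_s)

t_sm = sm_total(y_s, N)
t_ht = ht_total(y_s, [n / N] * n)   # equal inclusion probabilities n/N, as under SRSWOR
t_r = ratio_total(y_s, x_s, sum(x_pop))
```

With equal inclusion probabilities the Horvitz-Thompson estimate coincides with the Sample Mean estimate, and when y is exactly proportional to x the Ratio estimator recovers the true total (30 in this toy example).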

| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| Estimate | 82.38245 | 82.38245 | 83.2557 | 82.17422 | 82.61311 |
| Bias | 2.090872 | 2.09187 | 2.96412 | 1.88264 | 2.32153 |
| Var | 16.06147 | 59.24723 | 49.725193 | 17.95833 | 19.84259 |
| MSE | 20.43321 | 63.61897 | 58.511200 | 21.50266 | 25.23209 |

Population total: 80.29158
In population I, the low values of the bias indicate that all five estimators perform well under these conditions. Nadaraya-Watson has the least bias, followed by SM, Horvitz-Thompson, Periodic Spline and the Ratio estimator, in that order. In terms of MSE, the SM estimator has the lowest value, followed by Nadaraya-Watson, Periodic Spline and Ratio; the HT estimator has the highest MSE in this population. The MSE values of these estimators in this population are, however, the lowest compared with those obtained in the other populations. This implies that these estimators are highly efficient in a linear and homoscedastic population structure, with the sample mean, which has the least MSE, being the most efficient in this population.
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| Estimate | 80.50299 | 88.61431 | 88.28095 | 82.24362 | 82.75191 |
| Bias | 1.99378 | 10.1051 | 9.77174 | 3.73441 | 4.2427 |
| Var | 308.0725 | 1871.043 | 2037.5316 | 35.25991 | 38.1297 |
| MSE | 312.0477 | 1973.156 | 2133.018 | 49.20572 | 56.13020 |

Population total: 78.50921
In population II, SM has the least absolute bias, followed by Nadaraya-Watson, Periodic Spline, the Ratio estimator and lastly Horvitz-Thompson. In terms of MSE, Nadaraya-Watson and Periodic Spline both have low values, followed by SM, HT and lastly the Ratio estimator. Nadaraya-Watson is therefore the best estimator for a quadratic and homoscedastic population, while the Ratio estimator has the highest MSE, making it the least efficient estimator for this population. This is to be expected, because the Ratio estimator is based on the assumption of linearity and breaks down when that assumption is violated.
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| Estimate | 74.74086 | 82.06381 | 83.4728 | 80.31023 | 82.88512 |
| Bias | 2.82623 | 3.49672 | 4.905757 | 1.74314 | 4.318024 |
| Var | 20.81065 | 1467.187 | 1189.2757 | 25.4385 | 28.73993 |
| MSE | 28.79823 | 1479.414 | 1213.3417 | 28.47704 | 47.38526 |

Population total: 78.56709
In population III, Nadaraya-Watson has the least absolute bias, followed by SM, Horvitz-Thompson, Periodic Spline and lastly the Ratio estimator. In terms of MSE, Nadaraya-Watson and SM have low values, followed by Periodic Spline, Ratio and the HT estimator, in that order. Nadaraya-Watson and SM are therefore the best estimators for this population, which is linear and heteroscedastic.
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| Estimate | 29.69973 | 26.436061 | 30.33481 | 29.8734 | 31.40218 |
| Bias | 1.31815 | 1.945519 | 1.953226 | 1.491818 | 3.020605 |
| Var | 29.31531 | 640.95164 | 560.92100 | 109.86438 | 186.017 |
| MSE | 31.05283 | 644.76673 | 564.7361 | 112.0899 | 195.1441 |

Population total: 28.38158
In population IV, the Sample Mean has the least bias, followed by Nadaraya-Watson, HT, Ratio and lastly Periodic Spline. In terms of MSE, SM has the least value, followed by Nadaraya-Watson, Periodic Spline, Ratio and the HT estimator. Thus SM has proved to be the best estimator for this real population, which appears from the scatter diagram to be linear with heteroscedastic variance.
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| Estimate | 22.90294 | 14.37602 | 17.61762 | 21.3429 | 20.95047 |
| Bias | 2.98857 | 5.53835 | 2.29675 | 1.42853 | 1.0361 |
| Var | 666.5136 | 1419.2356 | 1681.9498 | 197.4472 | 102.8545 |
| MSE | 675.4452 | 1449.9089 | 1687.2246 | 199.48789 | 103.9280 |

Population total: 19.91437
From the scatter diagram in Figure 5, this population appears to be neither linear nor homoscedastic. In this population, Periodic Spline has the least absolute bias, followed by Nadaraya-Watson, Ratio, SM and lastly the Horvitz-Thompson estimator. As concerns the MSE, Periodic Spline has the least value, thus proving to be the best estimator for this population whose structure is not known. It is followed by Nadaraya-Watson, SM, HT and the Ratio estimator.
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| POP I | 20.43321 | 63.61897 | 58.5112 | 21.50266 | 25.23209 |
| POP II | 312.0477 | 1973.156 | 2133.018 | 49.20572 | 56.13020 |
| POP III | 28.79823 | 1479.414 | 1213.3417 | 28.47704 | 47.38526 |
| POP IV | 31.052832 | 644.76673 | 564.7361 | 112.0899 | 195.1441 |
| POP V | 675.4452 | 1449.9089 | 1687.2246 | 199.48789 | 103.9280 |
| | SM | HT | Ratio | NW | PS |
|---|---|---|---|---|---|
| RCE I | 14.271595 | 30.015214 | 35.454867 | 1.288355 | 1.224556 |
| RCE II | 0.40938355 | 22.254290 | 19.7369136 | 0.3243495 | 0.877976 |
| RCE III | 0.51972363 | 9.134818 | 8.6517607 | 4.2128388 | 5.2143488 |
| RCE IV | 32.056245 | 21.7905 | 27.835925 | 8.2773587 | 3.1188819 |
Finally, the study compared the Relative Change in Efficiency (RCE) among the five estimators. The first case is when the linearity assumption of the population structure is violated. Considering RCE I in Table 7, the nonparametric estimators, Nadaraya-Watson and Periodic Spline, have low RCE. This implies that they are the least sensitive to violation of the linearity of the population structure and hence the most robust among the five estimators. They are followed by the SM and Horvitz-Thompson estimators, while the Ratio estimator, which has the largest RCE I, is the least robust as far as violation of the linearity assumption is concerned. Secondly, RCE II investigates violation of the homoscedasticity assumption in the population structure. Considering RCE II in Table 7, the Nadaraya-Watson, SM and Periodic Spline estimators have the lowest RCE. This implies that they are the least sensitive to this change in the structure of the population and hence the most robust among the five when the homoscedasticity assumption is violated.
On the other hand, the Ratio and Horvitz-Thompson estimators are the least robust to violation of the homoscedasticity condition on the population structure. Next we consider RCE III. SM has the least value, followed by the Nadaraya-Watson, Periodic Spline, Ratio and Horvitz-Thompson estimators. However, all values of RCE III are quite low. This implies that, although SM is the most robust to this change in the population structure, the other estimators are also robust, and we conclude that population I is almost similar in structure to population IV, though it seems that the homoscedasticity condition is violated.
Lastly, in RCE IV, the Periodic Spline estimator has the least value, making it the most robust estimator to the change of population structure from linear and homoscedastic to non-linear and non-homoscedastic. Nadaraya-Watson also proved to be robust to the same change in structure. However, the Ratio and Horvitz-Thompson estimators proved to be highly sensitive to the changes in the population structure and are therefore less robust than the two nonparametric estimators. The least robust estimator as far as this population is concerned is SM. Therefore, the Periodic Spline has proved to be robust when both the linearity and homoscedasticity conditions are violated.
4. Conclusions and Recommendations
4.1. Conclusions
This study has revealed that the spline regression estimator performed well in all the aspects considered: bias, efficiency and robustness. It performed well in the linear homoscedastic model and in the quadratic homoscedastic model, and it still performed well when the homoscedasticity assumption was violated. We therefore conclude that the Periodic Spline estimator is a robust estimator, and it is recommended as a suitable estimator of the population total when the structure of the population is unknown. It has also been noted that the Nadaraya-Watson estimator performs well in the linear homoscedastic model and also when the linearity condition is violated. Its performance was also impressive in the linear heteroscedastic model, indicating that it is robust to the violation of the linearity and homoscedasticity conditions.
4.2. Recommendation
i. From the findings of our research, the Horvitz-Thompson (design-based) estimator and the Ratio (model-based) estimator should be used within the confines of a linear homoscedastic model. They are not appropriate for use when the structure of the population is not known.
ii. The two nonparametric estimators, Nadaraya-Watson and Periodic Spline, are suitable for use in a linear homoscedastic model. Even when the assumptions of the model are violated, their sensitivity to the change of population structure is relatively low, and hence they are classified as highly robust.
Notation
1. N = size of the finite population generally assumed to be known.
2. n = sample size.
3. x = design variable. Its values can either be made available before hand or in the course of data collection.
4. y = the survey variable or variable under study.
5. $\bar{y} = n^{-1}\sum_{i=1}^{n} y_i$ = sample mean.
6. $s^2$ = sample variance =
7. $\sigma^2$ = population variance =
8. $\bar{Y}$ = the finite population mean.
9. $T$ = finite population total.
10. SRSWOR = abbreviation of simple random sampling without replacement.
11. Ksh = Kenya shilling.
12. SM = Sample Mean
13. HT = Horvitz-Thompson
14. R = Ratio
15. NW = Nadaraya-Watson
16. PS = Periodic Spline
Acknowledgements
First and foremost, I thank Almighty God for His abundant grace, faithfulness, care, knowledge and strength throughout this period of study. Secondly, I sincerely thank my supervisor, Dr. Njenga, for his professional guidance, patience, availability for consultation and provision of the reference materials that I needed in the course of this research project. My gratitude also goes to my other lecturers in the Department of Mathematics (Statistics), Kenyatta University, for their support and very educative lectures. Special thanks also go to J.P. Wanjala of KNBS, who closely journeyed with us under the institution-based program, for his encouragement. Lastly, I am grateful to Mr. Felix Wasike of Kenyatta University for his invaluable technical assistance.
References