HETEROSKEDASTICITY

Collins Aigbekaen Dwight
Jan 3, 2022

INTRODUCTION

Violation of the homoskedasticity assumption of the OLS estimator leads to the problem of heteroskedasticity.

At the end of this post, you will understand the meaning, types, causes, and consequences of heteroskedasticity, as well as the corrective measures for the problem of heteroskedasticity.

HETEROSKEDASTICITY

Heteroskedasticity means that the variance of the stochastic disturbance term (ui) is not constant (the same) across all values of the explanatory variables. The disturbance variance is no longer a single finite constant; it tends to change over the range of values of the explanatory variables and therefore cannot be factored out of the summation.

Thus, the homoskedastic variance-covariance matrix is given by:

E(uu′) = σ²I

that is, a diagonal matrix with the same constant variance σ² down the main diagonal. However, in the presence of heteroskedasticity, each disturbance has its own variance. In effect, the heteroskedastic variance-covariance matrix is given by:

E(uu′) = diag(σ1², σ2², …, σn²)

The subscript i denotes the fact that the variances of the stochastic disturbances are not all the same.

The occurrence of heteroskedasticity is found in both time series and cross-section data but is more often encountered and severe with cross-section data. This is because the assumption of constant variance over the heterogeneous units may be rather unrealistic.

Causes of Heteroskedasticity

(a) Outliers Problem

Outlying observations are a leading cause of heteroskedasticity. An outlier is an observation that is either excessively small or excessively large in relation to the other observations in the sample. The table below illustrates a scenario of an outlier.

In other words, the outlying observation exhibits a huge difference from the others in the sample. In effect, the population of the outlying observation is different from the population of the other sample observations.
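A quick way to see the effect is to compare residual variances with and without an outlying point. The following pure-Python sketch uses hypothetical data in which the last observation is an outlier:

```python
# A single outlier can dominate the residual variance (hypothetical data).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 40.0]   # last point is the outlier

def resid_var(x, y):
    """Residual variance from a simple OLS fit of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
        sum((xi - xbar) ** 2 for xi in x)
    a = ybar - b * xbar
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)

with_outlier = resid_var(x, y)
without = resid_var(x[:-1], y[:-1])
print(with_outlier, without)   # the outlier inflates the estimate many times over
```

Dropping the single outlying point collapses the estimated residual variance, which is exactly why one aberrant observation can masquerade as (or induce) heteroskedasticity.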

(b) Omitted Variable Bias

The omission of key explanatory variables from a regression model causes heteroskedasticity. For example, consider the following model of consumption expenditure:

C = β0 + β1Y + u

where C is consumption expenditure and Y is income. In line with economic theory, income is the most crucial determinant of consumption expenditure. Thus, if income is omitted from the model, omitted-variable bias is induced, which in turn invites heteroskedasticity.

(c) Error Learning Factors/Models

As people learn from their past mistakes, their errors of behavior become smaller and smaller over time, so the error variance shrinks rather than remaining constant.

(d) Wrong Functional Form

Specification error, or an incorrect functional form of a regression model, causes heteroskedasticity. This occurs when a model is regressed with a pool of “level” and “log” variables at the same time, for instance:

C = β0 + β1r + β2 ln(Yd) + u

where C is consumption expenditure, r is the interest rate, and Yd is disposable income.

(e) Erroneous data transformation

Incorrect data transformation is another cause of heteroskedasticity. It occurs when a regression model is regressed with a pool of ratio and first-difference sets of data at the same time.

(f) Skewness

Skewness in the distribution of the explanatory variables causes heteroskedasticity. For example, the distribution of income and wealth is most often unequal, skewed in such a way that the bulk of income and wealth is held by a few individuals at the top. Thus, while the spending behavior, or expenditure profile, of a cross-section of low-income families may follow a similar and relatively stable pattern, the expenditure profile of a cross-section of rich, high-income families can be quite different and highly volatile.

Consequences of Heteroskedasticity

To investigate the effects of heteroskedasticity on the OLS estimator, its variance, and its standard errors, it is useful to revert to the matrix specification of the classical linear regression model [CLRM]. With E(uu′) = Ω no longer equal to σ²I, the variance of the OLS estimator becomes the “sandwich” expression (X′X)⁻¹X′ΩX(X′X)⁻¹ rather than the conventional σ²(X′X)⁻¹. This shows that with heteroskedasticity,

(a) The OLS estimator is no longer efficient, not even asymptotically, that is, not even in large samples. Thus, the minimum-variance property of the OLS estimator is lost.

(b) The conventional OLS formulas for the variances and standard errors are biased, either over- or under-stating them depending on the pattern of heteroskedasticity, so the standard errors of the estimated coefficients are distorted.

(c) Statistical tests of significance are rendered invalid. As it were, the validity of the conventional formulae for the t and F test statistics becomes impaired.

(d) Statistical inferences are erroneous. Consequently, with heteroskedasticity, there is a higher risk of committing a Type I error, which entails rejecting a true null hypothesis, and also a higher likelihood of committing a Type II error, which involves failing to reject a false null hypothesis.

(e) Confidence intervals of the estimated coefficients become inordinately wide. In other words, confidence intervals are oversized.

(f) In general, the hypothesis-testing procedures on the basis of the OLS estimates are contaminated and spurious.

(g) Heteroskedasticity does not destroy the unbiasedness property of the OLS estimator. In fact, E(ui | X) = 0 still holds. Consequently, the OLS estimator β̂ remains unbiased.

Statistical Tests for Heteroskedasticity

There are numerous tests for detecting the presence or otherwise of the problem of heteroskedasticity. These include informal and formal techniques.

The formal methods for detecting the presence or otherwise of heteroskedasticity presume that the econometrician has some a priori information about the true pattern of heteroskedasticity. In effect, the econometrician’s task is to conduct the regression analysis on the assumed pattern of heteroskedasticity.

Glejser Test

Due to Glejser (1969), the Glejser test is a formal test for heteroskedasticity that regresses the absolute values of the estimated residuals on various powers of the explanatory variable of the model. The test is based on the following hypothesis.

H0 : vi are homoskedastic

H1 :vi are heteroskedastic
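A minimal pure-Python sketch of the simplest Glejser auxiliary regression, |û| on X itself (the data below are hypothetical, chosen so the residual spread widens with X):

```python
import math

# Hypothetical data: the spread of y visibly widens with x.
x = [10, 12, 15, 18, 20, 25, 30, 35, 40, 50]
y = [6, 8, 9, 12, 11, 18, 15, 28, 20, 40]

def ols(x, y):
    """Simple OLS of y on x; returns intercept, slope, and residuals."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, resid

# Step 1: fit the original model and keep the residuals.
_, _, u = ols(x, y)

# Step 2 (Glejser): regress |u| on x and test the slope's significance.
a2, b2, e2 = ols(x, [abs(ui) for ui in u])
n = len(x)
s2 = sum(ei ** 2 for ei in e2) / (n - 2)   # residual variance of auxiliary regression
xbar = sum(x) / n
se_b2 = math.sqrt(s2 / sum((xi - xbar) ** 2 for xi in x))
t_stat = b2 / se_b2                        # a significant t rejects homoskedasticity
print(f"Glejser slope = {b2:.4f}, t = {t_stat:.2f}")
```

A clearly significant positive slope indicates that the absolute residuals grow with X, so H0 (homoskedasticity) is rejected. Glejser’s full procedure also tries other powers of X, such as √X and 1/X, as regressors.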

Spearman’s Rank Correlation Test [SRCT]

The SRCT is a diagnostic test for heteroskedasticity that ranks the values of the explanatory variable and the estimated regression residuals, either in ascending or in descending order of magnitude, without regard for the signs of the residuals. Given the following regression model to be estimated:

Yi = β0 + β1Xi + ui

What follows next is to:

(a) Fit the regression to the data on Yi and Xi, and

(b) Generate the residuals û.

(c) Disregarding the sign of the estimated residuals, that is, taking only their absolute values, rank |û| and Xi either in descending or in ascending order.

(d) Next is the computation of the Spearman rank correlation coefficient (SRCC). The test statistic is given as:

rs = 1 − 6 Σdi² / (n(n² − 1))

where di is the difference between the ranks of corresponding pairs of Xi and |ûi|, and n is the number of observations in the sample, that is, the number of individual units being ranked. The test is based on the following hypothesis.

H0 : vi are homoskedastic

H1 : vi are heteroskedastic
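The ranking, the rs formula, and the usual t test on rs can be sketched in pure Python as follows (the X values and absolute residuals are hypothetical, and ties are resolved by average ranks):

```python
import math

x = [10, 12, 15, 18, 20, 25, 30, 35, 40, 50]                 # explanatory variable (hypothetical)
abs_u = [0.9, 1.4, 0.1, 0.9, 1.6, 1.7, 5.1, 4.2, 7.5, 5.0]   # |OLS residuals| (hypothetical)

def ranks(values):
    """Average-rank method: ties receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

n = len(x)
d = [rx - ru for rx, ru in zip(ranks(x), ranks(abs_u))]
r_s = 1 - 6 * sum(di ** 2 for di in d) / (n * (n ** 2 - 1))
t_stat = r_s * math.sqrt(n - 2) / math.sqrt(1 - r_s ** 2)    # compare with t(n - 2)
print(f"r_s = {r_s:.3f}, t = {t_stat:.2f}")
```

If the computed t exceeds the critical t value with n − 2 degrees of freedom, the rank correlation between X and |û| is significant and homoskedasticity is rejected.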

Goldfeld-Quandt Test

This is a formal test for heteroskedasticity due to Goldfeld and Quandt (1972). By definition, the G-Q test is fundamentally an F-test that orders the observations according to the magnitude of the values of the explanatory variable and then divides them into two equal parts, omitting a middle portion. The omitted middle portion is commonly taken to be about one-quarter of the total number of observations in the sample (n/4).

Thus, if n = 16, the 4 middle observations in the ordered set are omitted and the remaining 12 observations are divided into two equal halves of 6 observations each. Having made this division, separate error (residual) variances are estimated from the OLS regressions on the two halves. Worth noting is that the ordering of the data set is based on ascending values of the explanatory variable.
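The n = 16 case can be sketched in pure Python (the data are hypothetical, constructed so the residual spread grows with X):

```python
# Goldfeld-Quandt sketch with n = 16: order by x, drop the middle 4,
# fit OLS on each half of 6 and compare residual variances with an F ratio.
data = sorted(zip(
    [5, 8, 11, 14, 17, 20, 23, 26, 29, 32, 35, 38, 41, 44, 47, 50],   # x (hypothetical)
    [4, 7, 8, 12, 13, 14, 19, 15, 26, 20, 32, 24, 40, 27, 45, 30],    # y (hypothetical)
))

def rss(pairs):
    """Residual sum of squares from a simple OLS fit of y on x."""
    n = len(pairs)
    xbar = sum(p[0] for p in pairs) / n
    ybar = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - xbar) ** 2 for p in pairs)
    sxy = sum((p[0] - xbar) * (p[1] - ybar) for p in pairs)
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((p[1] - (a + b * p[0])) ** 2 for p in pairs)

n = len(data)
c = n // 4                      # middle observations to omit (4 of 16)
half = (n - c) // 2             # 6 observations in each half
rss1 = rss(data[:half])         # low-x half
rss2 = rss(data[-half:])        # high-x half
F = rss2 / rss1                 # compare with F(half - 2, half - 2)
print(f"RSS1 = {rss1:.2f}, RSS2 = {rss2:.2f}, F = {F:.2f}")
```

If the F ratio exceeds the critical F value with (6 − 2, 6 − 2) degrees of freedom, the residual variance of the high-X half is significantly larger and homoskedasticity is rejected.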

Corrective Measures for Heteroskedasticity

Transformation based on the Pattern of Heteroskedasticity

In the econometric literature, different assumptions about the error variance have been made, and each assumed pattern warrants a particular data transformation in order to eliminate heteroskedasticity from an empirical model.

Logarithmic Transformation [LT]

Given the unknown nature of heteroskedasticity, a logarithmic transformation [estimating the original model in logs] is also applicable in resolving the problem of heteroskedasticity. In this case, the transformation of the original model becomes:

ln Yi = β0 + β1 ln Xi + ui

Merits of Logarithmic Transformation

In general, logarithmic transformation helps to reduce, if not totally eliminate, the problem of heteroskedasticity. This is evident in the following facts. Logarithmic transformation:

(a) Compresses the scales in which the variables are measured, reducing a tenfold difference between two values to roughly a twofold difference. For example, the number 120 is ten times larger than 12, but Ln(120) gives 4.787, which is only about twice as large as Ln(12), which equals 2.485.

(b) Yields direct elasticities. For example, the slope coefficient of a log-transformed model measures the elasticity of the dependent variable with respect to the regressor in question. In particular, it measures the percentage change in the dependent variable due to a one-percent change in the explanatory variable.
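Both merits can be verified with a couple of lines of Python (the slope value below is a hypothetical illustration):

```python
import math

# Scale compression: a tenfold gap in levels shrinks to roughly a twofold gap in logs.
print(math.log(120), math.log(12))     # ≈ 4.787 vs ≈ 2.485

# Elasticity reading: in ln(Y) = a + b*ln(X), the slope b is the elasticity of Y w.r.t. X.
# A 1% rise in X shifts ln(X) by ln(1.01) ≈ 0.01, hence ln(Y) by ≈ b * 0.01,
# i.e. Y rises by about b percent.
b = 0.8                                # hypothetical slope coefficient
pct_change_y = b * math.log(1.01)      # ≈ 0.008, i.e. about a 0.8% rise in Y
```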

Logarithmic Transformation Problems

(a) Log transformation is not applicable if some of the observations [data points] for the dependent or explanatory variables are zero or negative.

(b) The problem of spurious correlation may be encountered between the ratios of the transformed variables even when the original variables are uncorrelated or random.

(c) The conventional F and t tests for model robustness are only valid in large samples, given that the variances are unknown and are estimated from any of the transformation procedures.

(d) In the multiple regression model [MRM], a model with more than one regressor, it is difficult to ascertain on an a priori basis which of the regressors should be chosen for data transformation.

Example 1

Consider the data below:

Test for the presence or otherwise of heteroskedasticity using the Spearman rank correlation test statistic.

Solution

First: We state the hypothesis:

H0 : ei are homoskedastic

H1 : ei are heteroskedastic

Secondly: We estimate a simple regression model of the following specification,

where d is the difference between the ranks of corresponding pairs of X and e observations, and n is the number of observations in the sample. Solving ties: (1 + 2)/2 = 1.5, (3 + 4)/2 = 3.5.

Decision Rule: We can evaluate the z and t critical values as follows:

Example 2

Given:

Test for heteroskedasticity using the Goldfeld-Quandt test statistic at both the 5% and 1% significance levels.

Solution

Hypothesis:

H0: residual variance is homoskedastic

H1 : residual variance is heteroskedastic

Solving first half

The underlying model can be specified thus
