Research World, Volume 5, 2008
Online Version


Report R5.10

Doing Research With Time-Series Data

Seminar Leader: Jaydeep Mukherjee, XIMB
jaydeep[at]ximb.ac.in

Part 1: Introduction to Econometrics

Econometrics literally means measurement in the field of economics. Economic theory suggests statements or hypotheses that are qualitative in nature. Econometrics is the application of statistical and mathematical methods to empirically test economic theory and suggest solutions to economic problems.

Despite the common assumption, econometrics is not statistics. It differs in one important respect: classical statistics typically works with data from controlled experiments, whereas econometrics often has to deal with observational data. In some sense, an econometrician is similar to an astronomer, who gathers data but cannot conduct experiments. Therefore, econometricians often seek natural experiments in the absence of evidence from controlled experiments.

In the absence of experimental data, it is difficult for the researcher to pin down the exact cause or causes affecting a particular situation. It is not possible for a researcher to establish the exact relationship between economic variables, and this is where a mathematical model differs from an econometric model. Unlike the deterministic nature of a mathematical model, an econometric model captures the inexact relationships between economic variables through the introduction of an error term, which is a random variable with well-defined probabilistic properties.

An important tool used in econometrics is regression analysis, which is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the independent variables, with a view to estimating and/or predicting the population mean or average value of the former in terms of the known or fixed values of the latter. The dependent variable is assumed to be statistical, random, or stochastic in nature, while the independent variables are assumed to be fixed (nonrandom).

The general form of a linear regression model is:

Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \ldots + \beta_k X_{ki} + u_i   (i = 1, 2, 3, . . ., n)

where
n = number of observations
k = number of explanatory variables
\beta_1, \beta_2, . . ., \beta_k are the parameters whose values are to be estimated
u_i is the error term

The error term u_i acts as a surrogate or proxy for all the omitted or neglected variables that may affect Y but are not included in the regression model. The linear regression model is linear in the parameters and can take other functional forms, such as the log-linear model, the semi-log model, the log reciprocal model, and so forth. A regression model can also be nonlinear, with the dependent variable varying nonlinearly with the parameters.

The objective of regression analysis is to establish the relationship between the regressor(s) and the regressand by estimating the best possible values of the parameters. The two commonly used methods of estimation are the ordinary least squares (OLS) method and the maximum likelihood (ML) method. While developing the econometric model, certain assumptions are made to facilitate the process of parameter estimation and hypothesis testing. For example, in the classical normal linear regression model (CNLRM), the error term is assumed to be normally, identically, and independently distributed with zero mean, constant variance, and no autocorrelation. Only then can one draw statistical inferences about the population parameters and conduct hypothesis testing. These assumptions may appear pedantic and often unrealistic; however, they are necessary for building tractable models.
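
By way of illustration (this sketch is not from the seminar itself), the parameters of such a model can be estimated by OLS in a few lines of Python using the statsmodels library; the data and the variable names below are simulated and purely hypothetical.

    # A minimal OLS sketch on simulated data (all names are illustrative).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100                                   # number of observations
    x2 = rng.normal(size=n)                   # first explanatory variable
    x3 = rng.normal(size=n)                   # second explanatory variable
    u = rng.normal(size=n)                    # error term
    y = 1.0 + 2.0 * x2 - 0.5 * x3 + u         # "true" parameters: 1.0, 2.0, -0.5

    X = sm.add_constant(np.column_stack([x2, x3]))  # add the intercept column
    result = sm.OLS(y, X).fit()                     # estimate the parameters by OLS
    print(result.summary())                         # coefficients, t-ratios, R-squared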

A researcher is required to treat these assumptions with skepticism, as they are only theoretical and may not always hold true. The quality of the estimates depends on the assumptions and, if the assumptions are not met, the results may not be trustworthy. For example, when there is heteroskedasticity (unequal error variances), the variances of the coefficient estimates tend to be underestimated, inflating the t-ratios and sometimes making insignificant variables appear statistically significant. There are several statistical tests and procedures available for detecting and correcting such errors arising out of wrong assumptions. A researcher needs to be aware of these tests and their conditions of applicability to be able to choose the appropriate ones.
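
For example, the Breusch-Pagan test is one common diagnostic for heteroskedasticity. A minimal sketch (again on simulated data, using statsmodels) might look as follows; here the error variance is deliberately made to depend on the regressor.

    # Breusch-Pagan test for heteroskedasticity (illustrative sketch).
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(1)
    x = rng.normal(size=200)
    u = rng.normal(size=200) * (1 + np.abs(x))  # error variance depends on x
    y = 1.0 + 2.0 * x + u

    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
    print("Breusch-Pagan p-value:", lm_pvalue)  # a small p-value signals heteroskedasticity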

A regression analysis involves three distinct stages: the specification of a model, the estimation of the parameters of this model, and the interpretation of these parameters. The first stage, model specification, is the most critical. Some of the questions a researcher has to ask while specifying the model are:

(a) What variables should be included in the model?
(b) What is the functional form of the model? Is it linear in parameters or variables?
(c) What are the probabilistic assumptions made about Y_i, X_i, and u_i?

A model is wrongly specified when important variables are omitted from it, irrelevant variables are included in it, the wrong functional form is chosen, or wrong stochastic assumptions are made about the variables of the model; the result is a model specification error. The researcher often has to apply his or her own judgment while choosing the number of variables and the functional form of the econometric model. One also has to make some assumptions about the stochastic nature of the variables included in the model. Choosing an appropriate model therefore requires a great deal of skill and experience on the part of the researcher.

Part 2: Time-Series Data

There are generally three types of data available for empirical analysis: time-series data, cross-section data, and panel data. Time-series data are a set of observations on the values that a variable takes at regular time intervals, such as daily, weekly, or monthly. Cross-section data are data on one or more variables collected at the same point of time. Panel data are two-dimensional data, combining time-series and cross-section data.

The two main objectives of time-series analysis are identifying the nature of the phenomenon represented by the sequence of observations, and forecasting. Both of these goals require that the pattern of the observed time-series data be identified and formally described. As in most other econometric analyses, time-series data exhibit random noise, which usually makes the pattern difficult to identify. Most time-series analysis techniques therefore involve some form of filtering out the noise in order to make the pattern more prominent.
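
As a simple illustration of such filtering (a sketch, not part of the original report), a moving average can be used to suppress the noise so that the underlying pattern becomes more prominent; the monthly series below is simulated with pandas.

    # Filtering noise with a 12-period moving average (illustrative sketch).
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    pattern = np.linspace(0, 10, 120)             # slow underlying pattern
    noisy = pd.Series(pattern + rng.normal(scale=2.0, size=120))

    smoothed = noisy.rolling(window=12).mean()    # the pattern stands out after smoothing
    print(smoothed.tail())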

Empirical work based on time series assumes that the underlying time series is stationary. A time series is said to be stationary if its mean and variance are constant over time and the covariance between any two periods depends only on the time lag between them. This is a necessary assumption because, in the case of a nonstationary process, the observations in the time series are time-dependent and it is not possible to generalise the pattern for the purpose of forecasting.

The classic example of a nonstationary stochastic process is the random walk model (RWM). Two types of random walk models are:

(a) Random walk without drift: Y_t = Y_{t-1} + u_t, where u_t is a white noise error term
(b) Random walk with drift: Y_t = \delta + Y_{t-1} + u_t, where \delta is known as the drift parameter

In the random walk model without drift, the mean remains constant whereas the variance increases indefinitely over time. In the random walk model with drift, both the mean and the variance increase indefinitely over time. Either way, the conditions of stationarity are violated.
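
A short simulation (illustrative only, not from the seminar) makes this property visible: across many simulated random-walk paths, the variance of Y_t grows roughly in proportion to t.

    # Simulating random walks without drift: the variance grows with time.
    import numpy as np

    rng = np.random.default_rng(3)
    u = rng.normal(size=(1000, 200))    # white noise: 1000 paths, 200 periods
    walks = u.cumsum(axis=1)            # Y_t = Y_{t-1} + u_t, with Y_0 = 0

    print(walks[:, 9].var())            # variance at t = 10 is about 10
    print(walks[:, 199].var())          # variance at t = 200 is about 200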

Many statistical tests are available to examine whether a time series is stationary or not. At an informal level, a plot of the time-series data or a sample correlogram can give a clue to the likely nature of the series. At a more formal level, the unit root test is the most commonly used method for identifying the stationarity of a time series. Since the usual t and F tests cannot be applied in the presence of a unit root (the test statistic does not follow the normal distribution under the unit root hypothesis), the Dickey-Fuller test is preferred. The Dickey-Fuller test is based upon the assumption that the error term is uncorrelated; when the error term is correlated, the augmented Dickey-Fuller test is the more appropriate test.
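
By way of illustration (not from the seminar), the augmented Dickey-Fuller test is implemented in statsmodels; the series below is a simulated random walk, so the test should fail to reject the null hypothesis of a unit root.

    # Augmented Dickey-Fuller test for a unit root (illustrative sketch).
    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(4)
    walk = rng.normal(size=300).cumsum()    # simulated random walk (nonstationary)

    adf_stat, pvalue, usedlag, nobs, crit, icbest = adfuller(walk)
    print("ADF statistic:", adf_stat)
    print("p-value:", pvalue)               # a large p-value: cannot reject a unit root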

A nonstationary time series can be transformed into a stationary one to avoid the problem of spurious regression. The transformation depends on whether the series follows a difference stationary process (DSP) or a trend stationary process (TSP). In the case of a difference stationary process, the first differences of the time series are taken to obtain a stationary series. A trend stationary process is stationary around a trend line, which can be linear or nonlinear in nature; such a series has to be regressed on time, and the residuals form the stationary (i.e., de-trended) series.
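
The two transformations can be sketched as follows (an illustration with simulated series; the variable names are hypothetical): differencing for a DSP series, and regression on a time trend for a TSP series.

    # Differencing (DSP) and de-trending (TSP): an illustrative sketch.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    walk = rng.normal(size=300).cumsum()          # DSP example: a random walk
    first_diff = np.diff(walk)                    # first differences are stationary

    t = np.arange(300)
    trending = 0.5 * t + rng.normal(size=300)     # TSP example: stationary around a trend
    fit = sm.OLS(trending, sm.add_constant(t)).fit()
    detrended = fit.resid                         # the de-trended (stationary) series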

A researcher needs to be careful while identifying the nature of the time series, because the choice of transformation method depends upon it. Misidentifying the time series (treating a DSP as a TSP or vice versa) will result in specification errors. A researcher also needs to be vigilant where a combination of two or more time series is involved. For example, a linear combination of two nonstationary processes may itself be stationary; this phenomenon is known as cointegration, and it indicates a stable long-run relationship between the series.
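
As an illustrative sketch (not from the seminar), the Engle-Granger test implemented in statsmodels' coint function can be applied to two simulated nonstationary series that share a common stochastic trend.

    # Engle-Granger cointegration test (illustrative sketch).
    import numpy as np
    from statsmodels.tsa.stattools import coint

    rng = np.random.default_rng(6)
    common = rng.normal(size=300).cumsum()        # shared stochastic trend
    y1 = common + rng.normal(size=300)            # nonstationary series 1
    y2 = 2.0 * common + rng.normal(size=300)      # nonstationary series 2
    # y2 - 2*y1 is stationary, so y1 and y2 are cointegrated by construction

    t_stat, pvalue, crit = coint(y1, y2)
    print("Cointegration test p-value:", pvalue)  # a small p-value suggests cointegration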

In time-series work, the two popularly used forecasting methodologies are autoregressive integrated moving average (ARIMA) modeling and vector autoregression (VAR). Unlike regression models, ARIMA models allow Y to be explained by past or lagged values of Y itself and by stochastic error terms. ARIMA models are sometimes called atheoretic models because they are not derived from any economic theory. The VAR methodology resembles simultaneous-equation modeling, but in this case each endogenous variable is explained by its own lagged or past values and the lagged values of all the other endogenous variables.
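
A minimal ARIMA sketch (illustrative only, assuming a recent version of statsmodels; the series is simulated) is given below. VAR models are available in the same library.

    # Fitting an ARIMA(1,1,1) model and forecasting (illustrative sketch).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(7)
    series = rng.normal(size=300).cumsum()    # a simulated nonstationary series

    model = ARIMA(series, order=(1, 1, 1))    # AR(1), first differencing, MA(1)
    fitted = model.fit()
    print(fitted.forecast(steps=5))           # forecast the next five values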

There are special types of models to cater to the specific requirements of certain data sets. For instance, financial time-series data such as stock prices, exchange rates, and inflation rates often show wide swings in their values for an extended period of time, followed by periods of relative calm. This phenomenon is termed volatility clustering. A special class of models, namely autoregressive conditional heteroskedasticity (ARCH) and generalized ARCH (GARCH) models, is used for analysing and forecasting such financial time series.
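
By way of illustration (not part of the original report), a GARCH(1, 1) model can be fitted with the third-party Python package arch; the return series below is simulated and merely a placeholder for real financial data.

    # Fitting a GARCH(1,1) model with the "arch" package (illustrative sketch).
    import numpy as np
    from arch import arch_model

    rng = np.random.default_rng(8)
    returns = rng.normal(size=1000)       # placeholder for a real return series

    am = arch_model(returns, vol="GARCH", p=1, q=1)
    res = am.fit(disp="off")              # estimate by maximum likelihood
    print(res.params)                     # omega, alpha, and beta estimates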

Econometrics offers researchers a wide variety of choices at each stage of model building. The challenge before a researcher is to choose the appropriate model. The results obtained are conditional upon the chosen model, which implies that a researcher needs to be very careful while formulating the econometric model, especially when there may be several competing theories to explain an economic phenomenon. Initially, a researcher may have to perform a large number of trial-and-error experiments to arrive at an appropriate model. The number of attempts is expected to decrease as one develops the necessary skill and intuition. Thus, econometric model building is sometimes considered an art rather than a science.



Reported by Jogendra Behera, with inputs from Jaydeep Mukherjee and D. P. Dash. [January 31, 2008]

Copyleft: The article may be used freely, for a noncommercial purpose, as long as the original source is properly acknowledged.


Xavier Institute of Management, Xavier Square, Bhubaneswar 751013, India
Research World (ISSN 0974-2379) http://www1.ximb.ac.in/RW.nsf/pages/Home