Autoregressive integrated moving average
In statistics and econometrics, and in particular in time series analysis, an autoregressive integrated moving average (ARIMA) model is a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series. ARIMA models are applied in some cases where data show evidence of non-stationarity. In those cases an initial differencing step, corresponding to the "integrated" part of the model, can be applied one or more times to eliminate the non-stationarity.
The AR part of ARIMA indicates that the evolving variable of interest is regressed on its own lagged (prior) values. The MA part indicates that the regression error is a linear combination of error terms whose values occurred contemporaneously and at various times in the past. The I (for "integrated") indicates that the data values have been replaced with the difference between each value and the previous value. The purpose of each of these features is to make the model fit the data as well as possible.
Non-seasonal ARIMA models are generally denoted ARIMA(p, d, q), where the parameters p, d, and q are non-negative integers. Here p is the order (number of time lags) of the autoregressive model, d is the degree of differencing, and q is the order of the moving-average model. Seasonal ARIMA models are usually denoted ARIMA(p, d, q)(P, D, Q)$_m$, where m is the number of periods in each season and the uppercase P, D, Q refer to the autoregressive, differencing, and moving-average terms for the seasonal part of the ARIMA model.
When two of the three terms are zero, the model may be referred to by the non-zero parameter, dropping "AR", "I" or "MA" from the acronym. For example, ARIMA(1, 0, 0) is AR(1), ARIMA(0, 1, 0) is I(1), and ARIMA(0, 0, 1) is MA(1).
ARIMA models can be estimated following the Box–Jenkins approach.
Definition
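Given time series data $X_t$, where $t$ is an integer index and the $X_t$ are real numbers, an ARMA(p', q) model is given by
$$X_t - \alpha_1 X_{t-1} - \cdots - \alpha_{p'} X_{t-p'} = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q},$$
or equivalently by
$$\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right) X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t,$$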
where $L$ is the lag operator, the $\alpha_i$ are the parameters of the autoregressive part of the model, the $\theta_i$ are the parameters of the moving average part and the $\varepsilon_t$ are error terms. The error terms $\varepsilon_t$ are generally assumed to be independent, identically distributed variables sampled from a normal distribution with zero mean.
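Assume now that the polynomial $\left(1 - \sum_{i=1}^{p'} \alpha_i L^i\right)$ has a unit root (a factor $(1 - L)$) of multiplicity $d$. Then it can be rewritten as:
$$1 - \sum_{i=1}^{p'} \alpha_i L^i = \left(1 - \sum_{i=1}^{p'-d} \varphi_i L^i\right)(1 - L)^d.$$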
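An ARIMA(p, d, q) process expresses this polynomial factorisation property with $p = p' - d$, and is given by:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t$$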
and thus can be thought of as a particular case of an ARMA(p + d, q) process whose autoregressive polynomial has $d$ unit roots.
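The unit-root factorisation can be made concrete by multiplying out $(1 - \sum_{i=1}^{p} \varphi_i L^i)(1 - L)^d$ to recover the autoregressive coefficients $\alpha_i$ of the equivalent ARMA(p + d, q) model. The Python sketch below assumes NumPy, and its function names are hypothetical, chosen only for illustration. Polynomials are stored as coefficient arrays in increasing powers of $L$, so multiplying by $(1 - L)$ amounts to a convolution with `[1, -1]`.

```python
import numpy as np

def arima_ar_polynomial(phi, d):
    """Coefficients of (1 - phi_1 L - ... - phi_p L^p)(1 - L)^d, lowest power of L first."""
    # Start from the stationary AR polynomial 1 - phi_1 L - ... - phi_p L^p
    poly = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    for _ in range(d):
        # Multiplying by (1 - L) is a convolution with the coefficient array [1, -1]
        poly = np.convolve(poly, [1.0, -1.0])
    return poly

def equivalent_arma_ar_coefficients(phi, d):
    """alpha_1, ..., alpha_{p+d} of the equivalent ARMA(p + d, q) model."""
    # The ARMA form is 1 - alpha_1 L - ..., so the alphas are the negated higher-order coefficients
    return -arima_ar_polynomial(phi, d)[1:]

# ARIMA(1, 1, 0) with phi_1 = 0.5, rewritten as an AR(2) polynomial
poly = arima_ar_polynomial([0.5], d=1)
alpha = equivalent_arma_ar_coefficients([0.5], d=1)
print(alpha)       # alpha_1 = 1.5, alpha_2 = -0.5
print(poly.sum())  # evaluating the polynomial at L = 1 gives 0, i.e. a unit root
```

For example, ARIMA(1, 1, 0) with $\varphi_1 = 0.5$ has the same autoregressive polynomial as an AR(2) model with $\alpha_1 = 1.5$ and $\alpha_2 = -0.5$. The expanded coefficients sum to zero, which confirms the root at $L = 1$.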
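The ARIMA(p, d, q) model can be generalized by adding a constant $\delta$:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right)(1 - L)^d X_t = \delta + \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.$$
This defines an ARIMA(p, d, q) process with drift $\frac{\delta}{1 - \sum \varphi_i}$.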
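A noise-free sketch can illustrate the drift for $d = 1$. Take the generalized model $(1 - \sum \varphi_i L^i)(1 - L) X_t = \delta + (1 + \sum \theta_i L^i)\varepsilon_t$ and set every $\varepsilon_t$ to zero. The differenced series $W_t = (1 - L)X_t$ then follows $W_t = \delta + \sum \varphi_i W_{t-i}$. When the AR part is stationary, $W_t$ settles at $\delta / (1 - \sum \varphi_i)$, so $X_t$ eventually rises by that amount each period. The Python/NumPy helper `noiseless_arima_d1` below is hypothetical, and it assumes that pre-sample values of $W_t$ are zero.

```python
import numpy as np

def noiseless_arima_d1(delta, phi, n, x0=0.0):
    """Iterate (1 - sum phi_i L^i)(1 - L) X_t = delta with every error term set to zero.

    Returns the level series X_t and the differenced series W_t = X_t - X_{t-1}.
    Pre-sample values of W_t are treated as zero (an assumption of this sketch).
    """
    p = len(phi)
    w = np.zeros(n)
    for t in range(n):
        # W_t = delta + phi_1 W_{t-1} + ... + phi_p W_{t-p}
        ar_part = sum(phi[i] * w[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        w[t] = delta + ar_part
    # Undo the single difference: X_t = x0 + W_1 + ... + W_t
    x = x0 + np.cumsum(w)
    return x, w

x, w = noiseless_arima_d1(delta=1.0, phi=[0.5], n=200)
print(w[-1])          # close to 1.0 / (1 - 0.5) = 2.0
print(x[-1] - x[-2])  # per-period increase of X_t, also close to 2.0
```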
Other special forms
The explicit identification of the factorisation of the autoregression polynomial into factors, as above, can be extended to other cases: first to the moving average polynomial, and second to include other special factors. For example, including a factor $(1 - L^s)$ in a model is one way to add a non-stationary seasonality of period $s$; this factor has the effect of re-expressing the data as changes from $s$ periods ago. Another example is a factor that includes a seasonality of period 2. The first type of factor allows each season's value to drift separately over time, whereas with the second type the values for adjacent seasons move together.
Identification and specification of appropriate factors in an ARIMA model can be an important step in modelling. It can reduce the overall number of parameters to be estimated while imposing on the model the types of behaviour that logic and experience suggest should be there.
Differencing
Differencing in statistics is a transformation applied to time-series data in order to make it stationary. The properties of a stationary time series do not depend on the time at which the series is observed. To difference the data, the difference between consecutive observations is computed. Mathematically, this is shown as
$$y'_t = y_t - y_{t-1}.$$
Differencing removes changes in the level of a time series. This eliminates trend (and, with seasonal differencing, seasonality) and consequently stabilizes the mean of the time series.
Sometimes it may be necessary to difference the data a second time to obtain a stationary time series. This is referred to as second-order differencing:
$$y^{*}_t = y'_t - y'_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$$
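The NumPy sketch below applies first- and second-order differencing to a short, made-up series with a quadratic trend. One difference leaves a linear trend, and a second difference leaves a constant. `np.diff` with `n=2` computes the second-order difference directly. Each differencing step shortens the series by one observation.

```python
import numpy as np

# Illustrative series with a quadratic trend
y = np.array([3.0, 5.0, 9.0, 15.0, 23.0])

# First-order differencing: y'_t = y_t - y_{t-1}
first = np.diff(y)
print(first)   # [2. 4. 6. 8.]

# Second-order differencing: y*_t = y'_t - y'_{t-1}
second = np.diff(y, n=2)
print(second)  # [2. 2. 2.]

# The same result from the expanded form y_t - 2 y_{t-1} + y_{t-2}
expanded = y[2:] - 2 * y[1:-1] + y[:-2]
print(np.array_equal(second, expanded))  # True
```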
Another method of differencing data is seasonal differencing. It computes the difference between an observation and the corresponding observation in the previous season (e.g., the same month a year earlier). This is shown as:
$$y'_t = y_t - y_{t-m},$$
where $m$ is the number of periods in a season.
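A minimal Python/NumPy sketch of seasonal differencing follows. The helper name `seasonal_difference` and the quarterly data are illustrative assumptions. The first $m$ observations have no counterpart one season earlier, so they are dropped.

```python
import numpy as np

def seasonal_difference(y, m):
    """Seasonal difference y'_t = y_t - y_{t-m}; the first m observations are dropped."""
    if m < 1:
        raise ValueError("season length m must be at least 1")
    y = np.asarray(y, dtype=float)
    return y[m:] - y[:-m]

# Illustrative quarterly data (m = 4): the same seasonal pattern each year, shifted up by 10 per year
y = [10, 20, 30, 40, 20, 30, 40, 50]
print(seasonal_difference(y, 4))  # [10. 10. 10. 10.]
```

In this example the seasonal pattern cancels out and only the constant year-on-year change remains. With $m = 1$ the function reduces to ordinary first-order differencing.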
The differenced data is then used for the estimation of an ARMA model.
Examples
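Some well-known special cases arise naturally or are mathematically equivalent to other popular forecasting models. For example:
- An ARIMA(0, 1, 0) model (or I(1) model) is given by $X_t = X_{t-1} + \varepsilon_t$, which is simply a random walk.
- An ARIMA(0, 1, 0) model with a constant, given by $X_t = c + X_{t-1} + \varepsilon_t$, is a random walk with drift.
- An ARIMA(0, 0, 0) model is a white noise model.
- An ARIMA(1, 1, 2) model is a damped Holt's model.
- An ARIMA(0, 1, 1) model without constant is a basic exponential smoothing model.
- An ARIMA(0, 2, 2) model is given by $X_t = 2X_{t-1} - X_{t-2} + (\alpha + \beta - 2)\varepsilon_{t-1} + (1 - \alpha)\varepsilon_{t-2} + \varepsilon_t$, where $\alpha$ and $\beta$ here denote the level and trend smoothing parameters of Holt's method. This is equivalent to Holt's linear method with additive errors, or double exponential smoothing.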
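The exponential-smoothing equivalence can be checked numerically. For ARIMA(0, 1, 1) without constant, $X_t = X_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$, and the one-step forecast is $\hat X_{t+1} = X_t + \theta e_t$ with $e_t = X_t - \hat X_t$. Substituting $\theta = \alpha - 1$ gives the simple exponential smoothing recursion $\hat X_{t+1} = \alpha X_t + (1 - \alpha)\hat X_t$. The Python/NumPy sketch below compares the two recursions on simulated data. The helper names are hypothetical, and both recursions are assumed to start from the first observation.

```python
import numpy as np

def ses_forecasts(x, alpha, init):
    """Simple exponential smoothing: f_{t+1} = alpha * x_t + (1 - alpha) * f_t."""
    f = [init]
    for xt in x:
        f.append(alpha * xt + (1 - alpha) * f[-1])
    return np.array(f)

def arima011_forecasts(x, theta, init):
    """One-step forecasts for ARIMA(0, 1, 1) without constant: f_{t+1} = x_t + theta * e_t.

    e_t = x_t - f_t, the one-step forecast error, stands in for the unobserved epsilon_t.
    """
    f = [init]
    for xt in x:
        e = xt - f[-1]
        f.append(xt + theta * e)
    return np.array(f)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=50))  # simulated random-walk data, for illustration only
alpha = 0.3
ses = ses_forecasts(x, alpha, init=x[0])
arima = arima011_forecasts(x, theta=alpha - 1, init=x[0])
print(np.allclose(ses, arima))  # True
```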
Choosing the order
A common criterion for choosing the orders $p$ and $q$ of a non-seasonal ARIMA model is the Akaike information criterion (AIC), written as
$$\text{AIC} = -2\log(L) + 2(p + q + k),$$
where $L$ is the likelihood of the data, $p$ is the order of the autoregressive part and $q$ is the order of the moving average part. The term $k$ indicates whether the model has an intercept: $k = 1$ if the ARIMA model includes an intercept, and $k = 0$ if it does not.
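The corrected AIC (AICc) for ARIMA models can be written as
$$\text{AICc} = \text{AIC} + \frac{2(p + q + k)(p + q + k + 1)}{T - p - q - k - 1},$$
where $T$ is the number of observations.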
The Bayesian information criterion (BIC) can be written as
$$\text{BIC} = \text{AIC} + (\log T - 2)(p + q + k).$$
The objective is to minimize the AIC, AICc or BIC. Among the candidate models being investigated, the one with the lowest value of the chosen criterion is preferred. The AIC and the BIC serve different purposes. The AIC aims to select the model that best approximates the unknown data-generating process, whereas the BIC aims to identify the true model, assuming it is among the candidates. The BIC approach is often criticized because no model fits complex real-world data perfectly. It is still a useful selection method, however, because it penalizes additional parameters more heavily than the AIC does whenever $\log T > 2$, that is, for $T \ge 8$ observations.
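The three criteria can be computed directly from a fitted model's maximized log-likelihood. The Python sketch below assumes that $\log$ is the natural logarithm and uses a hypothetical helper name. The log-likelihood values in the usage lines are made up for illustration; they are not results from fitting any real data. Both candidates are assumed to share the same order of differencing, so their criteria are comparable.

```python
import math

def information_criteria(log_likelihood, p, q, k, T):
    """AIC, AICc and BIC for an ARIMA model.

    log_likelihood: maximized log-likelihood log(L) of the fitted model
    p, q: AR and MA orders; k: 1 if the model has an intercept, else 0
    T: number of observations
    """
    n_params = p + q + k
    if T - n_params - 1 <= 0:
        raise ValueError("AICc needs more observations than p + q + k + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (T - n_params - 1)
    bic = aic + (math.log(T) - 2.0) * n_params
    return {"AIC": aic, "AICc": aicc, "BIC": bic}

# Hypothetical log-likelihoods for two (p, q) candidates with the same d,
# both with an intercept (k = 1) and T = 100 observations. Not real results.
candidates = {(1, 0): -120.0, (2, 1): -118.5}
scores = {pq: information_criteria(ll, *pq, k=1, T=100) for pq, ll in candidates.items()}
best = min(scores, key=lambda pq: scores[pq]["AICc"])
print(best)  # (1, 0)
```

Here the larger model has the higher likelihood, but the extra parameters cost more than they gain, so the smaller model wins on AICc.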
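The AICc can only be used to compare ARIMA models with the same order of differencing, because differencing changes the data on which the likelihood is computed. For ARIMA models with different orders of differencing, the root-mean-square error (RMSE) can be used for model comparison.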
Estimation of coefficients
Forecasts using ARIMA models
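The ARIMA model can be viewed as a "cascade" of two models. The first is non-stationary:
$$Y_t = (1 - L)^d X_t,$$
while the second is wide-sense stationary:
$$\left(1 - \sum_{i=1}^{p} \varphi_i L^i\right) Y_t = \left(1 + \sum_{i=1}^{q} \theta_i L^i\right) \varepsilon_t.$$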
Forecasts can now be made for the process $Y_t$ using a generalization of the method of autoregressive forecasting.
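The cascade view leads to a simple forecasting procedure. First difference the data $d$ times, then forecast the stationary series recursively with future errors set to their mean of zero, and finally undo the differencing by cumulative summation. The Python/NumPy sketch below covers only ARIMA(p, d, 0) with known AR coefficients and no constant; the function name is hypothetical. MA terms would additionally require in-sample error estimates, and in practice the coefficients would first have to be estimated.

```python
import numpy as np

def forecast_ari(x, phi, d, h):
    """Point forecasts for ARIMA(p, d, 0) with known AR coefficients phi and no constant."""
    if h < 1:
        raise ValueError("forecast horizon h must be at least 1")
    x = np.asarray(x, dtype=float)
    # Non-stationary stage: Y_t = (1 - L)^d X_t, keeping every intermediate level for later
    levels = [x]
    for _ in range(d):
        levels.append(np.diff(levels[-1]))
    p = len(phi)
    if len(levels[-1]) < p:
        raise ValueError("not enough data left after differencing for the AR recursion")
    # Stationary stage: forecast Y_t recursively, setting future error terms to zero
    y = list(levels[-1])
    for _ in range(h):
        y.append(sum(phi[i] * y[-1 - i] for i in range(p)))
    fc = np.array(y[-h:], dtype=float)
    # Undo the differencing one level at a time, starting from each level's last observed value
    for level in reversed(levels[:-1]):
        fc = level[-1] + np.cumsum(fc)
    return fc

print(forecast_ari([1, 2, 4, 7], phi=[0.5], d=1, h=3))  # forecasts 8.5, 9.25, 9.625
```

In the usage line, the last observed difference is 3. The AR(1) recursion forecasts the next differences as 1.5, 0.75 and 0.375, which are then added cumulatively to the last value, 7.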
Forecast intervals
The forecast intervals for ARIMA models rest on the assumptions that the residuals are uncorrelated and normally distributed. If either assumption does not hold, the forecast intervals may be incorrect. For this reason, practitioners plot the ACF and a histogram of the residuals to check these assumptions before producing forecast intervals.
The 95% forecast interval is $\hat{y}_{T+h \mid T} \pm 1.96\sqrt{v_{T+h \mid T}}$, where $v_{T+h \mid T}$ is the variance of $y_{T+h} \mid y_1, \dots, y_T$.
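For $h = 1$, $v_{T+h \mid T} = \hat{\sigma}^2$ for all ARIMA models regardless of parameters and orders, where $\hat{\sigma}^2$ is the estimated variance of the errors.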
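For ARIMA(0, 0, q), with $y_t = \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}$,
$$v_{T+h \mid T} = \hat{\sigma}^2\left[1 + \sum_{i=1}^{h-1} \theta_i^2\right], \quad h = 2, 3, \ldots,$$
where $\theta_i = 0$ for $i > q$.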
In general, forecast intervals from ARIMA models will increase as the forecast horizon increases.
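The interval formulas for the pure moving-average case can be sketched in Python as follows. The sketch assumes a zero-mean ARIMA(0, 0, q) model with known $\theta_i$ and error variance $\hat\sigma^2$, and the helper names are hypothetical. Because $\theta_i = 0$ for $i > q$, the variance stops growing from $h = q + 1$ onward. For a pure MA model the interval therefore levels off rather than widening indefinitely.

```python
import math

def ma_forecast_variance(sigma2, theta, h):
    """v_{T+h|T} for ARIMA(0, 0, q): sigma^2 * (1 + theta_1^2 + ... + theta_{h-1}^2).

    theta holds theta_1, ..., theta_q; coefficients beyond q are zero, so slicing
    past the end of the list simply adds nothing.
    """
    if h < 1:
        raise ValueError("forecast horizon h must be at least 1")
    return sigma2 * (1.0 + sum(t * t for t in theta[: h - 1]))

def interval_95(point_forecast, variance):
    """95% forecast interval: point forecast +/- 1.96 * sqrt(variance), assuming normal errors."""
    half_width = 1.96 * math.sqrt(variance)
    return point_forecast - half_width, point_forecast + half_width

theta = [0.6, 0.3]  # illustrative MA(2) coefficients
for h in range(1, 5):
    v = ma_forecast_variance(1.0, theta, h)
    print(h, v, interval_95(0.0, v))
```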
Variations and extensions
A number of variations on the ARIMA model are commonly employed. If multiple time series are used, the $X_t$ can be thought of as vectors and a vector ARIMA (VARIMA) model may be appropriate. Sometimes a seasonal effect is suspected in the model. In that case, it is generally considered better to use a seasonal ARIMA (SARIMA) model than to increase the order of the AR or MA parts of the model. If the time series is suspected to exhibit long-range dependence, the $d$ parameter may be allowed to take non-integer values in an autoregressive fractionally integrated moving average (ARFIMA) model, also called a fractional ARIMA model.
Software implementations
Various packages that apply methodology such as Box–Jenkins parameter optimization are available to find suitable parameters for an ARIMA model.
- EViews: has extensive ARIMA and SARIMA capabilities.
- Julia: contains an ARIMA implementation in the TimeModels package
- Mathematica: includes function.
- MATLAB: the includes and
- NCSS: includes several procedures for ARIMA fitting and forecasting.
- Python: the package includes models for time series analysis: univariate time series analysis (AR, ARIMA), vector autoregressive models (VAR and structural VAR), and descriptive statistics and process models for time series analysis.
- R: the standard R stats package includes an arima function, which is documented in . Besides the part, the function also includes seasonal factors, an intercept term, and exogenous variables. The CRAN task view on is the reference with many more links. The package in R can automatically select an ARIMA model for a given time series with the function and can also simulate seasonal and non-seasonal ARIMA models with its function.
- Ruby: the gem is used for time series analysis, including ARIMA models and Kalman Filtering.
- : includes and .
- SAS: includes extensive ARIMA processing in its Econometric and Time Series Analysis system: SAS/ETS.
- IBM SPSS: includes ARIMA modeling in its Statistics and Modeler statistical packages. The default Expert Modeler feature evaluates a range of seasonal and non-seasonal autoregressive, integrated, and moving average settings and seven exponential smoothing models. The Expert Modeler can also transform the target time-series data into its square root or natural log. The user also has the option to restrict the Expert Modeler to ARIMA models, or to manually enter ARIMA nonseasonal and seasonal p, d, and q settings without Expert Modeler. Automatic outlier detection is available for seven types of outliers, and the detected outliers will be accommodated in the time-series model if this feature is selected.
- SAP: the APO-FCS package in SAP ERP allows creation and fitting of ARIMA models using the Box–Jenkins methodology.
- SQL Server Analysis Services: from Microsoft includes ARIMA as a Data Mining algorithm.
- Stata: includes ARIMA modelling as of Stata 9.
- Teradata Vantage: has the ARIMA function as part of its machine learning engine.
- TOL: is designed to model ARIMA models.
- Scala: library contains an ARIMA implementation for Scala, Java and Python. The implementation is designed to run on Apache Spark.
- PostgreSQL/MadLib: .
- X-12-ARIMA: from the US Bureau of the Census