Chapter 3 of 25

ARIMA and SARIMA, on one page

Created May 27, 2026 Updated May 27, 2026

ARIMA(p, d, q) is a linear forecasting model built from three parts: autoregression, differencing, and moving-average errors.

AR(p) — AutoRegression. Learn from past values. The next value is a linear combination of the previous p values plus noise. p is the lag order. AR(1) is "tomorrow's value is some fraction of today's value plus noise"; AR(7) lets the model use the last week.
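To make the AR(1) reading concrete, here is a minimal sketch in plain Python. It simulates a series with a known coefficient, recovers that coefficient by least squares, and forms a one-step forecast. The function names, the coefficient 0.7, and the synthetic data are assumptions made for illustration; a real workflow would normally use a forecasting library rather than this hand-rolled estimate.

```python
import random

def simulate_ar1(phi, n, seed=0, sigma=1.0):
    """Simulate y_t = phi * y_{t-1} + e_t with Gaussian noise, starting from 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, sigma))
    return y

def fit_ar1(y):
    """Least-squares estimate of phi with no intercept."""
    num = sum(cur * prev for prev, cur in zip(y[:-1], y[1:]))
    den = sum(prev * prev for prev in y[:-1])
    return num / den

series = simulate_ar1(phi=0.7, n=2000, seed=42)
phi_hat = fit_ar1(series)

# One-step-ahead point forecast: a fraction of the latest value.
next_forecast = phi_hat * series[-1]
```

With enough data, phi_hat should land near the 0.7 used in the simulation. An AR(p) fit generalizes this to p lagged values and p coefficients.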

I(d) — Integration. Difference the series until what remains is approximately stationary. d=1 means model the first differences instead of the raw values; d=2 means model the differences of those differences. After fitting, predictions are integrated (cumulatively summed) back to the original scale. Differencing is the step that deals with trend: if a series has a stochastic trend, ARIMA usually needs it before the AR and MA parts can model the remaining stationary structure.
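The differencing and integrate-back steps can be sketched in a few lines of plain Python. The helper names difference and integrate and the toy series are hypothetical, chosen only to show that the transformation is reversible as long as the first value of each pass is kept.

```python
def difference(y, d=1):
    """Apply first differencing d times; also return the values needed to invert."""
    heads = []
    for _ in range(d):
        heads.append(y[0])
        y = [b - a for a, b in zip(y[:-1], y[1:])]
    return y, heads

def integrate(diffed, heads):
    """Undo difference(): one cumulative sum per pass, most recent pass first."""
    y = list(diffed)
    for head in reversed(heads):
        out = [head]
        for step in y:
            out.append(out[-1] + step)
        y = out
    return y

raw = [10, 12, 15, 19, 24, 30]
d1, heads1 = difference(raw, d=1)   # [2, 3, 4, 5, 6]
d2, heads2 = difference(raw, d=2)   # [1, 1, 1, 1]
restored = integrate(d2, heads2)    # back to the original scale
```

Here the raw series curves upward, the first differences still trend, and the second differences are constant, which is the kind of series d=2 is meant for.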

MA(q) — Moving Average. Learn from past forecast errors. The next value depends on the last q noise terms (the residuals from previous predictions). MA(q) models how recent forecast errors propagate forward — it is not a rolling average over past values, despite the name. It's a way to represent short-term shocks left in the residuals.
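One way to see the difference between MA and AR is to feed both a single unit shock and watch how it propagates. The sketch below is illustrative only: the function names and the coefficient 0.5 are assumptions, and the shocks are supplied directly instead of being estimated from data.

```python
def ma_response(theta, shocks):
    """y_t = e_t + theta[0]*e_{t-1} + ... + theta[q-1]*e_{t-q}, zero mean."""
    y = []
    for t, e in enumerate(shocks):
        value = e
        for j, th in enumerate(theta, start=1):
            if t - j >= 0:
                value += th * shocks[t - j]
        y.append(value)
    return y

def ar1_response(phi, shocks):
    """y_t = phi * y_{t-1} + e_t, starting from y = 0."""
    y, prev = [], 0.0
    for e in shocks:
        prev = phi * prev + e
        y.append(prev)
    return y

impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
ma_path = ma_response([0.5], impulse)   # [1.0, 0.5, 0.0, 0.0, 0.0]
ar_path = ar1_response(0.5, impulse)    # [1.0, 0.5, 0.25, 0.125, 0.0625]
```

In the MA(1) case the shock is gone after one extra step; in the AR(1) case it decays geometrically and never quite vanishes. That finite memory of q steps is what "short-term shocks" refers to.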

A pure AR(p) is ARIMA(p, 0, 0). A pure MA(q) is ARIMA(0, 0, q). The "I" in the middle is the differencing step that turns a series with a stochastic trend into one the AR and MA parts can fit.

SARIMA(p, d, q)(P, D, Q, s) is roughly ARIMA with a seasonal copy of each piece. s is the season length in observations (12 for monthly data with a yearly cycle, 7 for daily data with a weekly cycle). (P, D, Q) are the seasonal counterparts of (p, d, q): seasonal AR, seasonal differencing, and seasonal MA, applied at multiples of lag s instead of at consecutive lags. SARIMA(1, 1, 1)(1, 1, 1, 12) informally means "AR(1) + diff(1) + MA(1) plus a seasonal AR(1) + seasonal diff at lag 12 + seasonal MA(1)".
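For readers who prefer notation, the informal reading of SARIMA(1, 1, 1)(1, 1, 1, 12) can be written with the backshift operator B, where B y_t = y_{t-1}. This uses one common sign convention in which MA terms are added; some texts and libraries flip the MA signs, so treat the signs here as an assumption.

```latex
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B)\,(1 - B^{12})\, y_t
  = (1 + \theta_1 B)\,(1 + \Theta_1 B^{12})\, \varepsilon_t
```

Expanding the two AR factors gives 1 - \phi_1 B - \Phi_1 B^{12} + \phi_1 \Phi_1 B^{13}, so the seasonal and non-seasonal parts multiply rather than simply add. That cross term at lag 13 is why "roughly" and "informally" are the right words for the plus-sign reading.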

What they handle: stationary or made-stationary linear time series, especially with regular seasonality. What they don't: non-linear dynamics, regime changes and other abrupt structural breaks, very long-range dependencies. They're also sensitive to model order selection: getting (p, d, q) wrong is a real source of forecast failure.

Where these limits bite, deep-learning approaches step in: RNN / LSTM for short-to-medium dependencies, TCN for the long-range case, or the broader deep-learning architecture landscape for transformer-based variants.

SARIMAX adds eXogenous regressors — external signals like weather, holidays, promotions — as an additive component. This is often the first step from "classical" to "production-grade" forecasting when there are known drivers of the target.
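The "additive component" idea can be sketched as a regression on the external signal plus a time-series model on whatever the regression leaves unexplained. The plain-Python example below does this in two separate steps on synthetic data. That is a simplification for illustration: forecasting libraries typically estimate the regression and ARMA parts jointly. The promotion flag, the effect size 5.0, and the AR(1) coefficient 0.6 are all made-up assumptions.

```python
import random

rng = random.Random(7)
n = 500

# Hypothetical exogenous driver: 1.0 when a promotion is running, else 0.0.
promo = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]

# Synthetic target: additive promotion effect of 5.0 plus AR(1) noise (phi = 0.6).
noise, y = 0.0, []
for x in promo:
    noise = 0.6 * noise + rng.gauss(0.0, 1.0)
    y.append(5.0 * x + noise)

# Step 1: least-squares coefficient for the exogenous signal (no intercept).
beta = sum(x * v for x, v in zip(promo, y)) / sum(x * x for x in promo)

# Step 2: AR(1) fitted to what the regressor leaves unexplained.
resid = [v - beta * x for x, v in zip(promo, y)]
phi = (sum(a * b for a, b in zip(resid[:-1], resid[1:]))
       / sum(a * a for a in resid[:-1]))

# Forecast: known future driver values plus the decaying AR(1) residual.
future_promo = [0.0, 1.0, 1.0]
forecast = [beta * x + phi ** (h + 1) * resid[-1]
            for h, x in enumerate(future_promo)]
```

Note that forecasting requires future values of the driver, which is why exogenous regressors work best for signals that are known in advance, such as holidays or planned promotions.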

Full breakdown — stationarity testing, model order selection (ACF/PACF, AIC), residual diagnostics, when to use SARIMAX versus a deep-learning approach: see Classical Statistical Forecasting: ARIMA, SARIMA, and SARIMAX.