Chapter 21 of 25

STL Decomposition: Trend + Seasonal + Residual

Created May 28, 2026 Updated Jun 7, 2026

Before fitting any forecaster, the first thing worth doing is looking at the series split into its structural parts rather than as one raw curve. STL gives a clean way to do that.

The premise is classical statistics: a time series can be approximately written as a sum of three structural components:

y(t) = T(t) + S(t) + R(t)

T = trend       — slow long-term movement
S = seasonality — periodic pattern (daily, weekly, yearly)
R = residual    — what's left over: noise, shocks, the unexplained

(There's a multiplicative version too, y = T · S · R, used when seasonal swings grow with the level of the series. In practice it's usually handled by taking log(y) first, which requires a strictly positive series, turns T · S · R into log T + log S + log R, and lets you run the additive machinery unchanged. Additive is the common default and matches what most preprocessing pipelines assume.)
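To make the log trick concrete, here is a minimal sketch on synthetic data. The trend, seasonal, and residual factors below are invented for illustration; the point is only that the log of the product equals the sum of the logs, so an additive decomposition of log(y) corresponds to a multiplicative one of y.

```python
import numpy as np

# Synthetic, strictly positive multiplicative series (all values are made up).
t = np.arange(48)
trend = 100.0 + 2.0 * t                               # T > 0
seasonal = 1.0 + 0.2 * np.sin(2 * np.pi * t / 12)     # S > 0, period 12
rng = np.random.default_rng(0)
resid = np.exp(rng.normal(0.0, 0.01, t.size))         # R > 0, centered near 1
y = trend * seasonal * resid

# Taking logs turns the product into a sum of three additive components.
log_y = np.log(y)
additive_form = np.log(trend) + np.log(seasonal) + np.log(resid)
max_gap = float(np.max(np.abs(log_y - additive_form)))
print(f"largest difference between log(y) and the sum of logs: {max_gap:.2e}")
```

After decomposing log(y) additively, exponentiating each component gives back multiplicative factors on the original scale.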

STL — Seasonal-Trend decomposition using LOESS (Cleveland et al., 1990) — is the standard algorithm for actually doing the split. It works by iterating two smoothers:

LOESS-smooth the series to estimate the trend.
Subtract the trend, then smooth the detrended series within each seasonal position (every January together, every February together, and so on) to estimate seasonality.
Subtract trend and seasonality; whatever remains is the residual.
Repeat until the components stabilize.

The outputs are three series of the same length as the input, which sum back to the original. You can plot them stacked and immediately see things you couldn't see in the raw curve.
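The loop can be sketched in a few lines of NumPy. This is a deliberately simplified stand-in, not STL itself: it swaps LOESS for a centered moving average and estimates seasonality with per-position means. The function names `moving_average` and `toy_decompose` are hypothetical, and the data is synthetic.

```python
import numpy as np

def moving_average(x, window):
    """Centered moving average with edge padding (a crude stand-in for LOESS)."""
    pad = window // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")[: x.size]

def toy_decompose(y, period, n_iter=5):
    """Alternate trend and seasonal estimates; NOT the real STL algorithm."""
    window = period + 1 if period % 2 == 0 else period  # odd window stays centered
    phase = np.arange(y.size) % period                   # position within the cycle
    seasonal = np.zeros_like(y)
    for _ in range(n_iter):
        trend = moving_average(y - seasonal, window)     # smooth the deseasonalized series
        detrended = y - trend
        means = np.array([detrended[phase == p].mean() for p in range(period)])
        means -= means.mean()                            # keep the overall level in the trend
        seasonal = means[phase]
    resid = y - trend - seasonal                         # whatever is left over
    return trend, seasonal, resid

rng = np.random.default_rng(0)
t = np.arange(120)
y = 0.05 * t + np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 0.1, t.size)

trend, seasonal, resid = toy_decompose(y, period=12)
print(np.allclose(trend + seasonal + resid, y))          # components sum back to y
```

Because the residual is defined as what remains after subtracting the other two components, the sum-back property holds by construction, here and in STL.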

What the decomposition tells you before you model anything:

Is there a trend? A strong trend often suggests differencing for ARIMA-style models, or a trend feature for ML models — though many modern TS models handle trend without either. No trend → don't add one.
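What's the seasonal period? Daily, weekly, yearly, multiple at once. STL takes the period as an input rather than discovering it, so a ragged seasonal component or a residual that still cycles suggests the period is wrong or incomplete. STL handles a single period; for multiple seasonalities use MSTL.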
Is the residual structured or noise? A residual with leftover autocorrelation, volatility clusters, or change points tells you the trend/season model isn't capturing everything — and points at what to add. A clean residual that looks like white noise tells you most of the signal is already explained.
Are there anomalies? Big spikes in the residual stand out exactly because the trend and seasonality have been subtracted off. STL is one of the most common pre-steps for anomaly detection.

Why this matters beyond EDA:

The same decomposition idea shows up inside modern forecasting architectures. N-BEATS (Oreshkin et al., 2020) builds the additive identity into the network: in its interpretable configuration, a trend stack constrained to a polynomial basis and a seasonality stack constrained to a Fourier basis each emit a partial forecast, and the partial forecasts sum to the final one. Hybrid architectures with N-BEATS-style decomposition heads (trend, seasonality, and often a residual head) are common in production deep forecasting. The intuition that trend, seasonal, and residual are useful separable structures is older than deep learning and was carried forward into it.

The caveat for modeled decompositions: without explicit constraints (polynomial trend basis, Fourier seasonality basis, monotonicity), three free MLP heads aren't identifiable — the network is free to spread the same signal across them. STL, in contrast, has explicit smoothers for each component, so the trend really is trend and the seasonality really is seasonality. That's part of why it's still the go-to method for pre-modeling diagnostics decades after it was published.
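The effect of basis constraints can be seen without a neural network. The sketch below is plain linear least squares, not N-BEATS: it fits a low-degree polynomial trend basis and a Fourier seasonal basis jointly. Because the combined design matrix has full column rank, each coefficient is pinned down uniquely, so the split between trend and seasonality is identifiable. The degree, number of harmonics, and data are all illustrative assumptions.

```python
import numpy as np

n, period = 96, 12
t = np.arange(n)
u = t / n                                                  # time normalized to [0, 1)
rng = np.random.default_rng(1)
true_trend = 2.0 + 3.0 * u
y = true_trend + 0.8 * np.sin(2 * np.pi * t / period) + rng.normal(0.0, 0.05, n)

# Trend basis: polynomial in normalized time (1, u, u^2).
P = np.column_stack([u**k for k in range(3)])
# Seasonal basis: two Fourier harmonics, no constant column so the level stays in the trend.
F = np.column_stack(
    [f(2 * np.pi * k * t / period) for k in (1, 2) for f in (np.sin, np.cos)]
)

X = np.hstack([P, F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
trend_hat = P @ coef[: P.shape[1]]
seasonal_hat = F @ coef[P.shape[1]:]

rank = np.linalg.matrix_rank(X)
print("full column rank:", rank == X.shape[1])             # unique solution for the split
print("max trend error:", float(np.max(np.abs(trend_hat - true_trend))))
```

With unconstrained components, by contrast, adding any function to one head and subtracting it from another leaves the sum unchanged, which is exactly the identifiability problem described above.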

statsmodels.tsa.seasonal.STL does it in one line in Python. The 90 seconds it takes to plot the three components saves hours of fitting models that were never going to capture the structure you didn't know was there.

Full breakdown of where decomposition appears across the time-series stack — classical statistics, ARIMA differencing, N-BEATS heads, additive forecast models: in the Time Series track.