

TensorFlow Probability provides structural time series models for time series forecasting. In particular, the library supports a “scenario analysis” style of modelling, in which multiple alternative forecasts of the future are generated.

Structural time series modelling takes the inherent characteristics of the series into account when making forecasts. These include the local linear trend, seasonal, residual and autoregressive components. The greater the variation in these components, the more uncertain the forecast.
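
To make the idea of components concrete, here is a minimal NumPy sketch (not the article's model) that simulates a series as the sum of a local linear trend, a 12-month seasonal pattern and an AR(1) term. The function name, the noise scales and the 0.7 coefficient are all illustrative assumptions.

```python
import numpy as np

def simulate_structural_series(num_steps=77, seed=0):
    """Simulate trend + seasonal + AR(1) components and their sum.

    All scales and coefficients are made-up values for illustration.
    """
    rng = np.random.default_rng(seed)

    # Local linear trend: the slope random-walks, and the level
    # accumulates the slope plus its own noise.
    slope = np.cumsum(rng.normal(0.0, 0.05, num_steps))
    level = 100.0 + np.cumsum(slope + rng.normal(0.0, 0.5, num_steps))

    # Seasonal effect: one offset per calendar month, repeated every 12 steps.
    monthly_effects = rng.normal(0.0, 10.0, 12)
    monthly_effects -= monthly_effects.mean()  # centre the effects on zero
    seasonal = monthly_effects[np.arange(num_steps) % 12]

    # AR(1): each value is a damped copy of the previous one plus noise.
    ar = np.zeros(num_steps)
    for t in range(1, num_steps):
        ar[t] = 0.7 * ar[t - 1] + rng.normal(0.0, 2.0)

    return {"trend": level, "seasonal": seasonal, "autoregressive": ar,
            "series": level + seasonal + ar}

# 77 monthly steps matches the span February 2014 to June 2020.
components = simulate_structural_series()
print(components["series"][:5])
```

Increasing any of the noise scales makes the simulated paths fan out more, which is the sense in which more variation in a component means a more uncertain forecast.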

The examples in this article use the template from the Structural Time Series modeling in TensorFlow Probability tutorial, which the original authors (Copyright 2019 The TensorFlow Authors) have made available under the Apache 2.0 license.

United Airlines Passenger Data

For this example, a structural time series model is built in TensorFlow Probability to forecast air passenger data. The data is sourced from San Francisco Open Data: Air Traffic Passenger Statistics.

In particular, passenger numbers for United Airlines from February 2014 to June 2020 are analysed. The specific segment analysed is enplaned domestic passengers departing from Terminal 3 at Boarding Area E.

Here is a visual overview of the time series:


We can see that passenger numbers have traditionally ranged between 200,000 and 350,000, before plummeting to a low of 7,115 in May 2020.


It is wishful thinking to expect that any time series model could have forecast this: the drop was very sudden and completely out of line with the overall trend.


However, could TensorFlow Probability have potentially identified a drop of a similar scale? Let’s find out.

TensorFlow Probability Model

The model is fitted with a local linear trend, along with a monthly seasonal effect.


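import tensorflow as tf
import tensorflow_probability as tfp
sts = tfp.sts

def build_model(observed_time_series):
    trend = sts.LocalLinearTrend(observed_time_series=observed_time_series)
    seasonal = sts.Seasonal(
        num_seasons=12, observed_time_series=observed_time_series)
    # sts.Autoregressive requires an order; order=1 and the name
    # 'autoregressive' are assumptions, as both are missing from the extracted text.
    residual_level = sts.Autoregressive(
        order=1, observed_time_series=observed_time_series, name='residual')
    autoregressive = sts.Autoregressive(
        order=1, observed_time_series=observed_time_series, name='autoregressive')
    model = sts.Sum([trend, seasonal, residual_level, autoregressive],
                    observed_time_series=observed_time_series)
    return model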

Note that since autocorrelation is detected in the series, an autoregressive component is also added to the model.


Here is a plot of the autocorrelation function for the series:


Source: Jupyter Notebook Output
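
For readers who want to reproduce this kind of check, the following is a minimal NumPy sketch of a sample autocorrelation function. It is not the code used to produce the plot; the helper name and the toy AR(1) series standing in for the passenger data are assumptions made for illustration.

```python
import numpy as np

def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Toy AR(1) series standing in for the passenger data (illustrative only).
rng = np.random.default_rng(0)
toy = np.zeros(200)
for t in range(1, 200):
    toy[t] = 0.8 * toy[t - 1] + rng.normal()

acf = autocorrelation(toy, max_lag=12)

# Rough 95% band for a white-noise series of the same length.
threshold = 1.96 / np.sqrt(len(toy))
significant_lags = [k for k in range(1, 13) if abs(acf[k]) > threshold]
print(np.round(acf[:4], 2), significant_lags)
```

Lags whose autocorrelation falls well outside the band suggest that past values carry information about future ones, which is the usual motivation for adding an autoregressive term.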

The time series is split into training and test data for the purposes of comparing the forecasts with the actual values.
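
A minimal sketch of such a split is shown here. The article does not state how many months were held out, so the 12-step holdout and the placeholder array are assumptions.

```python
import numpy as np

def split_series(series, num_forecast_steps):
    """Hold out the last num_forecast_steps observations as test data."""
    series = np.asarray(series)
    if not 0 < num_forecast_steps < len(series):
        raise ValueError(
            "num_forecast_steps must be between 1 and len(series) - 1")
    return series[:-num_forecast_steps], series[-num_forecast_steps:]

# Placeholder for the 77 monthly passenger counts (February 2014 to June 2020).
monthly_counts = np.arange(77)

# A 12-month holdout is an assumed value, chosen only for illustration.
train, test = split_series(monthly_counts, num_forecast_steps=12)
print(len(train), len(test))
```

Because the observations are ordered in time, the test set is taken from the end of the series rather than sampled at random.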


The forecast is based on a posterior distribution, that is, the distribution over the model parameters obtained by combining the prior distribution (prior beliefs about the parameters) with a likelihood function for the observed data.

Source: Image Created by Author
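
As a reminder of the standard relationship, Bayes' rule can be written as follows. The notation is introduced here for illustration and is not taken from the original article: $\theta$ stands for the model parameters and $y_{1:T}$ for the observed series.

```latex
p(\theta \mid y_{1:T})
  = \frac{p(y_{1:T} \mid \theta)\, p(\theta)}{p(y_{1:T})}
  \;\propto\; p(y_{1:T} \mid \theta)\, p(\theta)
```

The denominator does not depend on $\theta$, which is why the posterior is often described simply as the likelihood multiplied by the prior.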

To produce this forecast, the posterior is approximated with variational inference: the variational loss (the negative ELBO) is minimised as follows:


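# Minimize the variational loss.
num_variational_steps = 200
optimizer = tf.optimizers.Adam(learning_rate=.1)

# Using fit_surrogate_posterior to build and optimize the variational loss function.
# The call's arguments are missing from the extracted text; they are filled in here
# following the TFP tutorial, with tseries_model and tseries_training_data as assumed names.
# variational_posteriors is assumed to have been built earlier, e.g. with
# tfp.sts.build_factored_surrogate_posterior(model=tseries_model).
@tf.function(experimental_compile=True)
def train():
    elbo_loss_curve = tfp.vi.fit_surrogate_posterior(
        target_log_prob_fn=tseries_model.joint_log_prob(
            observed_time_series=tseries_training_data),
        surrogate_posterior=variational_posteriors,
        optimizer=optimizer, num_steps=num_variational_steps)
    return elbo_loss_curve

elbo_loss_curve = train()
plt.plot(elbo_loss_curve)
plt.title("Loss curve")

# Draw samples from the variational posterior.
q_samples_tseries_ = variational_posteriors.sample(50)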
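
For reference, the loss being minimised in variational inference is usually written as the negative evidence lower bound (ELBO). The notation below is an assumption introduced for illustration: $q(\theta)$ is the surrogate posterior and $p(y_{1:T}, \theta)$ is the joint density of data and parameters.

```latex
\mathcal{L}(q)
  = -\,\mathbb{E}_{q(\theta)}\!\left[\log p(y_{1:T}, \theta) - \log q(\theta)\right]
  \;\ge\; -\log p(y_{1:T})
```

Lowering $\mathcal{L}(q)$ pulls $q(\theta)$ closer to the true posterior, so a loss curve that flattens out suggests the surrogate has stopped improving.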

Here is a visual of the loss curve:


Source: TensorFlow Probability

Forecasts

Twenty samples (that is, 20 separate forecast scenarios) are drawn from the forecast distribution:


# Number of scenarios
num_samples = 20

tseries_forecast_mean, tseries_forecast_scale, tseries_forecast_samples = (
    tseries_forecast_dist.mean().numpy()[..., 0],
    tseries_forecast_dist.stddev().numpy()[..., 0],
    tseries_forecast_dist.sample(num_samples).numpy()[..., 0])

Here is a plot of the forecasts:


Source: TensorFlow Probability

We can see that while the worst-case scenario forecast a drop to 150,000 passengers, the model generally could not forecast the sharp drop we have seen in passenger numbers.
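
Reading a worst case off a set of sampled scenarios can be sketched as follows. The array here is synthetic and only stands in for `tseries_forecast_samples`; the helper name, the shape (20 scenarios by 6 months) and all numbers are assumptions, not output from the fitted model.

```python
import numpy as np

# Synthetic stand-in for tseries_forecast_samples: 20 scenarios x 6 months.
rng = np.random.default_rng(1)
forecast_samples = 250_000 + rng.normal(0, 30_000, size=(20, 6)).cumsum(axis=1)

def summarise_scenarios(samples):
    """Return the mean path, the lowest value reached, and the scenario reaching it."""
    samples = np.asarray(samples, dtype=float)
    mean_path = samples.mean(axis=0)             # average across scenarios
    lowest_per_scenario = samples.min(axis=1)    # trough of each scenario
    worst_idx = int(lowest_per_scenario.argmin())
    return mean_path, float(lowest_per_scenario[worst_idx]), worst_idx

mean_path, worst_value, worst_idx = summarise_scenarios(forecast_samples)
print(f"worst scenario #{worst_idx} bottoms out at {worst_value:,.0f}")
```

With only 20 draws, the lowest sampled path is a noisy indication of the tail, so drawing more scenarios would generally give a more stable picture of how bad the worst case could be.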


Here is an overview of the time series components:


Source: TensorFlow Probability

In particular, towards the end of the series we see a widening of variation in the autoregressive and seasonal components, indicating that the forecasts have become more uncertain as a result.
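
One way to read the link between variation and uncertainty: if a forecast is summarised by a mean and a standard deviation (as with `tseries_forecast_mean` and `tseries_forecast_scale` earlier), a band of the mean plus or minus two standard deviations widens in direct proportion to the standard deviation. The sketch below uses made-up numbers and a hypothetical helper name.

```python
import numpy as np

def forecast_band(mean, scale, num_std=2.0):
    """Lower and upper bounds at mean -/+ num_std standard deviations."""
    mean = np.asarray(mean, dtype=float)
    scale = np.asarray(scale, dtype=float)
    return mean - num_std * scale, mean + num_std * scale

# Illustrative numbers only, not output from the fitted model.
example_mean = np.array([250_000.0, 240_000.0, 230_000.0])
example_scale = np.array([10_000.0, 20_000.0, 30_000.0])

lower, upper = forecast_band(example_mean, example_scale)
print(upper - lower)  # band width at each step
```

In this toy example the band width triples as the standard deviation triples, which mirrors the widening seen in the component plots.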


However, what if we were to shorten the time series? Let’s rebuild the model using data from January 2017 onwards and see how this affects the forecast.
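
Trimming the series to start in January 2017 amounts to dropping a fixed number of leading months. A minimal sketch follows, assuming the monthly values are held in a plain array ordered from February 2014 onwards; the helper name and the placeholder data are illustrative.

```python
import numpy as np

def months_between(start, end):
    """Number of whole months from (year, month) start to (year, month) end."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])

series_start = (2014, 2)   # February 2014, first observation in the article
cutoff = (2017, 1)         # January 2017, start of the shortened series

offset = months_between(series_start, cutoff)

# Placeholder for the 77 monthly passenger counts.
monthly_counts = np.arange(77)
shortened = monthly_counts[offset:]
print(offset, len(shortened))  # 35 42
```

The shortened series keeps 42 months (January 2017 to June 2020), so the model sees less history but a larger share of it comes from the most recent period.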

Source: TensorFlow Probability

We can see that the “worst-case scenario” forecast comes in at roughly 70,000. While this is still significantly above the level to which passenger numbers actually fell, this model does a better job of indicating that a sharp drop in passenger numbers potentially lies ahead.

Let’s analyse the time series components for this forecast:


Source: TensorFlow Probability

Unlike in the last forecast, we can see that the autoregressive, residual and seasonal components are actually narrowing in this instance, indicating more certainty behind the forecasts. In this regard, focusing the model on more recent data has allowed it to indicate that a significant drop in passenger numbers could lie ahead, which ultimately came to pass.

Note that a main forecast (indicated by the dashed orange line) is also given. Under normal circumstances, the model indicates that while passenger numbers would have dipped to 200,000, they would have rebounded to 250,000 in June. This is still less than the nearly 300,000 passengers recorded for the month of June, indicating that downward pressure on passenger numbers was an issue before COVID-19, though nowhere near the scale of what has actually transpired, of course.

Conclusion

This has been an overview of how TensorFlow Probability can be used to conduct forecasts, in this case using air passenger data.


I hope you found this article useful; any feedback or comments are greatly welcomed. The code and datasets for this example can be found at my GitHub repository here.

Disclaimer: This article is written on an “as is” basis and without warranty. It was written with the intention of providing an overview of data science concepts, and should not be interpreted as professional advice in any way.

Translated from: https://towardsdatascience/forecasting-air-passenger-numbers-with-tensorflow-probability-1b53e5e5fea2