How to run forecasts on dbt models

Transforming and validating data is very easy with dbt. We can build complex models that span multiple dependencies. We can also create macros that can work across a dbt project and make data pipelines easy to maintain.

Sometimes we want to do more with our dbt models, such as running forecasts on them. Forecasts are useful for staying on top of a project's critical metrics, because they can help us catch trends before they cause problems.

In this post we will walk through a simple way of making forecasts on dbt models. We will use Prophet to do forecasting in Python and we will use fal to run Python inside a dbt project.

What's in our model

This post uses a dbt model that has only two columns: y and ds, where ds is a timestamp and y is the numerical measure that we want to predict, such as a temperature measurement or monthly revenue. See here for an example model, where y refers to Agent Wait Time in a Zendesk ticket metric.
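If a model does not already expose columns named ds and y, the DataFrame can be reshaped in Python before it is handed to Prophet. The following is a minimal sketch, not part of the original script: the helper name to_prophet_frame and the column names created_at and agent_wait_time are hypothetical and only serve as an illustration.

```python
import pandas as pd


def to_prophet_frame(df: pd.DataFrame, time_col: str, value_col: str) -> pd.DataFrame:
    """Return a new frame with exactly two columns: ds (timestamp) and y (number)."""
    out = df[[time_col, value_col]].rename(columns={time_col: "ds", value_col: "y"})
    out["ds"] = pd.to_datetime(out["ds"])
    # Values that cannot be parsed as numbers become NaN
    out["y"] = pd.to_numeric(out["y"], errors="coerce")
    # Drop rows with a missing timestamp or value, then order by time
    return out.dropna(subset=["ds", "y"]).sort_values("ds").reset_index(drop=True)


# Hypothetical raw data with differently named columns and one bad value
raw = pd.DataFrame(
    {
        "created_at": ["2022-01-02", "2022-01-01", "2022-01-03"],
        "agent_wait_time": ["12.5", "n/a", "9"],
    }
)
frame = to_prophet_frame(raw, "created_at", "agent_wait_time")
print(frame)
```

In a dbt project it is usually simpler to do this renaming in the model's SQL, so a helper like this is mostly useful as a safety check inside the script.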

Build a forecasting script

Let's create a file forecast.py in a dbt project directory. Prophet makes forecasts on a pandas DataFrame, so the first thing we have to do is get a dbt model as a DataFrame. How would you do that? This is where fal comes in. fal provides a function dedicated to this task:

my_dataframe = ref("my_model_name")

That's it. There's no need to import ref. It's magic. 🪄

We're ready to do some simple ML. We start by building a Prophet object and fitting our model DataFrame to it:

# On Prophet 1.0 and later the package is named prophet: from prophet import Prophet
from fbprophet import Prophet

m = Prophet()
m.fit(my_dataframe)

The Prophet constructor accepts multiple arguments that can help you make better predictions. For us right now, the default constructor is good enough. The next step is to make a list of future dates that we want forecasts for:

import datetime

n_future_days = 30
# Start from the latest timestamp in the model and step forward one day at a time
ds = my_dataframe["ds"].max()
future_dates = []
for _ in range(n_future_days):
    ds = ds + datetime.timedelta(days=1)
    future_dates.append(ds)

In this case, we are looking 30 days into the future, with one timestamp per day. You may want to tweak this snippet according to your preferences.
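One way to make the loop above easier to tweak is to wrap it in a small helper that takes the horizon and the spacing as parameters. This is a sketch under the assumption that the last timestamp supports adding a datetime.timedelta (both datetime.datetime and pandas Timestamp values do); the name future_timestamps is hypothetical.

```python
import datetime


def future_timestamps(last, periods, step=datetime.timedelta(days=1)):
    """Return `periods` timestamps after `last`, spaced `step` apart."""
    return [last + step * i for i in range(1, periods + 1)]


# Stand-in for my_dataframe["ds"].max()
last_seen = datetime.datetime(2022, 1, 31)

daily = future_timestamps(last_seen, 3)
hourly = future_timestamps(last_seen, 2, step=datetime.timedelta(hours=1))
print(daily)
print(hourly)
```

With the defaults, future_timestamps(my_dataframe["ds"].max(), 30) should produce the same 30 daily timestamps as the loop.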

Once we have a list of future dates, we create a new DataFrame with it:

import pandas as pd
df_future = pd.DataFrame({"ds": future_dates})

and make an actual forecast:

forecast = m.predict(df_future)

The forecast variable here refers to a new DataFrame that holds the predicted values. We finish by plotting our forecast data and storing the plot as an image:

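# On Prophet 1.0 and later: from prophet.plot import plot_plotly
from fbprophet.plot import plot_plotly

fig = plot_plotly(m, forecast, xlabel="Date", ylabel="Agent Wait Time")
# write_image needs a static image export engine such as kaleido (pip install kaleido)
fig.write_image("my_forecast_plot.png")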
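Besides plotting, the forecast DataFrame can also be inspected directly. Prophet's predict output includes the columns ds, yhat, yhat_lower and yhat_upper, so one possible check is to list the future dates whose predicted value exceeds a limit. The sketch below uses a small hand-made DataFrame in place of a real forecast; the helper name rows_above_threshold and the threshold value are hypothetical.

```python
import pandas as pd


def rows_above_threshold(forecast: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Return the forecast rows whose point prediction (yhat) exceeds `threshold`."""
    cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    return forecast.loc[forecast["yhat"] > threshold, cols].reset_index(drop=True)


# Hand-made stand-in for the DataFrame returned by m.predict(df_future)
sample_forecast = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2022-02-01", "2022-02-02", "2022-02-03"]),
        "yhat": [10.0, 14.0, 18.0],
        "yhat_lower": [8.0, 11.0, 15.0],
        "yhat_upper": [12.0, 17.0, 21.0],
    }
)

alerts = rows_above_threshold(sample_forecast, threshold=13.0)
print(alerts)
```

In forecast.py the same helper could be called on the real forecast variable to decide whether a metric needs attention.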

We're done with forecasting. For a full example of this script see here. Next, we connect our Python script to a dbt model.

Connect script to a dbt model

fal is a package that lets you run Python from inside your dbt project. You can install it with pip:

pip install fal

We can go to a dbt project directory and find the relevant schema.yml file. We then choose a model that we want to run a forecast on. Finally, we add a fal meta tag and connect our script to the chosen model:

models:
  - name: zendesk_ticket_metrics
    description: Zendesk ticket metrics
    config:
      materialized: table
    meta:
      fal:
        scripts:
          - forecast.py

The zendesk_ticket_metrics model in this example is now connected to forecast.py, and the script can be set to run every time zendesk_ticket_metrics is recalculated.

Run script

In the command line, run dbt:

dbt run

Followed by fal:

fal run

If everything is set up correctly, there should be a new image file in your dbt directory. Here's an example of what it could look like:

Conclusion

You can now run forecasts on your dbt models. What's next? fal lets you run any Python script in your dbt project. Maybe you want to send a Slack message about your forecast or make it a part of your CI/CD pipeline. Check out the fal repository for more info, see an example dbt project that uses fal, and join our Discord server.
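As an illustration of the Slack idea, here is a minimal sketch that posts a text message to a Slack incoming webhook using only the Python standard library. It is not taken from the fal project: the function names are hypothetical, the webhook URL is a placeholder, and it assumes an incoming webhook has already been created for your workspace.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying a JSON body of the form {"text": ...}."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_slack_message(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Placeholder URL; nothing is sent until send_slack_message is called
request = build_slack_request(
    "https://example.com/placeholder-webhook",
    "Forecast for zendesk_ticket_metrics is ready",
)
print(request.get_method(), request.data)
```

A call to send_slack_message at the end of forecast.py would then notify a channel each time the script runs.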