Going from dbt models to pandas DataFrames


One of our favorite features of dbt is the ability to build dependencies between data models using the ref function. With ref, you can build pipelines all in SQL, simply by referring to other tables in your select statements.

But what if you want to refer to these models in a Python script?

fal lets you use an analogous ref function in Python to refer to dbt models and download them as pandas DataFrames. Effectively, fal lets you mix Python scripts into your dbt DAG, enabling things that SQL alone cannot do: you can make forecasts on your models, detect anomalies, or run sentiment analysis.

fal also handles the data warehouse connection and authentication out of the box by tapping into the dbt authentication layer. There is no need to set up a dedicated Python client for BigQuery or Snowflake: you configure your connections once for dbt and that's it.

Set up fal

The latest version of fal is available on pip:

pip install fal

fal lets you connect Python scripts to dbt models in a schema.yml file by using a meta tag:

models:
  - name: my_model_name
    description: My model description
    config:
      materialized: table
    meta:
      fal:
        scripts:
          - my_python_script.py

The scripts key under the fal meta tag in schema.yml expects a list of Python scripts, with paths relative to the dbt project root directory.

Inside your Python script

The Python script connected to your dbt model has access to a set of magic functions and variables, such as write_to_source and ref. It is the ref function that lets us download an entire dbt model as a pandas DataFrame:

my_model_df = ref(context.current_model.name)

where context is a fal magic variable that allows you to access the dbt run context. For a full list of what is available in the context, check out our docs.
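As a sketch of what such a script might do, here is a small anomaly check written with plain pandas. The DataFrame below is a stub standing in for what ref would return inside a real fal script, and the revenue column is a hypothetical example, not part of fal's API:

```python
import pandas as pd

# In a real fal script this DataFrame would come from the magic ref function:
#   df = ref(context.current_model.name)
# Here we stub it with sample data; the `revenue` column is hypothetical.
df = pd.DataFrame(
    {"revenue": [100.0, 101.0, 99.0, 102.0, 98.0, 100.0, 103.0, 97.0, 101.0, 500.0]}
)

# Flag rows more than two standard deviations away from the column mean.
mean, std = df["revenue"].mean(), df["revenue"].std()
df["is_anomaly"] = (df["revenue"] - mean).abs() > 2 * std
```

Only the last row, with its out-of-pattern value of 500, is flagged. Anything you can express with pandas on top of the downloaded model works the same way.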

If you would like to download a model to a notebook for ad-hoc analysis, you can also do that!

Run fal scripts

The fal Python scripts in a dbt project are run with the fal run command from your terminal. Not all of the scripts run by default, though: fal only runs the ones associated with models calculated in the most recent dbt run. Therefore, before running Python, run dbt:

dbt run

and then

fal run

and the fal Python scripts will run.


You can now download dbt models as pandas DataFrames. What's next? Maybe you want to make a forecast or do a reverse ETL to Firestore? Check out the fal repository for more info, read our docs, see an example dbt project that uses fal, and join our Discord server.