Introducing fal Python Models

fal enables you to run Python with your dbt project. Today we are enabling Python-only models anywhere in your dbt project: Python Data Models.

Screenshot of a dbt lineage graph with a .py node in the middle
This post was updated on 2022-10-14 to match the changes introduced in fal 0.7.0, following the dbt 1.3.0 release that added support for Python models.

fal enables you to run Python with your dbt project. Until today, fal users could associate their dbt models with Python scripts to run before or after a model, but these scripts did not generate new assets on their own. We recently added the ability to run these Python scripts in the middle of a dbt DAG. Today we are taking a step towards enabling Python-only models anywhere in your dbt project: Python Data Models.

To start using Python Data Models, add a dbt variable named fal-models-paths containing a list of directories in which fal should look for Python models. Think of it as dbt's model-paths, but for fal. These directories must not be inside dbt's model-paths, because dbt now supports Python models natively for some adapters. (By the way, we are looking into integrating fal natively with dbt.)

name: "jaffle_shop"
# ...
model-paths: ["models"]
# ...

vars:
  # Add this to your dbt_project.yml
  fal-models-paths: ["fal_models"]
New fal-models-paths in dbt_project.yml
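To make the "must not be in model-paths" rule concrete, here is a minimal sketch of a check you could run against your own project settings. It is not part of fal: the function name overlapping_paths is hypothetical, and the project is passed in as a plain dict (as you would get after loading dbt_project.yml with a YAML parser). It assumes that a fal path clashes when it equals a dbt model path or is nested under one.

```python
from pathlib import PurePosixPath


def overlapping_paths(project: dict) -> list[tuple[str, str]]:
    """Return (fal_path, dbt_path) pairs where the fal path is a dbt model path
    or sits inside one. Hypothetical helper, not a fal API."""
    # dbt falls back to ["models"] when model-paths is not set
    dbt_paths = project.get("model-paths", ["models"])
    fal_paths = project.get("vars", {}).get("fal-models-paths", [])

    clashes = []
    for fal_path in fal_paths:
        fal_dir = PurePosixPath(fal_path)
        for dbt_path in dbt_paths:
            dbt_dir = PurePosixPath(dbt_path)
            if fal_dir == dbt_dir or dbt_dir in fal_dir.parents:
                clashes.append((fal_path, dbt_path))
    return clashes


project = {
    "name": "jaffle_shop",
    "model-paths": ["models"],
    "vars": {"fal-models-paths": ["fal_models"]},
}
print(overlapping_paths(project))  # → []
```

An empty list means the two sets of directories are separate, as in the configuration above. A value such as "models/python" in fal-models-paths would be reported as a clash with "models".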

Now add a Python (or Jupyter Notebook) file to your fal_models directory in your project. In this file, you can use the familiar ref and source functions to pull data from your existing dbt models. These functions also help fal automatically generate the correct DAG dependencies. Every Python file that creates a data model must now include a write_to_model call at the end to write back to the data warehouse.

from utils import make_forecast

# `ref` and `source` get picked up as dependencies automatically
orders = ref('orders_daily')

# keep only the columns the forecast needs
df_count = orders[["order_date", "order_count"]]

# prepare for forecast function
df_count = df_count.rename(columns={
    "order_date": "ds",
    "order_count": "y"
})
df_forecast_count = make_forecast(df_count, periods=50)

write_to_model(df_forecast_count)
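To give a feel for how ref and source calls can be turned into DAG dependencies without running the script, here is a small illustration using Python's ast module. This is a sketch of the general idea, not fal's actual implementation: the function name find_dependencies is hypothetical, and it only recognizes calls whose arguments are all string literals.

```python
import ast


def find_dependencies(source_code: str) -> list[tuple[str, ...]]:
    """Collect ('ref', name) and ('source', name, table) tuples from a script.

    Illustrative only: calls with non-literal arguments are ignored.
    """
    found = []
    for node in ast.walk(ast.parse(source_code)):
        is_dep_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in ("ref", "source")
        )
        if not is_dep_call:
            continue
        # Keep only string-literal positional arguments
        literals = [
            arg.value
            for arg in node.args
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str)
        ]
        if literals and len(literals) == len(node.args):
            found.append((node.func.id, *literals))
    return found


script = """
orders = ref('orders_daily')
raw = source('jaffle', 'raw_orders')
write_to_model(orders)
"""
print(sorted(find_dependencies(script)))
# → [('ref', 'orders_daily'), ('source', 'jaffle', 'raw_orders')]
```

Because the script is only parsed and never executed, this kind of static scan is cheap, but it also means a call like ref(some_variable) cannot be resolved.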

Then run fal flow run:

❯ fal flow run --select orders_forecast+

File 'fal/order_detailed_cluster.sql' was generated from 'order_detailed_cluster.py'.
Please do not modify it directly. We recommend committing it to your repository.
File 'fal/orders_forecast.sql' was generated from 'orders_forecast.py'.
Please do not modify it directly. We recommend committing it to your repository.

20:40:14  Found 12 models, 20 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics

Executing command: dbt --log-format json run --project-dir ./jaffle_shop_with_fal --select orders_forecast
Running with dbt=1.1.0
Found 12 models, 20 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics
Concurrency: 10 threads (target='dev')
Finished running  in 2.08s.

16:40:22 | Starting fal run for following models and scripts:
(model: models/orders_forecast.py)
Concurrency: 10 threads
20:40:23  Unable to do partial parsing because config vars, config profile, or config target have changed

Executing command: dbt --log-format json run --project-dir ./jaffle_shop_with_fal --select orders_forecast_filter
Running with dbt=1.1.0
Unable to do partial parsing because config vars, config profile, or config target have changed
Found 12 models, 20 tests, 0 snapshots, 0 analyses, 191 macros, 0 operations, 4 seed files, 0 sources, 0 exposures, 0 metrics
Concurrency: 10 threads (target='dev')
1 of 1 START table model dbt_matteo.orders_forecast_filter ..................... [RUN]
1 of 1 OK created table model dbt_matteo.orders_forecast_filter ................ [CREATE TABLE (119.0 rows, 6.0 KB processed) in 2.90s]
Finished running 1 table model in 5.46s.
Completed successfully
Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
Running orders_forecast.py and downstream SQL model orders_forecast_filter
Check out the complete example in fal-ai/jaffle_shop_with_fal

Under the hood, fal generates ephemeral dbt models as SQL files for dbt to recognize the models and add them to the DAG. This is mainly for Python Data Models to appear in dbt docs and play well with other dbt commands.

dbt docs page showing Python model being recognized and partial lineage graph
dbt docs page of a Python Data Model. Notice the models/fal directory
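The exact contents of the generated SQL files are not shown in this post, so the following is only a rough sketch of what such a placeholder could look like and how it could be produced. Everything here is an assumption made for illustration: the function name render_placeholder_sql, the layout of the file, and the choice to list upstream models as ref calls inside a SQL comment so that dbt still registers them as dependencies.

```python
def render_placeholder_sql(model_name: str, refs: list[str]) -> str:
    """Build the text of a placeholder ephemeral model for a Python model.

    Hypothetical layout, not the file format fal actually emits.
    """
    # One Jinja ref per upstream model, so dbt can see the dependency
    ref_lines = "\n".join("{{ ref('%s') }}" % name for name in refs)
    return (
        "{{ config(materialized='ephemeral') }}\n"
        "/*\n"
        f"Generated from '{model_name}.py'. Please do not modify it directly.\n"
        "Dependencies:\n"
        f"{ref_lines}\n"
        "*/\n"
        # Assumes the Python script wrote a table named after the model
        f"select * from {{{{ target.schema }}}}.{model_name}\n"
    )


print(render_placeholder_sql("orders_forecast", ["orders_daily"]))
```

The point of a file like this is that dbt sees an ordinary SQL model with ordinary refs, which is what lets the Python Data Model show up in dbt docs and in node selection.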

Python Data Models is now our recommended way to build Python transformations that need to write data to the data warehouse. If you have been using after scripts, here is our guide to move to Python Data Models.

You can find out more about fal by reading the docs and our blog. fal is open source, so you can star us on GitHub. We also have a Discord server that everyone is welcome to join.