Announcing dbt-fal adapter
Today we are excited to announce the release of the dbt-fal adapter. This is our first official release, after teasing it for the last couple of months. dbt-fal lets you develop your dbt Python models locally and seamlessly graduate them to the cloud, combining the scalability of your data warehouse with the rich Python ecosystem, all without building Docker containers or managing clusters.
dbt-fal adapter is the ✨easiest✨ way to run your dbt Python models.
Starting with dbt v1.3, you can now build your dbt models with Python. This leads to some cool use cases that were difficult to build with SQL alone.
Some examples are:
- Using Python statistics libraries to compute statistics over your data
- Building predictive models, such as classification and clustering
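For instance, a dbt Python model is just a file defining a `model(dbt, session)` function that returns a DataFrame. Here is a minimal sketch computing per-customer statistics with plain pandas; the upstream model name `orders` and the column names are hypothetical:

```python
# Sketch of a dbt Python model (dbt >= 1.3). The upstream model "orders"
# and its columns are illustrative, not from a real project. When running
# locally with dbt-fal, dbt.ref(...) yields a pandas DataFrame.
def model(dbt, session):
    orders = dbt.ref("orders")  # upstream dbt model as a DataFrame
    # Per-customer summary statistics, computed with plain pandas
    stats = (
        orders.groupby("customer_id")["order_total"]
        .agg(["mean", "std", "count"])
        .reset_index()
    )
    return stats  # dbt materializes the returned DataFrame as a table
```

dbt takes care of reading the upstream table and writing the result back to the warehouse; the function body is ordinary Python.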
This is a glimpse into a future where the analytics and data science stacks converge, and the whole data team works centered around the same dbt project. Despite this exciting future, Python runtimes still leave a lot to be desired in terms of developer experience. Managing Spark clusters feels like a step backwards compared to the serverless modern data warehouse experience.
Local iteration, seamless graduation to cloud
One of the main drawbacks of existing Python runtimes for dbt (Snowpark, Dataproc, etc.) is slow iteration and difficult debugging. Without proper debugging tools for these runtimes, it can be hard to identify and fix issues in your code quickly.
In an ideal world, you could choose to run Python locally for faster development, or in the cloud to take advantage of higher-memory machines or GPUs. dbt-fal is the only adapter that provides this experience, because it treats developer experience as a top priority.
dbt-fal offers very fast iteration because your scripts run locally by default. This means you can use standard Python debugging tools: you can even set a breakpoint in dbt-fal to stop mid-model-run by running it with an environment variable.
We will cover how to graduate your workloads to the cloud in a future post.
Best in class Python environment management
Environment management is a breeze when working with dbt-fal. Using a single set of Python libraries for the entire project becomes problematic as the list of dependencies grows. If you have ever worked on a large Airflow project, you have definitely felt this pain. In fact, not having to deal with Python environment setup is one of the great benefits of using dbt over in-house dbt-like frameworks built on Airflow. dbt Python models shouldn't be a step backwards: one extra dependency should not bring the whole project down, and a large team shouldn't be blocked waiting for other parts of the project to update their dependencies.
dbt-fal solves this problem by running each dbt Python model in an isolated Python environment, powered by our open-source project isolate. For more details on how to use isolated environments, check out our docs.
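In practice, pinning a model to its own environment is a one-line config call inside the model. The sketch below assumes an environment named "clustering" declared in your fal configuration; the environment name and the upstream model name are placeholders for illustration:

```python
# Sketch: each Python model can declare its own isolated environment.
# The environment name "clustering" and the upstream model "orders" are
# assumptions; environments themselves are declared in a separate fal
# configuration file (see the dbt-fal docs for the exact setup).
def model(dbt, session):
    dbt.config(fal_environment="clustering")  # run this model in isolation
    df = dbt.ref("orders")  # hypothetical upstream model
    return df
```

With this, a heavyweight dependency needed by one model never leaks into the rest of the project.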
Works alongside all data warehouses
dbt-fal is designed to be used alongside all data warehouse adapters that dbt supports, such as dbt-postgres, dbt-redshift, dbt-bigquery, dbt-snowflake, and more. Having a consistent Python runtime compatible with every data warehouse reduces confusion around platform-specific features. With dbt-fal, all dbt adapters now have a reliable Python runtime.
To get started
1. Install dbt-fal alongside your existing adapter, e.g. `pip install dbt-fal`.
2. Update your `profiles.yml` and add the fal adapter. Note the `db_profile` attribute: this is how the fal adapter knows how to connect to your data warehouse.
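A `profiles.yml` using the fal adapter looks roughly like this sketch; the profile name, output names, and Postgres credentials are placeholders, not values from a real project:

```yaml
jaffle_shop:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev_pg   # points at the warehouse output below
    dev_pg:
      type: postgres
      host: localhost
      user: dbt_user
      password: dbt_password
      port: 5432
      dbname: analytics
      schema: public
      threads: 4
```

The fal output delegates all SQL work to whichever output `db_profile` names.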
That is it! It is really that simple 😊. When you run `dbt run`, all the SQL dbt models are executed by the main adapter you specified in your `profiles.yml`, and all the Python models are executed by the fal adapter.
Head to our documentation website or GitHub repo for more examples and detailed explanation of all of our features.
Next up, we will talk about how you can run your Python models in the cloud! Stay tuned.
dbt-fal is a new adapter that makes the dbt Python experience 10x better:
- it can run your Python code locally and in the cloud
- it provides easy environment management and isolation between models
- it lets you run code in the same Python environment in the cloud and locally
- it supports all warehouses and databases, including Postgres