How to reverse ETL a dbt model into Google Firestore

How to reverse ETL a dbt model into Google Firestore

Google Firestore is a cloud-hosted NoSQL database that can be directly accessed by mobile and web applications through native SDKs. It has a rich set of use cases, including keeping track of inventories, user sessions and state mutations. Firestore is fast and enable low-latency querying of large quantities of data. While dbt is very good at transforming data in a data warehouse, sometimes we need this data to be available in a low-latency manner. One way to achieve this is by loading warehouse data to Firestore in a reverse ETL.

In this post we look at how to send data from a dbt model straight to Firestore. Fal lets you run Python code on top of your dbt models, so we will use fal to send model data to Firestore in just two lines of Python.

If you want to learn more about fal, see here.

Python script

Here's a script that gets data from a dbt model and sends it straight to Firestore:

df = ref("zendesk_ticket_data")
write_to_firestore(df=df, collection="zendesk_ticket_data", key_column="id")

Functions ref and write_to_firestore are provided by fal, there's no need to import them. ref downloads a dbt model into memory as a pandas DataFrame. In this case, we are downloading a table for a model named zendesk_ticket_data.

write_to_firestore takes the downloaded DataFrame, the name of a target Firestore collection, and the name of a key column in the DataFrame. A key column has to hold a unique identifier of every entry. For example, in case of a model containing a list of unique users, the key column would refer to user ids.

Save this script somewhere as to_firestore.py .

Connect script to a dbt model

Install fal if you haven't already:

pip install fal

We can go to a dbt project directory and find a relevant schema.yml file. We then choose a model that we want to send to Firestore. Finally, we add a fal meta tag and connect our Python script to the chosen model:

models:
  - name: zendesk_ticket_data
    description: Zendesk ticket data
    config:
      materialized: table
    meta:
      fal:
        scripts:
          - path/to/to_firestore.py

zendesk_ticket_data model in this example is now connected to to_firestore.py and can be set to run every time zendesk_ticket_data is re-calculated.

Run script

In the command line, run dbt:

dbt run

Followed by fal:

fal run

If everything is setup correctly, Firestore should be updated with a new collection:

Conclusion

You can now send dbt model data to Firestore. What's next? Fal lets you run any Python script in your dbt project. Maybe you want send a Slack message or make writing to Firestore a part of your CI/CD pipeline. Check out the fal repository for more info, see an example dbt project that uses fal and join our Discord server.