Introducing fal

We are launching fal-serverless, a cloud-based Python runtime that simplifies infrastructure for teams building data pipelines, analytics, and ML training and inference.

Greetings friends!

At Features and Labels, our mission is to give data teams the tools they deserve. With our open-source project fal, more than 250 companies have built Python-based data projects to date. This all started back in early 2022, when we realized that dbt is the first tool a data team brings into their stack. dbt provided a great developer experience, but the rest of the stack was lagging behind significantly. We built fal to fix this problem by making it easy to run Python with dbt projects. But we kept getting one particular question: “Where does Python run?”

Until today, the answer was “it runs locally”. That was a deliberate choice on our part. The main drawback of existing Python runtimes for dbt (Snowpark, Dataproc, etc.) is the difficulty of debugging and the slowness of iteration. Without proper debugging tools for these runtimes, it is hard to identify and fix issues in your code quickly. In an ideal world, you should be able to choose to run Python locally for faster development, or in the cloud to take advantage of higher-memory machines or GPUs. Today, we are introducing fal-serverless, a cloud-based Python runtime that simplifies infrastructure for teams building data pipelines, analytics, and ML training and inference.

Let’s walk through an example that uses fal-serverless to load data into a dbt project and then perform sentiment analysis on that data with dbt Python models.

  1. Install fal-serverless with pip install fal-serverless
  2. Authenticate using the CLI: fal-serverless auth login
  3. Write your function in a new Python file called load_data_serverless.py:
from fal_serverless import isolated

# s3fs is included so that pandas can read files directly from S3
@isolated(requirements=["pandas", "s3fs"])
def load_data():
    from fal_serverless import use_dbt_project
    import pandas as pd

    # Point fal at the dbt project directory
    fal = use_dbt_project("/data/dbt-fal-serverless-demo")

    # Read the raw CSV from S3; replace the placeholders with your credentials
    df = pd.read_csv(
        "s3://my_s3_bucket/data/raw_zendesk_data.csv",
        storage_options={
            "key": "AWS_ACCESS_KEY_ID",
            "secret": "AWS_SECRET_ACCESS_KEY",
            "token": "AWS_SESSION_TOKEN",
        }
    )

    # Write the dataframe to a dbt source so downstream models can reference it
    fal.write_to_source("raw_zendesk_data", df)

load_data()

With the @isolated decorator, you can run any Python function or fal script in a serverless manner. This is accomplished by creating a dedicated environment in the cloud when an isolated function is called.

  4. Execute python load_data_serverless.py

That's it! The load_data function runs in our cloud environment. It first installs the necessary dependencies (pandas and s3fs), then executes the function, and finally returns the output, just like a regular Python function would. For the full range of things you can do (like choosing larger machines and GPUs, caching functions, and more), go to the docs.
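
As a taste of those options, here is a minimal sketch of requesting a GPU machine for an isolated function. The machine_type keyword is illustrative here; check the docs for the exact parameter names:

from fal_serverless import isolated

# Illustrative sketch: request a GPU machine for this function.
# The machine_type keyword is an assumption; see the docs for the exact API.
@isolated(requirements=["torch"], machine_type="GPU")
def check_gpu():
    import torch
    return torch.cuda.is_available()

print(check_gpu())  # should print True when a GPU is attached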

Once the raw data is loaded into our dbt project, we can create a dbt Python model that runs sentiment analysis on these tickets. The model runs on fal-serverless thanks to the dbt-fal adapter, which is compatible with any database (Postgres, BigQuery, Snowflake, Redshift, SQL Server).
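
Wiring this up is a small change to profiles.yml: the fal profile wraps your existing database profile. A rough sketch, assuming a Postgres output named dev_pg (the project name and credentials are illustrative):

my_dbt_project:
  target: dev_with_fal
  outputs:
    dev_with_fal:
      type: fal
      db_profile: dev_pg  # fal delegates database access to this profile
    dev_pg:
      type: postgres
      host: localhost
      user: dbt_user
      password: dbt_password
      port: 5432
      dbname: analytics
      schema: public
      threads: 4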

As you will see below, you can create an isolated Python environment and run a Python model in it using the fal_environment configuration. You can also run the model on a GPU machine by simply adding dbt.config(fal_machine="GPU") to the body of your model.

def model(dbt, fal):
    dbt.config(materialized="table")
    dbt.config(fal_environment="sentiment-analysis")
    dbt.config(fal_machine="GPU")
    from transformers import pipeline, AutoTokenizer
    import numpy as np
    import pandas as pd
    import torch

    # Check if a GPU is available and set the device index
    device_index = 0 if torch.cuda.is_available() else -1

    # Load the model and tokenizer
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Create the sentiment-analysis pipeline with the specified device
    classifier = pipeline("sentiment-analysis", model=model_name, tokenizer=tokenizer, device=device_index)

    # Load the upstream dbt model as a dataframe
    ticket_data = dbt.ref("zendesk_ticket_data")
    ticket_descriptions = ticket_data["DESCRIPTION"].tolist()

    # Run the sentiment analysis on the ticket descriptions
    description_sentiment_analysis = classifier(ticket_descriptions)
    rows = []

    for ticket_id, sentiment in zip(ticket_data.ID, description_sentiment_analysis):
        rows.append((int(ticket_id), sentiment["label"], sentiment["score"]))

    records = np.array(rows, dtype=[("id", int), ("label", "U8"), ("score", float)])

    sentiment_df = pd.DataFrame.from_records(records)

    return sentiment_df
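
For reference, the sentiment-analysis environment used above is declared next to your dbt project in a fal_project.yml file. A rough sketch (the package list is illustrative and should match what your model imports):

environments:
  - name: sentiment-analysis
    type: venv
    requirements:
      - transformers
      - torch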

For the full working example, check out the GitHub repo. If you want a step-by-step tutorial, you can also see this blog post.

With fal-serverless, we are solving a few key challenges:

🧑‍💻 Developer experience: You do not have to worry about the details of infrastructure. You can focus on your work.

🚀 Scalability: Your isolated functions can scale vertically (larger machines on demand) and horizontally (parallelization) to accommodate your workload, reducing the need to manage and maintain servers.

🔒 Security: fal-serverless ensures the secure execution of your isolated functions, protecting your sensitive data.

💰 Cost-effectiveness: Pay only for the compute time you consume, making it an efficient and budget-conscious choice.

We are also excited about fal-serverless for ML inference use cases. There are now tons of open-source ML models that can do very impressive things, and running them on fal-serverless is super easy. You can find complete examples in our docs; the sketch below gives a flavor.
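
As a minimal, hedged sketch, here is what wrapping an open-source Hugging Face model in an isolated function can look like. The model choice, requirements, and function name are illustrative:

from fal_serverless import isolated

# Sketch: serve an open-source model behind a single isolated function.
# gpt2 is used purely as a small, illustrative model.
@isolated(requirements=["transformers", "torch"])
def generate(prompt: str) -> str:
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2")
    return generator(prompt, max_length=50)[0]["generated_text"]

print(generate("Serverless Python runtimes are"))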

To join our private beta, simply run fal-serverless auth login, and we will add you as soon as possible. We look forward to your valuable feedback and insights as we continue to refine and improve fal-serverless.

We are eager to see the projects you'll develop using fal-serverless. As always, our team is here to support you every step of the way. For more information, check out our documentation and our examples. Feel free to reach out to our team on Discord and in the dbt Slack community. 🚀