Tecton

Optimizing Feature Materialization Costs in a CI/CD Environment

  • Nick Acosta
By
Published: September 18, 2023
Last updated: January 26, 2024

Productionizing features as code with CI/CD can help organizations address new machine learning use cases or improve existing ones. Adding new features to production without the proper testing can break entire systems, but maintaining multiple feature store copies to develop and test can become very costly very fast as data sources and features scale.

At Tecton, we’ve been helping developers navigate this by establishing some best practices and even product features to ensure that you are bringing new features online in the most performant and cost effective ways possible.

Experimenting with new features

There are several common implementation strategies that may be a good fit for feature development. In this article we are going to focus on one such strategy, a GitHub Flow. In a GitHub Flow, a main branch always contains production code, and other branches can be created to add or fix features.

Diagram showing GitHub Flow branching strategy with Tecton.

Experimentation on new features and data sources for ML can happen in a notebook. After successfully identifying a promising new feature in a notebook, it can be added to a fork of the production code, a dev branch (1 in the picture above). Features can be promoted further, to a staging environment (2), where they are incorporated into full ML pipelines before joining the features in production (3).

Continuous feature development with Tecton

In Tecton, new branches are associated with new workspaces. Running tecton apply on a branch in a new workspace creates a clone of a production environment.  Tecton development workspaces are a great way to save on infrastructure costs (as they incur no costs) because they do not materialize any data and do not consume compute or serving resources, but they do ensure Tecton can effectively reach the appropriate data sources needed and the joins, aggregations, and transformations that are defined are executing as expected.

git branch checkout -b user/add-order-feature
tecton workspace create user/add-order-featureview-dev
tecton plan
tecton apply 

New data sources, feature views, and feature services can be interactively developed in a notebook. Once tested and verified, they can be committed to this new branch with git add and applied to the development workspace, can be promoted to a staging branch and workspace, and finally merged back into the main branch.

git add -A
git commit -m 'order feature view'
git push --set-upstream origin user/add-order-featureview-dev

Regardless of how you experiment with feature development, further testing is still going to be needed to ensure features are integrated properly with the rest of the ML pipeline components before promoting features to production.

Promoting features to production

A testing/staging environment is where features are typically materialized in offline and online stores for the first time. Promoting feature views and feature services to a staging environment before production ensures that machine learning models can retrieve features from an offline store with a high enough throughput for model training and from an online store with a low enough latency for real-time predictions. A staging environment with Tecton provides a quick and cost effective way to make sure a full feature pipeline is working as intended via a live workspace.

git branch checkout -b user/add-order-feature
# a --live workspace will materialize features to offline and online stores
tecton workspace create user/add-order-featureview-test --live 
tecton plan
tecton apply 
git add -A
git commit -m 'order feature view'
git push --set-upstream origin user/add-order-featureview-test

A backfill refers to any materialization operations performed on data in the past. If features are built from data sources with large amounts of historical data, performing a full backfill in a staging environment can lead to large and unnecessary infrastructure costs. Tecton has identified 4 ways developers limit data materialized to staging environments, with Tecton’s post_processor function, reading from a sampled data source, filtering in SQL, or via Tecton’s feature_start_time. A Tecton feature view or service can have online = False to prevent unnecessary materialization to a staging environment until one of these methods has been adopted.

Once a feature is materializing properly in a staging environment, extra options can be included to facilitate a great production environment for Tecton features. Tecton automatically applies tags to resources that can be used to track their associated cloud infrastructure costs. A prevent_destroy tag can be applied to Tecton resources in production that must be unset before they can be deleted in order to prevent the need to re-materialize accidentally deleted production feature services. For a full list of the considerations and improvements that can be made when running Tecton in production, see our docs

Trying it yourself

We have prepared a notebook to work with Tecton across dev/test/prod environments with the help of Tecton’s get_workspace and get_current_workspace functions.

from tecton import get_current_workspace

PROD_WORKSPACE = "prod"
DEV_START_TIME = datetime(2023,1,1)


def get_feature_start_time_for_env(feature_start_time):
    return feature_start_time if get_current_workspace() == PROD_WORKSPACE else DEV_START_TIME

@batch_feature_view(
    sources=[orders],
    entities=[customer],
    feature_start_time=get_feature_start_time_for_env(datetime(2018, 1, 1)),
		...

If you have Databricks and Tecton accounts (sign up for a free Tecton trial here), you will be able to walk through the creation and utilization of multiple Tecton workspaces to create cost effective continuous development pipelines. For ease of use, this example has been compiled into a single notebook without integrating with any Git tools, but we would recommend separating these environments by connecting Tecton to a CI/CD tool.

Request a Demo

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button

Contact Sales

Interested in trying Tecton? Leave us your information below and we’ll be in touch.​

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button

Request a free trial

Interested in trying Tecton? Leave us your information below and we’ll be in touch.​

Unfortunately, Tecton does not currently support these clouds. We’ll make sure to let you know when this changes!

However, we are currently looking to interview members of the machine learning community to learn more about current trends.

If you’d like to participate, please book a 30-min slot with us here and we’ll send you a $50 amazon gift card in appreciation for your time after the interview.

CTA link

or

CTA button