
Introducing Array Type Features

We’re excited to announce that Tecton now natively supports Array type features. Our customers are now deploying array features in operational ML models. In this article, we’ll go through (1) how arrays are commonly used in operational ML systems and (2) an example of how a user can compute a similarity score between a product and a query using embeddings in real time with Tecton.

Array Features in Operational ML

Arrays are a versatile feature data type used across many applications. Consider a retailer that serves product recommendations to users based on their current search query and purchase history. Our retailer might build the following kinds of features:

  1. Lists of categorical variables.
    • product_categories: a list of categories a product belongs to, e.g. [shoes, women, outdoors] for a pair of women’s hiking boots.
    • user_last_10_purchased_products: a list of the last ten product ids purchased by a user. Using our streaming capabilities, Tecton can keep this feature extremely fresh.
  2. Dense embeddings.
    • product_embedding: a precomputed embedding based off of each product’s description and metadata.
    • search_text_embedding: a query-time embedding computed from the user’s search text, e.g. "5-piece knife set". This embedding can be provided to the Tecton API to be combined with precomputed features.
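Concretely, both kinds of features are just array values. As a rough sketch, the names and values below are purely illustrative, not a real Tecton feature definition:

```python
import numpy as np

# Hypothetical feature values for one product and one user -- names and
# contents are illustrative only, not a real Tecton feature schema.
product_categories = ["shoes", "women", "outdoors"]             # list of categoricals
user_last_10_purchased_products = ["p_483", "p_112", "p_957"]   # most recent first (toy ids)
product_embedding = np.array([0.12, -0.48, 0.33], dtype=np.float32)  # toy 3-dim embedding

print(product_categories)
print(product_embedding.dtype)  # float32
```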

Because embeddings have become such an important part of operational ML systems, we dive deeper into how to use them in Tecton in the following section (see this article for more background on embeddings).

Embeddings

Embeddings are a way to transform text, images, or even arbitrary entities, such as a product id, into a lower-dimensional vector representation that captures most of the meaning in the original data.

By natively supporting arrays (including 32-bit float arrays), our customers can now easily bring powerful embedding features into production with a compact online storage format. This matters to our users because it can significantly reduce the infrastructure cost of online storage and serving.
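To see why the 32-bit format helps, here is a quick sketch (the dimension and values are illustrative): downcasting an embedding from float64 to float32 halves the bytes stored per vector.

```python
import numpy as np

# A 768-dimensional embedding (dimension chosen for illustration) stored
# as float64 vs. float32: the 32-bit copy needs half the bytes per vector,
# which is what makes a compact online storage format cheaper to serve.
rng = np.random.default_rng(0)
embedding_f64 = rng.standard_normal(768)            # dtype float64, 8 bytes per value
embedding_f32 = embedding_f64.astype(np.float32)    # dtype float32, 4 bytes per value

print(embedding_f64.nbytes)  # 6144
print(embedding_f32.nbytes)  # 3072
```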

A very common use for embeddings is language inputs, where outputs from pre-trained embedding models like Word2vec and GloVe can be fed directly into models as features. Another use case we commonly see is employing embeddings to compute a similarity score between two items and using that score as a feature.

Let’s return to our example to show how you can compute a similarity score in real time using Tecton. Our customer, the retailer, wants to compare a user’s search to the descriptions of products in the catalogue. Precomputing a similarity score between every possible search query and every product description is infeasible, as there are endless combinations. Instead, the similarity score must be computed on the fly between the query embedding and the precomputed product embedding. Tecton lets you do this with sub-100ms latency, and the feature itself is easy to write:

import numpy as np
import pandas as pd
from numpy.linalg import norm
from pyspark.sql.types import DoubleType, StructField, StructType
from tecton import Input, on_demand_feature_view


@on_demand_feature_view(
    inputs={
        'product_embedding': Input(product_embedding),
        'search_text_embedding': Input(search_text_embedding)
    },
    output_schema=StructType([StructField('cosine_similarity', DoubleType())]),
    description="Computes the cosine similarity between a search text embedding and a precomputed product embedding."
)
def search_product_similarity(product_embedding: pd.DataFrame, search_text_embedding: pd.DataFrame):
    # Vectorize so the similarity is computed row by row across the two frames.
    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        return np.dot(a, b) / (norm(a) * norm(b))

    df = pd.DataFrame()
    df["cosine_similarity"] = cosine_similarity(search_text_embedding["embedding"], product_embedding["embedding"])
    return df
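The vectorized helper at the heart of the feature view can be exercised on its own. A minimal standalone sketch, using toy 3-dimensional vectors in place of real embeddings:

```python
import numpy as np
import pandas as pd
from numpy.linalg import norm

# Same row-wise cosine similarity as in the feature view, applied to
# pandas Series whose elements are embedding arrays.
@np.vectorize
def cosine_similarity(a: np.ndarray, b: np.ndarray):
    return np.dot(a, b) / (norm(a) * norm(b))

# Toy 3-dimensional embeddings; real embeddings would have hundreds of dimensions.
queries = pd.Series([np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])])
products = pd.Series([np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])])

scores = cosine_similarity(queries, products)
print(scores)  # identical vectors -> 1.0; vectors 45 degrees apart -> ~0.707
```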

The feature author only needs to declare the inputs and write a simple pandas transformation that computes the similarity score. Tecton then orchestrates the pipelines to compute and serve the feature on demand. Tecton is uniquely built to simplify real-time machine learning applications.

Conclusion

With the release of native support for array features, our customers can now deploy powerful features to production faster and at lower cost. At Tecton, we continue to add capabilities that make it easy for our customers to put complex features into production. If you are an organization building operational ML models and want to learn more, you can request a free trial here.
