Introducing Array Type Features

By Jake Noble
Last updated: October 17, 2023

We’re excited to announce that Tecton now natively supports Array type features. Our customers are now deploying array features in operational machine learning models. In this article, we’ll go through (1) how arrays are commonly used in operational machine learning systems and (2) an example of how a user can compute a similarity score between a product and a query using embeddings in real time with Tecton.

Array Features in Operational ML

Arrays are a feature data type that can be used across a number of applications. Consider a retailer that serves product recommendations to users based on their current search query and purchase history. Our retailer might build the following kinds of features:

  1. Lists of categorical variables.
    • product_categories: a list of categories a product belongs to, e.g. [shoes, women, outdoors] for a pair of women’s hiking boots.
    • user_last_10_purchased_products: a list of the last ten product ids purchased by a user. Using our streaming capabilities, Tecton can keep this feature extremely fresh.
  2. Dense embeddings.
    • product_embedding: a precomputed embedding based off of each product’s description and metadata.
    • search_text_embedding: a query-time embedding computed from the user’s search text, e.g. "5-piece knife set". This embedding can be provided to the Tecton API to be combined with precomputed features.
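
To make these concrete, here is a minimal sketch of the values such features might hold, expressed as plain Python data (the specific values are invented for illustration):

import numpy as np

# Illustrative values only; the names mirror the features above.
product_categories = ["shoes", "women", "outdoors"]            # list of categorical values
user_last_10_purchased_products = ["p_481", "p_112", "p_993"]  # most recent product ids (illustrative)
product_embedding = np.array([0.12, -0.53, 0.07, 0.91], dtype=np.float32)  # dense float32 vector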

Because embeddings have become such an important part of operational ML systems, we dive deeper into how to use them in Tecton in the following section (see this article for more background on embeddings).

Embeddings

Embeddings are a way to transform text, images, or even arbitrary entities, such as a product id, into a lower-dimensional vector representation that captures most of the meaning in the original data.
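
As one illustration (the model and library here are an assumption of this sketch, not something Tecton prescribes), a pre-trained sentence encoder can map a search string to such a vector:

from sentence_transformers import SentenceTransformer

# Any pre-trained text encoder could be used; this choice is illustrative.
model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("5-piece knife set")
print(vec.shape, vec.dtype)  # (384,) float32 for this particular model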

By natively supporting arrays (including 32-bit float arrays), our customers can now easily bring powerful embedding features into production with a compact online storage format. This matters to our users because it can significantly reduce the infrastructure cost of online storage and serving.
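
To see why 32-bit floats matter for storage, compare the footprint of a single embedding at 64 versus 32 bits (the 768-dimensional size below is an arbitrary choice for illustration):

import numpy as np

emb64 = np.random.rand(768)        # float64 by default: 8 bytes per element
emb32 = emb64.astype(np.float32)   # 4 bytes per element
print(emb64.nbytes, emb32.nbytes)  # 6144 vs. 3072 bytes: half the online storage per embedding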

A very common use for embeddings is with language inputs, where the outputs of pre-trained embedding models like Word2vec and GloVe can be fed directly into models as features. Another use case we commonly see is employing embeddings to calculate a similarity score between two items and using that score as a feature.

Let’s go back to our example to show how you can compute a similarity score in real time using Tecton. Our customer, the retailer, wants to compare a user’s search to the descriptions of products in the catalogue. Precomputing a similarity score between every possible search query and every product description is impossible, as there are endless combinations. Instead, the similarity score must be computed on the fly between the query embedding and the precomputed product embedding. Tecton allows you to do this with sub-100ms latency. It’s also extremely easy to code:

import numpy as np
import pandas as pd
from numpy.linalg import norm
from pyspark.sql.types import DoubleType, StructField, StructType
from tecton import Input, on_demand_feature_view


@on_demand_feature_view(
    inputs={
        'product_embedding': Input(product_embedding),
        'search_text_embedding': Input(search_text_embedding)
    },
    output_schema=StructType([StructField('cosine_similarity', DoubleType())]),
    description="Computes the cosine similarity between a search text embedding and a precomputed product embedding."
)
def search_product_similarity(product_embedding: pd.DataFrame, search_text_embedding: pd.DataFrame):
    # Each row of the inputs holds one embedding array; np.vectorize applies
    # the similarity computation row by row.
    @np.vectorize
    def cosine_similarity(a: np.ndarray, b: np.ndarray):
        return np.dot(a, b) / (norm(a) * norm(b))

    df = pd.DataFrame()
    df["cosine_similarity"] = cosine_similarity(search_text_embedding["embedding"], product_embedding["embedding"])
    return df

The feature author only needs to declare the inputs and write a simple pandas transformation that computes the similarity score. Tecton then orchestrates the pipelines to compute and serve the feature on demand. Tecton is uniquely built to simplify real-time machine learning applications.
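
At serving time, the client passes the query-time embedding alongside the entity key when it requests features. The sketch below shows the general shape of such a call with Tecton’s Python SDK; the workspace and service names are invented, and parameter names vary across SDK versions, so treat it as an assumption rather than a verbatim API reference:

import tecton

# The query embedding would come from a text encoder at request time; stubbed here.
query_vec = [0.12, -0.53, 0.07, 0.91]

# Hypothetical workspace and feature-service names.
ws = tecton.get_workspace("prod")
fs = ws.get_feature_service("search_ranking_service")

features = fs.get_online_features(
    join_keys={"product_id": "p_481"},
    request_data={"search_text_embedding": query_vec},  # keyword may differ by SDK version
)
print(features.to_dict())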

Conclusion

With the release of native support for array features, our customers can now deploy powerful features into production more cheaply and quickly. At Tecton, we continue to add capabilities that make it easy for our customers to put complex features into production. If you are an organization building operational ML models and want to learn more, you can request a free trial here.
