Tecton

Fraud Doesn’t Wait: Accelerating AI-Driven Detection with Latency Budgets

Published: May 27, 2025

In the high-stakes world of fraud detection, the difference between catching fraudulent activity and missing it can come down to milliseconds. While processing thousands of transactions per second, every moment spent waiting for feature computation is a moment a fraudster could exploit. For such mission-critical use cases, speed trumps perfection.

Today, we’re excited to announce Feature Retrieval Latency Budgets – a new capability that lets teams precisely control the tradeoff between feature completeness and response time, prioritizing speed when it matters the most.

The Problem: When Waiting Costs More Than Missing Data

Imagine you’re an engineer building a real-time fraud detection model. A suspicious transaction has just been initiated, and your AI model needs to score it immediately. Your service requests 100 different features from your feature platform – most compute quickly, but one complex feature is running slowly because it is aggregating data for an entity with tens of thousands of events. Do you:

A) Wait for all features, potentially missing your chance to stop fraud in real-time

B) Abandon the request and fall back to failing open (letting transactions through, risking fraud) or failing closed (blocking transactions, frustrating customers)

Until now, Tecton users had only these two options. But our customers, particularly those working on mission-critical systems, asked us for option C: instantly get 99 features to make a prediction and don’t wait for the 1 slow feature.

Why? Because modern AI models like gradient-boosted trees or neural networks can be trained to tolerate missing inputs. A slightly less precise fraud score delivered immediately is strategically preferable to a perfect score that arrives too late.

In addition to this strategic preference, two other problems occur with slow features:

  • Systemic availability risks: one slow feature (e.g., caused by hot-keys with too many events or by long-running API calls) can bring down an entire prediction pipeline
  • Resource waste: Computing high-latency features consumes excessive serving resources that may not justify their marginal value

Introducing Feature Retrieval Latency Budgets

Our solution introduces a simple yet powerful parameter: latencyBudgetMs. With this option, teams can now specify the maximum time they’re willing to wait for feature computation, prioritizing speed over completeness when necessary.

Here’s how it works in practice:

$ curl -X POST https://.tecton.ai/api/v1/feature-service/get-features \
     -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspaceName": "prod",
    "featureServiceName": "fraud_detection_feature_service",
    "joinKeyMap": {
      "user_id": "C1000262126"
    },
    "requestOptions": {
      "latencyBudgetMs": "250"
    },
    "metadataOptions": {
      "includeNames": true,
      "includeEffectiveTimes": true,
      "includeDataTypes": true,
      "includeSloInfo": true,
      "includeServingStatus": true
    }
  }
}'
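The same request can be made from application code. Below is a minimal Python sketch mirroring the curl call above, using only the standard library; the cluster URL, environment variable names, and the `get_features` helper are illustrative placeholders, not an official Tecton client.

```python
import json
import os
import urllib.request

# Placeholder cluster URL; substitute your own deployment's hostname.
CLUSTER_URL = os.environ.get("TECTON_CLUSTER_URL", "https://example.tecton.ai")


def build_get_features_request(user_id: str, latency_budget_ms: int) -> dict:
    """Builds the get-features payload shown in the curl example above."""
    return {
        "params": {
            "workspaceName": "prod",
            "featureServiceName": "fraud_detection_feature_service",
            "joinKeyMap": {"user_id": user_id},
            "requestOptions": {"latencyBudgetMs": str(latency_budget_ms)},
            "metadataOptions": {
                "includeNames": True,
                "includeServingStatus": True,
            },
        }
    }


def get_features(user_id: str, latency_budget_ms: int = 250) -> dict:
    """POSTs the payload to the feature service and returns the parsed JSON."""
    req = urllib.request.Request(
        f"{CLUSTER_URL}/api/v1/feature-service/get-features",
        data=json.dumps(build_get_features_request(user_id, latency_budget_ms)).encode(),
        headers={
            "Authorization": f"Tecton-key {os.environ['TECTON_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```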

When a Feature View computation exceeds this budget, Tecton doesn’t wait: it returns all successfully computed features with the status PRESENT and clearly marks the others with a new TIME_OUT status. This tells your application exactly which features were skipped due to latency constraints versus those missing because no entity or value was found.

{
  "result": {
    "features": ["0", "1", 216409, null]
  },
  "metadata": {
    "features": [
      {
        "name": "transaction_amount_is_high.transaction_amount_is_high",
        "dataType": {
          "type": "int64"
        },
        "status": "PRESENT"
      },
      {
        "name": "transaction_amount_is_higher_than_average.transaction_amount_is_higher_than_average",
        "dataType": {
          "type": "int64"
        },
        "status": "PRESENT"
      },
      {
        "name": "last_transaction_amount_sql.amount",
        "effectiveTime": "2021-08-21T01:23:58.996Z",
        "dataType": {
          "type": "float64"
        },
        "status": "PRESENT"
      },
      {
        "name": "transaction_amount_last_1000d.sum",
        "dataType": {
          "type": "int64"
        },
        "status": "TIME_OUT"
      }
    ],
    "sloInfo": {
      "sloEligible": true,
      "sloServerTimeSeconds": 0.201323,
      "dynamodbResponseSizeBytes": 204,
      "serverTimeSeconds": 0.049082851
    }
  }
}
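On the client side, the serving-status metadata makes partial responses straightforward to consume. The sketch below (an illustrative helper, not part of Tecton's SDK) splits a response like the one above into a model-ready feature vector, substituting NaN for timed-out features so the model's missing-value handling can take over:

```python
import math


def extract_feature_vector(response: dict) -> tuple:
    """Splits a get-features response into (feature vector, timed-out names).

    Features with status TIME_OUT are replaced with NaN so the model's
    missing-value handling can take over.
    """
    values = response["result"]["features"]
    metadata = response["metadata"]["features"]
    vector, timed_out = [], []
    for value, meta in zip(values, metadata):
        if meta.get("status") == "TIME_OUT":
            timed_out.append(meta["name"])
            vector.append(math.nan)
        else:
            vector.append(value)
    return vector, timed_out
```

The `timed_out` list can also be logged per request, giving you the granular timeout visibility discussed below.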

Implementing dynamic time-based logic in critical serving paths is notoriously complex – it introduces nondeterminism that can destabilize production systems if not handled with extreme care. Unlike simple timeout wrappers, our latency budgets require sophisticated orchestration to safely interrupt feature computation for slow features while maintaining correctness of features that can compute quickly and ensuring system reliability. This functionality is a key addition to Tecton’s broader real-time serving system, enabling the kind of high-frequency, low-latency decision-making that modern fraud detection and risk decisioning systems demand.
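To make the contract concrete, here is a toy deadline-based retrieval loop in Python's asyncio: run every feature computation concurrently, return whatever finished within the budget, and mark the rest TIME_OUT. This only sketches the behavior described above; Tecton's server-side orchestration is far more involved.

```python
import asyncio


async def compute_with_budget(feature_tasks: dict, budget_s: float) -> dict:
    """Runs feature computations concurrently under a single deadline.

    Returns a dict of {name: {"status": ..., "value": ...}}, where features
    that finish in time are PRESENT and the rest are cancelled and TIME_OUT.
    """
    tasks = {name: asyncio.ensure_future(coro) for name, coro in feature_tasks.items()}
    done, _pending = await asyncio.wait(tasks.values(), timeout=budget_s)
    results = {}
    for name, task in tasks.items():
        if task in done:
            results[name] = {"status": "PRESENT", "value": task.result()}
        else:
            task.cancel()  # stop spending resources on the slow feature
            results[name] = {"status": "TIME_OUT", "value": None}
    return results
```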

With this functionality, Tecton users get:

  • Fault isolation at the feature view level: No more “weakest link” syndrome, where one slow feature brings down your entire prediction pipeline.
  • Dynamic latency-vs-completeness tuning: Adjust latency constraints per request based on use case priority, e.g.:
    • Critical fraud checks: 200ms budget
    • User pre-approval flows: 300ms budget
    • Credit scoring: 500ms budget
  • Graceful degradation under load: Systems now bend instead of breaking during traffic spikes or for hot entity keys, maintaining partial functionality rather than failing completely.
  • Infrastructure cost optimization: Set budgets to cap runaway computation, avoiding wasted serving resources on features that routinely exceed acceptable latency thresholds.
  • Granular timeout visibility: The TIME_OUT status provides explicit signals on which features exceeded budgets, enabling targeted debugging or optimization of slow features. It is also an actionable signal your models can learn from. High-frequency timeouts for specific features can indicate suspicious entity behavior patterns, turning latency constraints into predictive signals that actually improve model performance.
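Per-use-case tuning like the tiers above can live in a small config map that callers consult when building requests. The names and default here are illustrative, not a Tecton convention:

```python
# Illustrative per-use-case latency budgets, matching the tiers listed above.
LATENCY_BUDGETS_MS = {
    "critical_fraud_check": 200,
    "user_preapproval": 300,
    "credit_scoring": 500,
}


def budget_for(use_case: str, default_ms: int = 250) -> int:
    """Returns the latency budget for a use case, falling back to a default."""
    return LATENCY_BUDGETS_MS.get(use_case, default_ms)
```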

We’re excited for our customer teams, such as Coinbase, Block, Plaid, Signifyd—and potentially yours—to unlock new, latency-sensitive AI applications with this powerful capability.

Book a Demo
