Startups · Fintech · May 5, 2026 • 8 min read

Feature Stores Explained: Why Your ML Models Go Stale in Production

For: the Series B fintech CTO whose credit risk or fraud model performs well in training but drifts unexpectedly in production, who suspects the problem lies somewhere in how features are computed, and who has never heard of a feature store, let alone decided whether they need one.

Your credit risk model scored 0.87 AUC in backtesting. Three months into production, it's drifting toward 0.78 and your data scientists keep blaming “distribution shift.” Maybe. But before you retrain on fresh data for the third time this quarter, check something else: are the features your model sees at inference time computed by exactly the same code that produced the training set? In most fintech ML stacks the answer is no — and that gap, not the data, is why your model is decaying.

This is called training-serving skew, and it's the problem feature stores were built to solve.

The problem: two pipelines computing “the same” feature

A typical credit scoring model uses features like avg_transaction_amount_30d, num_failed_logins_7d, or days_since_last_credit_inquiry. During training, a data scientist writes a SQL query or pandas script against your warehouse to compute these for every historical user. The model trains, validates, ships.

Now a loan application comes in. Your API needs that same feature vector in under 200ms. Nobody is going to run a 40-second Snowflake query per request, so a backend engineer reimplements the feature logic in Python or Go against your Postgres replica or Redis cache. They get it 95% right.

That 5% delta is your skew. Examples we've seen in production:

  - A rolling 30×24-hour training window becomes "the last 30 calendar days" in serving, silently dropping boundary transactions.
  - Nulls imputed with the column median offline become 0 (or -1) online.
  - Timestamps handled in UTC offline and local time online, shifting every window by hours.
  - A cache refreshed nightly serving a feature the model learned from fresh values.

None of these will throw an error. The model just quietly gets worse. Your team will spend weeks investigating “concept drift” when the actual problem is that the model never saw the production feature distribution, because production is computing different numbers.
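To make the failure mode concrete, here is a minimal sketch with entirely hypothetical data: a rolling-window average computed over exact timestamps in the training pipeline, and a serving rewrite that truncates to calendar dates. Same inputs, different numbers, no exception anywhere.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical transactions for one user: (timestamp, amount).
txns = [
    (datetime(2026, 4, 1, 23, 30, tzinfo=timezone.utc), 120.0),
    (datetime(2026, 4, 15, 12, 0, tzinfo=timezone.utc),  80.0),
    (datetime(2026, 5, 1,  0, 15, tzinfo=timezone.utc), 200.0),
]

def avg_30d_training(txns, now):
    """Training pipeline: rolling 30*24h window over exact timestamps."""
    cutoff = now - timedelta(days=30)
    vals = [amt for ts, amt in txns if ts >= cutoff]
    return sum(vals) / len(vals) if vals else 0.0

def avg_30d_serving(txns, now):
    """Serving rewrite: truncates to calendar dates, so a transaction late
    on the boundary day silently falls out of the window."""
    cutoff = (now - timedelta(days=30)).date()
    vals = [amt for ts, amt in txns if ts.date() > cutoff]
    return sum(vals) / len(vals) if vals else 0.0

now = datetime(2026, 5, 1, 12, 0, tzinfo=timezone.utc)
print(avg_30d_training(txns, now), avg_30d_serving(txns, now))
# ~133.33 vs 140.0 -- no error raised, just a different number
```

The model was trained on the first distribution and scores against the second, and nothing in your logs will tell you.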

The analogy: a recipe vs. two cooks

Think of a feature as a dish. Your data scientist wrote the recipe (training pipeline). Your backend engineer wrote a different recipe from memory (serving pipeline). Both call the result “tomato soup,” but one is using fresh tomatoes and the other is using paste. The model was trained to recognize the first one.

A feature store is the single recipe book. Both cooks read from it. The dish tastes the same in the test kitchen and the dining room.

What a feature store actually is

Strip the marketing away and a feature store is three things glued together:

  1. A feature definition layer. You declare a feature once — its source, its transformation, its freshness requirements — usually in code (Python, SQL, or YAML). Tools: Feast, Tecton, Hopsworks, Databricks Feature Store, Vertex AI Feature Store.
  2. An offline store for training. Typically a warehouse or lake (BigQuery, Snowflake, S3 + Parquet). Holds the full history. Supports point-in-time correct joins so you don't leak future data into training labels.
  3. An online store for serving. A low-latency KV store (Redis, DynamoDB, Bigtable). Holds the latest feature value for each entity, refreshed by the same pipeline that populates the offline store.

The critical property is that both stores are written by the same transformation code. You don't have a training pipeline and a separate serving pipeline. You have one definition, materialized two ways.

A minimal worked example

Say you're building a fraud model and you need num_distinct_merchants_7d per user. In Feast, the definition looks roughly like this (schematic; exact decorator names and signatures vary by Feast version):

from datetime import timedelta

# `user` (an Entity) and `transactions_source` are assumed to be
# declared elsewhere in the repo.
@feature_view(
    entities=[user],
    ttl=timedelta(days=7),
    source=transactions_source,
)
def user_merchant_diversity(df):
    # Distinct merchants per user over the source's window.
    return (
        df.groupby("user_id")
          .agg(num_distinct_merchants_7d=("merchant_id", "nunique"))
    )

At training time you call store.get_historical_features(entity_df, ["user_merchant_diversity:num_distinct_merchants_7d"]). It runs the transformation against your warehouse, point-in-time joined to your label timestamps. No leakage.
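What "point-in-time correct" means can be sketched with pandas `merge_asof`. The data below is hypothetical, but the mechanic is the one the store automates: each label row receives the latest feature value computed at or before its own timestamp, so a value from the future can never leak into training.

```python
import pandas as pd

# Hypothetical label events: when each prediction/label happened.
labels = pd.DataFrame({
    "user_id": [1, 1],
    "event_ts": pd.to_datetime(["2026-03-10", "2026-04-10"]),
})

# Feature values as they existed over time, one row per recompute.
features = pd.DataFrame({
    "user_id": [1, 1, 1],
    "feature_ts": pd.to_datetime(["2026-03-01", "2026-04-01", "2026-05-01"]),
    "num_distinct_merchants_7d": [3, 7, 12],
})

# Point-in-time join: each label row gets the latest feature value
# computed at or before event_ts -- future values can never leak in.
joined = pd.merge_asof(
    labels.sort_values("event_ts"),
    features.sort_values("feature_ts"),
    left_on="event_ts",
    right_on="feature_ts",
    by="user_id",
)
print(joined["num_distinct_merchants_7d"].tolist())  # [3, 7] -- never the future 12
```

A naive join on user_id alone would hand the March label the May value, and your backtest AUC would be a fiction.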

At serving time, an Airflow or streaming job runs the same transformation and pushes the latest values into Redis. Your API does store.get_online_features(...) and gets the value in under 10ms.

Same code. Same logic. Different storage targets. That's the whole trick.
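A stripped-down sketch of that property, with plain Python structures standing in for the real storage targets (a Parquet file or warehouse table offline, Redis or DynamoDB online):

```python
import pandas as pd

def user_merchant_diversity(df: pd.DataFrame) -> pd.DataFrame:
    """The single feature definition -- the only place this logic lives."""
    return (
        df.groupby("user_id", as_index=False)
          .agg(num_distinct_merchants_7d=("merchant_id", "nunique"))
    )

txns = pd.DataFrame({
    "user_id":     [1, 1, 1, 2],
    "merchant_id": ["a", "b", "a", "c"],
})
feats = user_merchant_diversity(txns)

# Offline materialization: full history appended for training.
# (Stand-in for feats.to_parquet(...) or a warehouse write.)
offline_store = feats.copy()

# Online materialization: latest value per entity into a KV store.
# (A dict standing in for Redis/DynamoDB.)
online_store = {row.user_id: row.num_distinct_merchants_7d
                for row in feats.itertuples()}

print(online_store)  # {1: 2, 2: 1} -- the same numbers training will see
```

Both writes flow through `user_merchant_diversity`; there is no second implementation to drift.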

Real-time feature store vs batch: what your use case needs

This is where most teams overbuild. The choice between a real-time and a batch feature store comes down to a single question: how fresh does a feature need to be at the moment of prediction?

Most credit scoring models do fine with batch. Most card-fraud models need streaming for at least a handful of velocity features. Don't pay for streaming complexity if your decision window is “within 24 hours.”
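For intuition, here is a sketch (hypothetical feature name, illustrative only) of what a streaming velocity feature buys you: the count is correct at the instant of the decision, which a nightly batch job cannot give you.

```python
from collections import deque
from datetime import datetime, timedelta

class TxnVelocity1h:
    """Streaming-style velocity feature: updated per event, so the count
    is fresh at the exact moment of the fraud decision."""
    def __init__(self):
        self.window = deque()  # event timestamps within the last hour

    def update(self, ts: datetime) -> int:
        self.window.append(ts)
        cutoff = ts - timedelta(hours=1)
        while self.window[0] < cutoff:
            self.window.popleft()
        return len(self.window)  # num_txns_1h, available immediately

v = TxnVelocity1h()
t0 = datetime(2026, 5, 1, 12, 0)
counts = [v.update(t0 + timedelta(minutes=m)) for m in (0, 10, 50, 75)]
print(counts)  # [1, 2, 3, 2] -- the 13:15 event sees the 12:00 and 12:10 ones expire
```

A credit decision made once per application never needs this machinery; a card authorization made in 200ms often does.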

Gotchas nobody mentions in the vendor demo

  - Backfills are expensive: point-in-time joins over months of history put real load on your warehouse, so budget for it before the first training run.
  - The online and offline stores can disagree during materialization lag; a feature refreshed nightly is up to a day staler online than the training data implies.
  - TTLs are semantics, not housekeeping: an expired feature silently becomes null or a default at serving time, and the model will score it anyway.
  - Someone now operates a low-latency datastore on the critical path of every decision, with the on-call and capacity planning that implies.

When you actually need a feature store

You probably need one if:

  - More than one model consumes overlapping features, or more than one team writes feature code.
  - You mix batch warehouse features with streaming sources.
  - Point-in-time correctness matters for your labels (it does for credit and fraud).
  - You've been burned by training-serving skew at least once already.

You probably don't need one if:

  - You run a single model, and a shared feature module imported by both pipelines, plus parity tests, covers you.
  - Every feature is batch and your decision window tolerates day-old values.
  - You're still building your first model and don't yet know what your features are.

For fintech teams shipping credit or fraud models, the typical pattern we see at companies like Cashpo is: start without a feature store, get burned by skew once, then introduce Feast or a managed equivalent for the second model. That order is usually right. The first model teaches you what your features actually are. The second one is when consistency starts paying for itself.

The diagnostic question

Before you blame data drift the next time your model misbehaves, run one experiment: pull a sample of users scored in production yesterday. Recompute their features from scratch using your training pipeline. Compare value-by-value.

If more than 1-2% of feature values disagree, you don't have a model problem. You have a feature store problem — whether you call it that or not.
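A sketch of that audit, with hypothetical feature names and values; the shape of the check matters more than the specifics:

```python
# Skew audit sketch: compare what serving logged against what the
# training pipeline recomputes for the same users.
def skew_rate(production_rows, recomputed_rows, tol=1e-6):
    """Fraction of individual feature values that disagree beyond `tol`."""
    mismatches = total = 0
    for prod, train in zip(production_rows, recomputed_rows):
        for name, prod_val in prod.items():
            total += 1
            if abs(prod_val - train[name]) > tol:
                mismatches += 1
    return mismatches / total

# One user, two features: the windowed average disagrees, the count matches.
prod  = [{"avg_transaction_amount_30d": 140.00, "num_failed_logins_7d": 2}]
train = [{"avg_transaction_amount_30d": 133.33, "num_failed_logins_7d": 2}]
rate = skew_rate(prod, train)
print(f"{rate:.0%} of feature values disagree")  # 50% -- far above the 1-2% threshold
```

Run it on a few hundred real scored users and the answer is rarely ambiguous.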

Frequently Asked Questions

Do I need a feature store if I only have one ML model in production?

Probably not yet. A shared Python module imported by both your training notebook and your serving API, plus integration tests that assert feature parity on a held-out sample, will catch most skew at a fraction of the operational overhead. Reach for a feature store when you have multiple models, multiple teams touching features, or streaming sources mixed with batch.
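As a sketch of that lighter-weight setup (the feature function and sample values are hypothetical): one shared definition imported everywhere, plus a test that values recomputed from raw history match what the serving API logged.

```python
# Shared-module sketch: both the training job and the serving API import
# this one definition -- nothing is reimplemented on either side.

def avg_transaction_amount_30d(amounts):
    """The single shared implementation of the feature."""
    return sum(amounts) / len(amounts) if amounts else 0.0

def test_feature_parity_on_logged_sample():
    # Feature values the serving API logged next to yesterday's predictions.
    logged = {"user_1": 133.33}
    # Raw 30-day histories for the same users, pulled from the warehouse.
    history = {"user_1": [120.0, 80.0, 200.0]}
    for user, logged_val in logged.items():
        recomputed = round(avg_transaction_amount_30d(history[user]), 2)
        assert abs(recomputed - logged_val) < 0.01, f"skew detected for {user}"

test_feature_parity_on_logged_sample()
print("feature parity holds on the logged sample")
```

Wire that test into CI and you have most of a feature store's consistency guarantee for one model's worth of features, with none of its infrastructure.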

What's the difference between a feature store and a data warehouse?

A warehouse stores raw and modeled data optimized for analytics queries. A feature store sits on top, defines reusable feature transformations, and adds an online store for sub-100ms serving plus point-in-time correct historical joins for training. You'll typically run both — the feature store reads from the warehouse for batch features.

Can I use a feature store for credit scoring with only batch features?

Yes, and this is the most common fintech setup. Credit decisions usually tolerate features that are hours or even a day stale. You get the consistency benefits of the store without taking on the streaming infrastructure. Add real-time only when a specific feature genuinely needs sub-minute freshness — typically velocity checks for fraud, not credit risk.

Feast vs Tecton vs building in-house — how do I decide?

Feast is open source, lightweight, and good if you have platform engineers who can operate Redis and orchestration. Tecton and Hopsworks are managed, opinionated, and faster to adopt if you'd rather pay than operate. Building in-house only makes sense at significant scale where existing tools don't fit your data model. For most Series B fintechs, Feast or a managed service is the right call.

How long does it take to roll out a feature store for an existing ML system?

It depends heavily on how many features you have, how clean your warehouse is, and whether you need streaming. Migration is rarely a big-bang project — most teams onboard one model at a time. For a scoped assessment of your specific setup, talk to CodeNicely for a personalized review.

Found this useful? CodeNicely publishes engineering and product playbooks weekly. Browse the archive or tell us what you're building.