How KarroFin Scored 250K Users Without a Credit Bureau
For: a seed-to-Series-A fintech founder building a digital lending product for underbanked or thin-file borrowers in an emerging market. The underwriting model is rule-based and working, but approval rates are stagnating and default rates are creeping up, because the model has no signal on the majority of applicants, who lack a formal credit history.
When the KarroFin team came to us, the product worked. Loans were going out, repayments were coming in, and the rule-based underwriting model was doing its job — just not for enough people. Approval rates had stalled around the same cohort for months. The bureau-thin applicants, who were the entire reason the product existed, were getting rejected at a rate that made the unit economics impossible to scale.
This is a case study of how the model got rebuilt, what we tried that didn't work, and the one architectural decision that took the platform past 250,000 users. If you're a founder running a lending product in an emerging market and your approval funnel looks like a bottleneck shaped exactly like the credit bureau's coverage gap, this one is for you.
The starting position
KarroFin's first underwriting engine was a rules engine. Sensible, auditable, and the right call for v1. The logic looked roughly like this (sketched in code after the list):
- Pull bureau report if available
- Check KYC and identity match
- Apply hard cutoffs on income declared, employment type, and a handful of fraud flags
- Tier the applicant into a risk bucket and price accordingly
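For concreteness, here's a minimal sketch of that kind of v1 engine. Every name, threshold, and tier below is illustrative rather than KarroFin's actual code; what matters is the shape: hard policy rules first, then a bureau-anchored tiering step with no path for applicants the bureau has never seen.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a v1 rules engine -- names, thresholds,
# and tiers are illustrative, not KarroFin's actual code.

@dataclass
class Applicant:
    bureau_score: Optional[int]   # None when no bureau file exists
    kyc_match: bool
    declared_income: float
    employment_type: str
    fraud_flags: list

def underwrite(app: Applicant) -> str:
    # Hard rejects first: KYC and fraud are non-negotiable policy rules.
    if not app.kyc_match or app.fraud_flags:
        return "REJECT"
    # Hard cutoffs on declared income and employment type.
    if app.declared_income < 15_000 or app.employment_type == "unverifiable":
        return "REJECT"
    # Tiering leans entirely on the bureau score, so thin-file
    # applicants fall through to the "safe default": rejection.
    if app.bureau_score is None:
        return "REJECT"
    if app.bureau_score >= 700:
        return "TIER_A"
    if app.bureau_score >= 600:
        return "TIER_B"
    return "TIER_C"
```

The `bureau_score is None` branch is the whole case study in miniature.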
The problem was visible in the funnel data. A large share of applicants had no bureau file at all, or had a file so thin (one credit card, six months old, no installment history) that the score the bureau returned was effectively noise. The rules engine had no good way to handle these cases. The safe default was rejection. The unsafe default was approval at the riskiest tier, which led to a default rate that ate the margin.
The team had tried the obvious things first. They tuned the rules. They added more declared-income checks. They built a fraud overlay. None of it moved the needle, because none of it solved the actual problem: the model had no signal on the people it most needed signal for.
The first instinct, and why it was wrong
The instinct, when you have a stagnating approval rate, is to train a machine learning model on your historical data and let it find patterns the rules engine missed. We did this. The first version was a gradient-boosted classifier trained on the loans KarroFin had already issued, with the target variable being whether the loan defaulted.
It performed beautifully on the validation set. AUC north of 0.8. We were about to put it in front of live traffic when one of the engineers ran a simple slice analysis and caught the problem.
The model's predictive power was almost entirely concentrated in the applicants who had bureau data. On the thin-file segment — the segment the entire product was built to serve — it performed barely better than random. Worse, when we looked at which features the model weighted most heavily, the top three were all bureau-derived. The model had quietly decided that the best predictor of repayment was having a credit history, which is exactly the conclusion the incumbent lenders had already drawn.
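The check that caught it is worth showing because it's so ordinary. A sketch of that slice analysis, assuming a pandas DataFrame of validation predictions; the column names (`has_bureau_file`, `defaulted`, `model_score`) are assumptions for the sketch:

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Slice the validation set by bureau coverage and score each segment
# separately. Column names here are assumptions, not the real schema.
def auc_by_segment(val: pd.DataFrame) -> pd.Series:
    return val.groupby("has_bureau_file").apply(
        lambda g: roc_auc_score(g["defaulted"], g["model_score"])
    )

# An output like:
#   has_bureau_file
#   False    0.54   <- barely better than random on thin-file applicants
#   True     0.83   <- all the headline AUC lives here
# is exactly the failure mode described above.
```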
This is the trap, and it's worth stating clearly because we see fintech teams fall into it constantly: the bureau score is not the ground truth your model should chase. It's a lagging indicator built on the same population your product is trying to expand beyond. The moment you use it as your primary training signal — directly or indirectly — you encode the incumbent's bias into your rejection logic. You become a slightly cheaper version of the bank that already wouldn't lend to these people.
The architectural call that unlocked it
The reframe was simple to say and hard to execute. Stop trying to predict default. Start trying to predict repayment behavior on a KarroFin loan, using signals that exist for every applicant, regardless of whether a bureau has heard of them.
That meant building three things:
1. A behavioral feature layer
The application flow already collected a lot of data the rules engine ignored. Device metadata, time-of-day of application, how long the user spent on each screen, whether they corrected typos, whether the declared income matched patterns in their SMS inbox (with permission), the age of their primary phone number, the stability of their device identifier across sessions. None of this is novel — it's the standard alternative-data playbook — but the team hadn't engineered it into features yet.
We built a feature store that normalized these signals and made them queryable at decision time. Roughly 200 features in the first cut. About 60 of them turned out to carry real signal after we ran feature importance and stability checks across cohorts.
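As a toy illustration of the decision-time contract, the sketch below shows an in-memory version of that lookup. A production store would be backed by a low-latency database; every name and value here is invented:

```python
from typing import Any, Dict, List

# Toy in-memory feature store. In production this would be a low-latency
# key-value or columnar store; names and values below are invented.

class FeatureStore:
    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Any]] = {}

    def put(self, entity_id: str, features: Dict[str, Any]) -> None:
        # Offline pipelines write normalized features under a stable key.
        self._rows.setdefault(entity_id, {}).update(features)

    def get(self, entity_id: str, names: List[str]) -> Dict[str, Any]:
        # Decision-time read. Missing features come back as None so the
        # scorer imputes consistently instead of failing mid-decision.
        row = self._rows.get(entity_id, {})
        return {name: row.get(name) for name in names}

store = FeatureStore()
store.put("applicant-123", {
    "sms_income_consistency": 0.87,
    "device_id_stability": 0.95,
    "phone_number_age_days": 1460,
})
features = store.get("applicant-123", [
    "sms_income_consistency", "device_id_stability", "phone_number_age_days",
])
```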
2. A labeling strategy that didn't anchor on the bureau
This was the harder problem. If you don't train on default, what do you train on? We ended up with a multi-target setup:
- Early repayment behavior on first-loan customers: did they pay the first three installments on time, late, or not at all? This is observable within weeks, not months, and it's a strong leading indicator.
- Re-borrow intent and repeat-loan repayment: customers who came back for a second loan and paid it became a high-confidence positive label. This created a flywheel — every successful repeat loan sharpened the model.
- Roll-rate transitions: rather than a binary default flag, we modeled the probability of an account moving from current to 30-days-past-due, then 30 to 60, and so on. This gave the model a much richer signal than a single end-state flag.
Crucially, bureau data was used as one input feature when available, and never as the training label. The model could learn from it where it existed, but its predictions weren't pegged to it.
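To make the label construction concrete, here is a sketch of how the early-repayment and roll-rate targets might be derived from an installment-level table. Column names and day thresholds are assumptions, not the production schema:

```python
import pandas as pd

# Assumed input: one row per installment, with columns
# [loan_id, installment_no, days_past_due].

def early_repayment_label(installments: pd.DataFrame) -> pd.Series:
    """1 if a loan's first three installments were all paid on time."""
    first_three = installments[installments["installment_no"] <= 3]
    on_time = first_three.groupby("loan_id")["days_past_due"].max() == 0
    return on_time.astype(int)

def roll_rate_labels(installments: pd.DataFrame) -> pd.DataFrame:
    """Per-loan delinquency-transition flags instead of one default bit.
    A production version would model month-over-month state transitions;
    this compresses to worst-observed bucket for the sketch."""
    worst = installments.groupby("loan_id")["days_past_due"].max()
    return pd.DataFrame({
        "rolled_30": (worst >= 30).astype(int),
        "rolled_60": (worst >= 60).astype(int),
        "rolled_90": (worst >= 90).astype(int),
    })
```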
3. A champion-challenger deployment pattern
We didn't rip out the rules engine. Bad idea in lending. Regulators, auditors, and your own ops team all need to understand why a decision was made, and a black-box model in production with no fallback is how you end up explaining yourself to a banking regulator at 9pm on a Friday.
Instead, the rules engine stayed as the safety floor — hard rejects for fraud, KYC failures, and policy violations remained rule-based and auditable. The ML model ran on top, scoring everyone who cleared the floor, and routing them into approval tiers. We ran the new model in shadow mode for several weeks against the old logic, compared decisions side by side, and only switched it to live decisioning after the ops team had reviewed the disagreements and signed off.
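The shadow-mode wiring itself is simple, but the one invariant is easy to violate under deadline pressure: the challenger scores live traffic and its output is logged, never acted on. A sketch, with the helper functions and the 0.65 threshold standing in as assumptions:

```python
import json
import logging
import random

log = logging.getLogger("shadow")

def rules_engine_decide(application: dict) -> str:
    # Stand-in for the existing rules engine (the champion).
    return "APPROVE" if application.get("passes_rules") else "REJECT"

def ml_model_score(application: dict) -> float:
    # Stand-in for the trained model (the challenger).
    return random.random()

def decide(application: dict) -> str:
    champion = rules_engine_decide(application)
    try:
        # The challenger scores the same applicant, but its decision is
        # only logged for offline comparison -- it never reaches the user.
        score = ml_model_score(application)
        challenger = "APPROVE" if score > 0.65 else "REJECT"
        log.info(json.dumps({
            "app_id": application.get("id"),
            "champion": champion,
            "challenger": challenger,
            "score": round(score, 4),
            "disagree": champion != challenger,
        }))
    except Exception:
        # A challenger failure must never block a live decision.
        log.exception("challenger scoring failed")
    return champion
```

The disagreement log is the artifact the ops team reviews before cutover.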
What moved
Approval rates on the thin-file segment moved up materially without a corresponding rise in default rates. The platform crossed 250,000 users. The repeat-loan rate — which is the metric that actually matters in this category, because the second loan is where you make your margin back — climbed steadily as the model got better at identifying customers who would come back.
A few specifics worth calling out because they're not obvious:
- The biggest single feature improvement came from SMS-derived income consistency, not from any exotic signal. Boring beats clever.
- Device-based features degraded fast. Fraud rings adapt. We had to retrain the device-signal portion of the model on a much shorter cadence than the rest of it.
- The roll-rate target was the single best modeling decision we made. Treating delinquency as a continuous transition problem rather than a binary default problem gave the model a much smoother loss surface and let us catch problem accounts earlier.
What this approach is bad at
Honesty section. This architecture has real weaknesses, and if you're considering something similar you should know them.
It's data-hungry to bootstrap. If you're a brand-new lender with no portfolio yet, you cannot build this model. You need a real loan book — successes and failures — before the labels become meaningful. KarroFin had this. A pre-launch fintech does not, and pretending otherwise will burn you.
It's harder to explain to regulators. A rules engine is trivially auditable. A gradient-boosted model with 60 features is not. We had to build a parallel explanation layer that translated model decisions into human-readable reason codes, and that work is non-trivial. In jurisdictions with strict adverse-action notice requirements, this overhead is real.
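For a sense of what that explanation layer does, here is a sketch that maps per-feature attributions for a declined applicant (for example, SHAP values from the gradient-boosted model) onto fixed reason-code strings. The mapping table and feature names are invented for illustration:

```python
# Map the most negative attribution values -- the features that pushed
# hardest toward rejection -- onto fixed, human-readable reason codes.
# The table and feature names are assumptions for the sketch.

REASON_CODES = {
    "sms_income_consistency": "Declared income inconsistent with observed transaction patterns",
    "device_id_stability": "Device identity could not be stably verified",
    "phone_number_age_days": "Primary phone number history too short",
}

def adverse_action_reasons(attributions: dict, top_n: int = 3) -> list:
    # Most negative attribution = strongest push toward rejection.
    worst = sorted(attributions.items(), key=lambda kv: kv[1])[:top_n]
    return [REASON_CODES.get(name, f"Unfavorable signal: {name}")
            for name, value in worst if value < 0]

print(adverse_action_reasons({
    "sms_income_consistency": -0.41,
    "device_id_stability": -0.08,
    "phone_number_age_days": 0.12,
}))
```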
The feedback loop is slow and biased. You only see repayment behavior on the loans you approve. The applicants you reject become a black hole — you never learn whether you were right. We mitigated this with periodic small-scale randomized approval experiments on borderline applicants, but that costs real money in expected losses and not every lender has the stomach for it.
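The mechanics of that mitigation are a simple exploration policy. A sketch, with the borderline band and exploration rate invented for illustration:

```python
import random

# Illustrative exploration policy: approve a small random slice of
# borderline rejects so the model eventually gets labels for applicants
# it currently screens out. The band and rate below are assumptions.

BORDERLINE_BAND = (0.55, 0.65)  # scores just under the approval threshold
EXPLORE_RATE = 0.02             # ~2% of borderline rejects approved anyway

def decide_with_exploration(score: float, threshold: float = 0.65) -> str:
    if score >= threshold:
        return "APPROVE"
    lo, hi = BORDERLINE_BAND
    if lo <= score < hi and random.random() < EXPLORE_RATE:
        # Tagged so these loans are analyzed as an experiment cohort,
        # not mixed into the main book's performance numbers.
        return "APPROVE_EXPLORE"
    return "REJECT"
```

Tagging the explore approvals separately is what lets you book their losses as a data-acquisition cost rather than a model failure.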
Fraud adapts faster than credit risk. The behavioral and device signals that worked great in month one started decaying by month four. A meaningful chunk of ongoing engineering effort goes into the adversarial side of this, not the credit-modeling side. Plan for that.
The lessons that generalize
If you're building a digital lending platform for thin-file borrowers, here is what we'd carry into the next project:
- Define your label before you define your features. Most teams do this in the wrong order. They collect a bunch of alternative data, train a model against the easiest available label (bureau score, or default), and discover they've built something useless. Pick a label that reflects the behavior you actually want to predict on the population you actually want to serve.
- Keep the rules engine. Layer the model on top. The reason most fintech AI underwriting rewrites fail is that teams try to replace the rules instead of composing with them. Hard policy and fraud rules belong in code you can read. Probabilistic risk assessment belongs in a model. They are different problems.
- Build the feature store before the model. If your features are computed inline in your scoring service, you cannot retrain efficiently and you cannot reuse signals across products. A proper feature store is unsexy infrastructure that pays for itself the first time you have to retrain in a hurry.
- Treat the repeat-loan customer as your most valuable training signal. Anyone who comes back for a second loan and pays it is gold. Build your data pipeline around capturing this signal cleanly.
- Run shadow mode longer than you think you need to. The cost of a bad model in production in lending is asymmetric. Your defaults compound for months. Your approvals don't.
How CodeNicely can help
If the situation we just described sounds uncomfortably familiar — working rule-based underwriting, stagnating approval rates, growing certainty that your bureau-anchored logic is the bottleneck — this is the kind of engagement we've done before. The KarroFin/Cashpo lending work is the most direct match, and the team that built it is still here.
What we'd bring to a conversation: a working playbook for alternative credit scoring AI, the feature-store and labeling architecture described above, and the unglamorous regulatory-explainability scaffolding that turns a model into something a compliance officer will actually sign off on. We've also done adjacent fintech work on GimBooks, which gave us a lot of exposure to how small-business transaction data behaves — useful if your thin-file population skews toward micro-merchants. Broader AI engineering capabilities are documented separately.
What we won't do: tell you that a model will fix a product-market-fit problem, or that you can skip building a real loan book before training. If you're pre-launch, we'll tell you that. If you're already in market and stuck, we can probably help.
Frequently Asked Questions
Can you do credit scoring without credit history at all?
Yes, but not from zero. You need some source of repayment behavior to train against — either your own early loan book, a partner's portfolio, or a structured pilot. What you do not need is bureau data on each applicant. The KarroFin model scores applicants with no bureau file at all, using behavioral, device, and transactional signals that exist for everyone who completes an application.
How is AI credit scoring for thin-file borrowers different from a traditional scorecard?
A traditional scorecard is built on a stable population with rich credit history and assumes the features that predict repayment are mostly financial. AI credit scoring for thin-file borrowers inverts both assumptions: the population is unstable and growing, and the predictive features are mostly behavioral and transactional rather than financial. The modeling techniques are not exotic — gradient boosting, careful feature engineering — but the data architecture and labeling strategy are very different.
Won't a black-box ML model cause regulatory problems?
Only if you deploy it as a black box. The pattern we use keeps hard policy rules in auditable code and uses the ML model for risk tiering on top. Every model decision is paired with reason codes generated from feature attribution, so adverse-action notices are explainable. Regulators are generally comfortable with this composition; they are not comfortable with an unexplained model making binary approve/reject calls.
How long before a new underwriting model starts performing better than rules?
This depends entirely on your data volume, label quality, and the operational maturity of your shadow-mode testing. We won't quote a timeline. If you want a realistic assessment for your specific situation, talk to CodeNicely and we'll scope it against your actual portfolio and data.
What's the minimum loan book size needed to train a real model?
There's no clean number, but the practical answer is: enough loans, observed for long enough, that you have a statistically meaningful count of both repayments and defaults across the segments you care about. If your defaulters are in the single digits, you don't have a model yet — you have a rules engine that needs more work. Be honest about which phase you're in.
Building something in fintech?
CodeNicely partners with founders and tech teams to ship AI-native products that move metrics. Tell us about the problem you're solving.
Talk to our team