BASIC: Behavioural Analytics for Score Improvement in Credit

A Multi-Modal Machine Learning Framework for Credit Score Prediction, Bureau Sensitivity Estimation, and Personalised Improvement Pathways in India

Shubham Arawkar · Pranav Murali · Anupam Acharya

Tara Labs AI Research Team, Bengaluru, India

With advisory contributions from Abhijit Verma†
† Guest Research Advisor

research@taralabs.ai

Working Paper — 2025 © Tara Labs AI. All rights reserved.


Abstract

Bureau credit scores in India — issued by CIBIL, Experian, Equifax, and CRIF — function as the primary gatekeeping mechanism for consumer access to formal lending. Yet these scores are retrospective, opaque, and optimised entirely for lender-side default risk: they render a verdict on the consumer's past without offering any forward guidance on their financial future. We introduce BASIC (Behavioural Analytics for Score Improvement in Credit), a multi-modal machine learning framework built on a fundamentally different premise — that the right question to ask about a consumer's credit profile is not "how likely are they to default?" but "how do we help them improve?"

BASIC integrates bureau tradeline history, banking cashflow signals, and longitudinal behavioural telemetry to accomplish four tasks: predicting month-ahead credit scores (RMSE 21.9); recovering latent bureau scoring sensitivities through the Bureau Sensitivity Module — an empirical estimation framework that requires no access to proprietary bureau systems; estimating individualised causal effects of credit actions using Causal Forests and Double Machine Learning; and generating personalised, time-indexed improvement pathways. The model is trained and evaluated on a user-consented dataset of 550,000 individuals spanning 10M+ sample-months, collected between January 2021 and December 2024.

Results are encouraging. BASIC achieves 94% directional accuracy — correctly identifying whether a score will rise or fall — and a causal correlation of 0.71 between predicted and observed score movements following real-world actions. Among users who followed BASIC-generated recommendations, median score improvement over 90 days was +52 points, with the 90th percentile reaching +120 points. Compared to a matched control group, recommendation-followers improved their scores 2.4× faster.

To our knowledge, this is the first scientific, population-scale framework for credit score improvement modelling in India. We propose the broader research area this work opens as Behavioural Credit Analytics — a discipline concerned with how financial behaviour drives score movement and how AI systems can translate that understanding into consumer empowerment.

Keywords: credit scoring, causal inference, gradient boosting, LSTM, causal forests, financial inclusion, India, bureau sensitivity estimation, counterfactual estimation, DPDP compliance


1. Introduction

Walk into most Indian banks today and ask a loan officer what you can do to improve your credit score. Odds are, you'll get some version of the same answer: pay on time, keep your credit utilisation low, don't apply for too many loans at once. Useful advice, approximately. But almost completely unquantified. Nobody tells you how much your utilisation needs to drop, or by when, or what the actual penalty was for that missed EMI two years ago.

This gap — between the enormous influence credit scores exert and the near-total absence of scientific guidance on improving them — is the central motivation for this work.

Credit scores issued by India's four major bureaus (CIBIL, Experian, Equifax, CRIF) are, at their core, probability-of-default (PD) models. They estimate the likelihood that a borrower will fail to repay, and they are calibrated to serve lenders, not consumers. The retail credit market they govern is substantial — over ₹22 trillion in disbursements in FY2024 (RBI, 2024) — and growing rapidly as digital lending, BNPL products, and co-branded credit cards pull millions of first-time borrowers into the formal financial system.

The trouble is that the infrastructure guiding these new borrowers has not kept pace. A scoring model engineered for steady-state credit populations in the early 2000s is a blunt instrument for a population that is overwhelmingly composed of first-generation credit users: gig workers with irregular income, self-employed individuals, salaried employees in Tier 2 and 3 cities whose financial behaviour doesn't map cleanly onto models trained on metro, formal-sector data. Over 300 million Indians sit in a near-prime or credit-invisible segment (TransUnion CIBIL, 2023) — not chronic defaulters, but people the system was never designed to help.

We propose BASIC (Behavioural Analytics for Score Improvement in Credit) as a response to this problem. BASIC is not a risk model. It is a credit improvement framework, designed to answer the questions that actual consumers ask:

  • What will my credit score be next month?
  • What is the bureau penalising me for, and how much does each factor cost me?
  • If I pay off this card, or take this loan, or wait out this inquiry — what actually happens to my score?
  • What is the fastest realistic path to a score that unlocks better credit products?

These questions require a different kind of modelling than default prediction. They require forecasting, causal inference, and the ability to simulate counterfactual futures — none of which are addressed by existing credit scoring literature for Indian consumers.

1.1 Contributions

This paper makes the following contributions:

C1 — A novel problem formulation. We formally define the Credit Score Improvement Problem (CSIP), distinguishing it from credit risk prediction as a distinct inference task with different data requirements, model objectives, and evaluation criteria.

C2 — A multi-modal ML framework. BASIC integrates bureau tradeline data, bank statement cashflow signals, and behavioural telemetry in a unified pipeline — the first such integration, to our knowledge, at population scale for Indian consumers.

C3 — Bureau Sensitivity Module. We develop a methodology for recovering latent bureau scoring sensitivities from observed longitudinal data without access to any proprietary bureau system, algorithm, or documentation.

C4 — Causal action-impact estimation. We apply Causal Forests and Double ML to estimate heterogeneous treatment effects of specific credit actions at the individual level — a first application of these methods to the credit score improvement problem.

C5 — India-specific empirical findings. We report the first large-scale empirical characterisation of credit score dynamics specific to India's bureau and lending ecosystem, including gold loan effects, NBFC vs. bank score variance, and Tier 2/3 consumer behaviour patterns.

C6 — A new research area. We formally propose Behavioural Credit Analytics as a research discipline concerned with how financial behaviour drives credit score movement and how this relationship can be leveraged for consumer benefit.


2. Related Work

2.1 Credit Risk and Default Prediction

Credit scoring research has a long history rooted in default prediction. Logistic regression scorecards (Hand & Henley, 1997; Thomas, 2000) remain the most widely deployed approach in practice, valued for their interpretability and regulatory tractability. Gradient boosted tree methods — particularly XGBoost (Chen & Guestrin, 2016) and LightGBM (Ke et al., 2017) — have substantially improved predictive accuracy on tabular credit data over the past decade. More recently, deep learning approaches including LSTMs and attention-based models have been applied to time-series delinquency patterns with reasonable results (Kvamme et al., 2018; Bahnsen et al., 2016).

None of this literature targets the consumer. The borrower is an object of prediction in all of it — an input to a lender's risk model — not a beneficiary of insight. This framing has been so pervasive that the question of how a consumer might improve their score has barely registered as a research problem.

2.2 Score Factor Attribution

Some commercial efforts provide "reason codes" alongside bureau scores — categorical labels like "high utilisation" or "recent inquiry" that gesture at the score's primary drivers. FICO's published technical documentation (FICO, 2020) and Experian's consumer-facing materials (Experian, 2022) fall into this category. The academic literature on post-hoc explainability — particularly SHAP values (Lundberg & Lee, 2017) — offers a more rigorous version of the same idea, applying feature attribution to trained risk models.

The fundamental limitation of all these approaches is that they explain a past risk score, not future score movement. Knowing that your utilisation is "the primary reason your score is not higher" does not tell you by how many points it would improve if your utilisation dropped by 20%, or how long that improvement would take. Attribution and forecasting are different problems.

2.3 Causal Inference in Finance

Causal inference methods have found productive application in financial economics. Athey & Imbens (2016) and Wager & Athey (2018) introduced Causal Forests for heterogeneous treatment effect estimation — a framework well-suited to problems where the same intervention affects different individuals differently. Chernozhukov et al. (2018) developed Double ML for causal estimation in high-dimensional settings. Applications to finance have included loan pricing (Fuster et al., 2022), financial product uptake (Hitsch & Misra, 2018), and credit access (Blattner et al., 2022).

To our knowledge, none of these applications have addressed credit score improvement — the question of whether a specific consumer action will raise or lower a specific individual's score by a measurable amount. That is the gap BASIC's causal module addresses.

2.4 Temporal Financial Modelling

Sequence modelling of financial data has matured considerably. LSTM-based approaches to cashflow prediction (Bao et al., 2017), transformer-based transaction classification (Yang et al., 2020), and survival analysis for default timing (Duffie et al., 2009) all demonstrate the value of temporal structure in financial data. What is absent from this literature — again — is forward-looking score forecasting at the individual level. Predicting when a borrower will default is not the same as predicting what their credit score will be in 30 days.

2.5 Gap Analysis

The table below summarises how BASIC relates to existing work across five dimensions:

| Research Area | Prior Work Exists? | Gap Addressed by BASIC |
|---|---|---|
| Credit risk prediction | Extensive | Not the target problem |
| Score factor attribution | Reason codes, SHAP | No quantitative, causal, forward-looking attribution |
| Causal inference in credit | Loan pricing, product uptake | No application to score improvement |
| Temporal credit modelling | Default forecasting | No individual-level score forecasting |
| India-specific credit research | RBI macro studies | No individual-level improvement framework |

3. Problem Formulation

3.1 The Credit Score Improvement Problem (CSIP)

Let $S_t \in \mathbb{Z}$ denote a consumer's bureau credit score at time $t$, where $t$ indexes monthly reporting periods. Let $\mathbf{X}_t \in \mathbb{R}^d$ denote the full feature vector of credit attributes at time $t$, and let $\mathbf{A}_t$ denote the set of credit actions available to the consumer in period $t$.

The Credit Score Improvement Problem (CSIP) is defined as a tuple $\langle \mathcal{S}, \mathcal{X}, \mathcal{A}, f, \mathcal{P} \rangle$ where:

  • $\mathcal{S}$ = score space (300–900 for Indian bureaus)
  • $\mathcal{X}$ = feature space over credit attributes
  • $\mathcal{A}$ = action space (new loan, repayment, inquiry avoidance, etc.)
  • $f: \mathcal{X} \rightarrow \mathcal{S}$ = latent bureau scoring function (unknown)
  • $\mathcal{P}$ = personalised improvement pathway

CSIP requires solving four coupled sub-problems:

P1 — Score Forecasting: $\hat{S}_{t+1} = \hat{f}(\mathbf{X}_t, \mathbf{X}_{t-1}, \ldots, \mathbf{X}_{t-k})$

P2 — Bureau Sensitivity Recovery: $\hat{\beta}_i = \frac{\partial \hat{f}}{\partial X_i}, \quad \forall i \in \{1, \ldots, d\}$

P3 — Causal Action Impact Estimation: $\Delta S(A) = \mathbb{E}\left[S_{t+1}^{do(A)} - S_t \mid \mathbf{X}_t\right]$

where $do(A)$ denotes the interventional distribution (Pearl, 2000) induced by action $A$.

P4 — Optimal Improvement Pathway: $\mathcal{P}^* = \arg\max_{\mathcal{P} \in \Pi} \mathbb{E}\left[S_{t+T} - S_t \mid \mathbf{X}_t, \mathcal{P}\right]$

subject to feasibility constraints on actions and timeline $T$.

3.2 Distinction from Credit Risk Modelling

Credit risk modelling estimates $P(\text{default} \mid \mathbf{X}_t)$. CSIP estimates $\mathbb{E}[S_{t+1} \mid \mathbf{X}_t]$ and $\mathbb{E}[\Delta S \mid do(A), \mathbf{X}_t]$. The distinction matters practically, not just technically. A model optimised for default prediction penalises borrowers with thin files regardless of their trajectory; CSIP is concerned with that trajectory explicitly. The data requirements also differ — CSIP needs longitudinal observations to model score dynamics, whereas PD modelling often works adequately on cross-sectional snapshots.


4. Data

4.1 Dataset Overview

BASIC is trained and evaluated on a proprietary dataset collected through the GoCredit AI platform operated by Tara Labs AI. The dataset comprises:

  • 550,000 unique users
  • 10M+ sample-months of longitudinal observations
  • Time span: January 2021 – December 2024
  • Geography: Pan-India, Tier 1, 2, and 3 cities
  • Consent: 100% user-consented via explicit opt-in under DPDP-compliant data collection protocols

All personally identifiable information (PII) is removed prior to modelling. Users are represented by anonymised identifiers throughout.

4.2 Data Modalities

Modality 1: Bureau Tradeline Data

Obtained via user-authorised bureau pulls. Features include monthly credit score snapshots (StS_t), Days-Past-Due (DPD) flags at three severity thresholds (1–30, 31–60, 60+), credit utilisation ratios per tradeline and in aggregate, number and age of active and closed accounts, hard inquiry counts over 30/60/90-day rolling windows, secured vs. unsecured loan mix, EMI amounts and principal outstanding, sanction amounts, and written-off or settled account flags.

Modality 2: Banking & Cashflow Data

Extracted from user-authorised bank statement processing (via the Account Aggregator framework — RBI, 2021 — and PDF parsing where AA is unavailable): monthly net inflow, outflow, and balance trajectory; salary regularity score (coefficient of variation of salary credits); EMI debit success/bounce ratios; cash withdrawal frequency; merchant category spend distribution; and inflow volatility index, defined as $\sigma(\text{monthly inflow}) / \mu(\text{monthly inflow})$.
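The inflow volatility index defined above is a plain coefficient of variation; a minimal sketch (all values illustrative, not from the dataset):

```python
import numpy as np

def inflow_volatility_index(monthly_inflows):
    """Inflow volatility index: sigma(monthly inflow) / mu(monthly inflow).

    `monthly_inflows` is a 1-D array of a user's monthly net inflows;
    a higher value means less predictable cashflow.
    """
    inflows = np.asarray(monthly_inflows, dtype=float)
    mu = inflows.mean()
    if mu == 0:
        # No average inflow: ratio undefined, treat as maximal volatility.
        return float("inf")
    return float(inflows.std() / mu)

# A perfectly steady salary has zero volatility.
steady = inflow_volatility_index([50_000] * 12)
# Irregular gig income scores substantially higher.
gig = inflow_volatility_index([20_000, 70_000, 10_000, 90_000, 5_000, 60_000])
```

The same helper can serve the Cashflow Stability Index in Section 4.3, which is simply one minus this quantity.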

Modality 3: Behavioural Telemetry

Collected from in-app interactions: credit offer exploration frequency, repayment reminder engagement patterns, inquiry initiation bursts (>2 bureau pulls in 30 days), session frequency around EMI due dates, and score-check frequency as a proxy for financial attention or anxiety.

Modality 4: Supplementary Public Datasets

Used for pre-training and generalisation testing: Taiwan Credit Default Dataset (Yeh & Lien, 2009), LendingClub Loan Dataset (2007–2018), UCI Credit Card Default Dataset, Fannie Mae Single-Family Loan Performance Data, and RBI Financial Stability Report data at the aggregated macro level.

4.3 Feature Engineering

BASIC uses 127 engineered features across four buckets, derived from a raw bureau feature space of approximately 2,500 dimensions per user. The 127 features represent a curated, non-redundant subset selected via SHAP-based importance filtering and variance thresholding — retaining the most predictive signals while controlling for multicollinearity and computational cost. The full 2,500-dimensional bureau vector is preserved and used directly in the Bureau Sensitivity Module, where feature granularity matters more than parsimony.
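The curation step described above — variance filtering followed by importance-based selection — can be sketched as follows. This is an illustrative reconstruction on synthetic data: it substitutes tree impurity importances for the paper's SHAP-based filtering, and the dimensions (200 raw features rather than ~2,500) are toy stand-ins.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.feature_selection import SelectFromModel, VarianceThreshold

rng = np.random.default_rng(0)

# Synthetic stand-in for the raw bureau feature space (200 dims, not 2,500).
X_raw = rng.normal(size=(1000, 200))
X_raw[:, 50] = 0.0                      # a degenerate, zero-variance column
y = 600 + 40 * X_raw[:, 0] - 25 * X_raw[:, 1] + rng.normal(scale=5, size=1000)

# Step 1: drop (near-)constant features.
vt = VarianceThreshold(threshold=1e-6)
X_var = vt.fit_transform(X_raw)

# Step 2: keep only features a boosted model finds predictive. The paper uses
# SHAP-based filtering; tree impurity importances are a simpler stand-in here.
gbm = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X_var, y)
selector = SelectFromModel(gbm, threshold="mean", prefit=True)
X_curated = selector.transform(X_var)

print(X_raw.shape[1], "->", X_var.shape[1], "->", X_curated.shape[1])
```

The two informative columns survive both filters while most noise columns are discarded, mirroring how the 2,500-dimensional bureau space is reduced to the 127 curated features.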

Bureau Features (58 features):

| Feature | Formula / Description |
|---|---|
| Utilisation Ratio | $\sum \text{Outstanding}_i / \sum \text{Limit}_i$ |
| New Credit Velocity | Accounts opened in past 90 days |
| Inquiry Density | Hard inquiries in past 60 days |
| Tradeline Age Score | Time-weighted mean account age |
| DPD Streak | Consecutive months with any DPD flag |
| Score Momentum | $S_t - S_{t-3}$ (3-month score slope) |
| Secured Mix Ratio | Secured balance / Total balance |
| Delinquency Recovery Slope | Score change rate following DPD event |

Banking Features (31 features):

| Feature | Formula / Description |
|---|---|
| Cashflow Stability Index | $1 - \text{CV}(\text{monthly inflow})$ |
| Salary Drift | |
| FOIR (EMI-to-Income) | $\sum \text{EMI}_t / \text{Income}_t$ |
| Bounce Signal | Near-bounce count (balance < EMI × 1.2) |
| Inflow Regularity | Autocorrelation of monthly inflow series |

Behavioural Features (22 features): Inquiry burst indicator, offer exploration recency, repayment engagement score, session cadence around due dates.

Temporal / Interaction Features (16 features): Score × Utilisation interaction, DPD × Inquiry co-occurrence, cashflow stability × credit-seeking frequency.

4.4 Handling Thin-File Users

Approximately 23% of the dataset consists of thin-file users — those with fewer than 3 active tradelines or fewer than 12 months of bureau history. For this segment, direct feature computation is unreliable. We address this through three mechanisms: bureau features are imputed via k-nearest-neighbour matching on cashflow and demographic embeddings; thin-file users are assigned to one of 12 latent credit archetypes derived through k-means clustering on the full population; and archetype-level behavioural priors are used to initialise the sequence model's hidden state in lieu of observed score history. This approach substantially reduces prediction error for thin-file users relative to zero-imputation baselines (RMSE reduction of 11.4 points on held-out thin-file test set).
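The first two thin-file mechanisms — kNN imputation of bureau features from always-observed signals, and k-means archetype assignment — can be sketched on toy data. All values are synthetic, and k = 4 here rather than the paper's 12 archetypes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)

# Columns 0-1: cashflow/demographic signals (always observed);
# column 2: a bureau feature, missing for thin-file users.
n = 500
X = rng.normal(size=(n, 3))
X[:, 2] = 0.8 * X[:, 0] + rng.normal(scale=0.2, size=n)  # bureau feature tracks cashflow
thin = rng.random(n) < 0.23                              # ~23% thin-file, as in the paper
X_obs = X.copy()
X_obs[thin, 2] = np.nan

# Mechanism 1: kNN imputation of the bureau feature from observed neighbours.
imputed = KNNImputer(n_neighbors=5).fit_transform(X_obs)

# Mechanism 2: assign every user to a latent credit archetype (k=4 for toy data).
archetypes = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(imputed)

err = np.abs(imputed[thin, 2] - X[thin, 2]).mean()
print(f"mean imputation error on thin-file rows: {err:.3f}")
```

Because the bureau feature correlates with cashflow, neighbour-based imputation recovers it far better than zero-filling would; archetype labels then stand in for missing score history when initialising the sequence model.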


5. Model Architecture

BASIC is a three-component ensemble. Each component addresses a different sub-problem from Section 3; they are trained jointly where gradients permit, and sequentially where they require separate data regimes.

5.1 Component 1: Score Prediction Module

Architecture: Stacked ensemble of LightGBM (Ke et al., 2017; 800 trees, max depth 7, learning rate 0.03), Random Forest (500 estimators, feature subsampling ratio 0.7), and ridge regression as a low-variance baseline.

Input: The input to Component 1 is the concatenation of two vectors: the 127-dimensional curated feature vector $\mathbf{X}_t \in \mathbb{R}^{127}$ and the contextual embedding $\mathbf{h}_t \in \mathbb{R}^{128}$ produced by the Transformer encoder in Component 2, giving a full input dimensionality of $\mathbb{R}^{255}$ (127 + 128). All ensemble models in Component 1 — LightGBM, Random Forest, and Ridge — operate on this augmented vector. References to the 127-feature vector elsewhere in the paper describe the static features prior to sequence augmentation; the effective model input is always the 255-dimensional concatenation.

Output: $\hat{S}_{t+1} \in \mathbb{R}$

Ensemble combination: $\hat{S}_{t+1} = w_1 \cdot \hat{S}^{\text{LGB}}_{t+1} + w_2 \cdot \hat{S}^{\text{RF}}_{t+1} + w_3 \cdot \hat{S}^{\text{Ridge}}_{t+1}$

Weights $w_1, w_2, w_3$ are learned via held-out validation stacking. In practice, $w_1 \approx 0.61$, $w_2 \approx 0.27$, $w_3 \approx 0.12$, reflecting LightGBM's dominant contribution on this dataset.
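The validation-stacking step can be illustrated on synthetic data. The sketch below uses a random forest and ridge as stand-ins for the production base learners, and fits non-negative blend weights on held-out predictions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 10))
y = 650 + 30 * X[:, 0] + 15 * X[:, 1] * X[:, 2] + rng.normal(scale=8, size=2000)

X_tr, X_val = X[:1500], X[1500:]
y_tr, y_val = y[:1500], y[1500:]

# Base learners (LightGBM in the paper; a forest and ridge suffice for a sketch).
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
ridge = Ridge(alpha=1.0).fit(X_tr, y_tr)

# Stacking: learn non-negative blend weights on held-out validation predictions.
P_val = np.column_stack([rf.predict(X_val), ridge.predict(X_val)])
stacker = LinearRegression(positive=True, fit_intercept=False).fit(P_val, y_val)
w = stacker.coef_ / stacker.coef_.sum()   # normalise to interpretable weights
print("blend weights:", np.round(w, 3))
```

Fitting the combiner on data the base learners never saw is what prevents the stronger model from simply absorbing the others' overfit noise.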

SHAP values (Lundberg & Lee, 2017) are computed for every prediction. These serve dual purposes: surfacing the top contributing features per user for the recommendation layer, and providing the partial derivative estimates used in Component 3's sensitivity recovery.

5.2 Component 2: Temporal Sequence Learning Module

A static cross-sectional model captures the current credit profile but misses the temporal structure of credit events — the multi-month recovery curve after a DPD, the lagged score rebound as a hard inquiry ages out, the compounding effect of sustained utilisation reduction. Component 2 addresses this gap.

Architectures evaluated:

Bidirectional LSTM (Hochreiter & Schmidhuber, 1997): 2 layers, hidden size 128, dropout 0.3. Input: 12-month feature window $(\mathbf{X}_{t-11}, \ldots, \mathbf{X}_t)$.

GRU (Cho et al., 2014): Single layer, hidden size 96. Marginally faster convergence but slightly lower accuracy than LSTM on this dataset.

Lightweight Transformer: 4-head self-attention, 2 encoder layers, sinusoidal positional encoding over the 12-month sequence. Best performance on capturing long-range dependencies — events from 6–10 months prior that continue to influence current scores.

The Transformer encoder is retained in the final architecture. Its output $\mathbf{h}_t \in \mathbb{R}^{128}$ is a contextual embedding of the consumer's credit trajectory, concatenated with $\mathbf{X}_t \in \mathbb{R}^{127}$ to form the 255-dimensional input vector passed to Component 1.
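The sinusoidal positional encoding used by the lightweight Transformer follows the standard formulation (Vaswani et al., 2017); a minimal NumPy sketch for the 12-month window:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000^(2i/d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    """
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

# A 12-month window with a 128-dimensional embedding, as in the paper.
pe = sinusoidal_positional_encoding(12, 128)
```

Because the encoding is deterministic and additive, the attention layers can distinguish "a DPD nine months ago" from "a DPD last month" without recurrence — the property credited above for the Transformer's edge on long-range dependencies.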

Key temporal patterns recovered by the sequence module:

  • Hard inquiry → score drop manifests within 7–21 days; aging recovery takes 6–12 months
  • DPD 1–30 event → immediate drop of −55 to −110 points; recovery curve peaks at 9–14 months post-resolution
  • Utilisation drop → score response within 1 reporting cycle (25–35 days), faster than any other action class
  • New secured loan → initial dip of −8 to −15 points followed by a +12 to +22 point recovery over 3–6 months as the tradeline ages

Sequence model comparison:

All three architectures were evaluated on the held-out test set under identical training conditions (same feature input, same optimiser, same random seed). Statistical significance of RMSE differences is assessed via a paired bootstrap test (1,000 resamples).

| Architecture | RMSE Reduction vs. LSTM ↑ | DA ↑ | 95% CI | Δ RMSE vs. LSTM |
|---|---|---|---|---|
| LSTM | — (baseline) | 93.1% | (92.6–93.6%) | — |
| GRU | −3.0% (worse) | 92.6% | (92.1–93.2%) | +0.7 (p = 0.003) |
| Transformer | +5.2% | 94.0% | (93.6–94.5%) | −1.2 (p < 0.001) |

The Transformer's 1.2 RMSE point improvement over the LSTM is statistically significant (p < 0.001, paired bootstrap). The GRU is significantly worse than the LSTM (p = 0.003), consistent with its reduced capacity to model long-range dependencies. The Transformer's advantage is most pronounced on the high-movement event subgroup (RMSE 34.2 vs. LSTM's 37.1, Δ = 2.9, p < 0.001), where long-range dependencies — such as the effect of a DPD event 6–10 months prior on current score trajectory — are most relevant. This provides a mechanistic explanation for why the Transformer outperforms: it is better equipped to capture the slow-decaying temporal effects that characterise Indian bureau scoring dynamics.

5.3 Component 3: Causal Impact Module

Score prediction answers what will happen. Causal impact estimation answers what will happen because of a specific action — a meaningfully harder problem because it requires disentangling the effect of the action from the many confounding factors that predict both action-taking and score outcomes simultaneously.

A natural concern with any observational causal framework is whether unconfoundedness holds — whether all variables that jointly drive action-selection and score outcomes are captured in the feature vector. We believe this assumption is unusually well-supported in BASIC's setting for two reasons. First, bureau data alone contributes approximately 2,500 raw features per user, covering the full tradeline history at monthly granularity across every credit instrument the consumer has ever held. Second, this is augmented by 31 banking cashflow features and 22 behavioural signals, constructing a feature space that captures financial behaviour, stress, and trajectory from multiple independent data sources simultaneously. The resulting 127-feature curated vector — derived from the full 2,500-dimensional bureau space via SHAP-based importance filtering — represents a near-comprehensive observable picture of a consumer's financial life at any given point in time. We acknowledge that truly unobserved confounders (job loss not yet reflected in cashflow, undisclosed liabilities) cannot be ruled out, and address this in the limitations section. However, the density of our feature space makes the unconfoundedness assumption substantially more defensible here than in typical observational financial studies operating on 10–20 features.

5.3.1 Causal Forests

We apply the Causal Forest algorithm (Wager & Athey, 2018) to estimate Conditional Average Treatment Effects (CATE) for each action $A \in \mathcal{A}$:

$\tau_A(\mathbf{x}) = \mathbb{E}\left[S_{t+1}^{do(A=1)} - S_{t+1}^{do(A=0)} \mid \mathbf{X}_t = \mathbf{x}\right]$

Each credit action — reducing utilisation below a threshold, opening a secured loan, avoiding hard inquiries for 60 days — is treated as a binary or continuous treatment variable. The forest partitions the feature space to recover heterogeneous effects: the same action can have substantially different score impacts for consumers with different credit profiles.

Key implementation details: the honesty property is enforced via sample splitting between tree building and effect estimation; propensity score weighting corrects for self-selection (consumers who reduce utilisation are not a random sample of the population); minimum node size of 150 ensures stable leaf-level estimates; 2,000 trees per action.

5.3.2 Double Machine Learning (Double ML)

For continuous treatment variables (utilisation level, months since last inquiry, EMI-to-income ratio), we apply Double ML (Chernozhukov et al., 2018):

Step 1 — Partial out confounders from treatment: $\tilde{A} = A - \mathbb{E}[A \mid \mathbf{X}_t]$

Step 2 — Partial out confounders from outcome: $\tilde{S} = S_{t+1} - \mathbb{E}[S_{t+1} \mid \mathbf{X}_t]$

Step 3 — Estimate causal effect: $\hat{\theta} = \left(\tilde{A}^\top \tilde{A}\right)^{-1} \tilde{A}^\top \tilde{S}$

Both nuisance functions are estimated using cross-fitted LightGBM with 5-fold cross-validation, eliminating regularisation bias from the final causal estimate.
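Steps 1–3 can be sketched end-to-end on synthetic data. The sketch substitutes scikit-learn's GradientBoostingRegressor for LightGBM, simulates a known causal effect (θ = 4, an arbitrary illustrative value), and recovers it via cross-fitted residual-on-residual regression:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(3)
n = 3000
X = rng.normal(size=(n, 8))                        # observed confounders
A = 0.6 * X[:, 0] + rng.normal(scale=1.0, size=n)  # treatment (e.g. utilisation) depends on X
theta = 4.0                                        # true causal effect on the score
S = theta * A + 20 * X[:, 0] - 10 * X[:, 1] + rng.normal(scale=5, size=n)

# Steps 1-2: cross-fitted nuisance models partial X out of treatment and outcome,
# so each residual is predicted by a model that never saw that observation.
gbm = lambda: GradientBoostingRegressor(n_estimators=100, random_state=0)
A_res = A - cross_val_predict(gbm(), X, A, cv=5)
S_res = S - cross_val_predict(gbm(), X, S, cv=5)

# Step 3: residual-on-residual regression recovers the causal coefficient.
theta_hat = float(A_res @ S_res) / float(A_res @ A_res)
print(f"theta_hat = {theta_hat:.2f}  (true theta = 4.0)")
```

A naive regression of S on A here would be badly biased upward (A and S share the confounder X0); the cross-fitting is what removes the regularisation bias the text refers to.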

5.3.3 Uplift Modelling

For binary treatment decisions (take or don't take a personal loan; apply or don't apply for a credit card), we supplement Causal Forests with a two-model uplift estimator:

$\text{Uplift}(A \mid \mathbf{x}) = \hat{S}_{t+1}(A=1, \mathbf{x}) - \hat{S}_{t+1}(A=0, \mathbf{x})$

Evaluated using AUUC (Area Under the Uplift Curve) and the Qini coefficient.

5.4 Bureau Sensitivity Module

The bureau scoring function $f(\cdot)$ is proprietary and undisclosed. Building a credit improvement system without understanding how the bureau responds to different inputs is like optimising for a target you can't see. Our approach is empirical: rather than attempting to access bureau internals, we study the function from the outside, using the same method a scientist would use to characterise any black-box system — observe inputs, observe outputs, infer structure.

Assumption: $f$ is a stationary, deterministic function of $\mathbf{X}_t$ within any given bureau model version period. Score changes across users and time therefore reflect changes in $\mathbf{X}_t$, not changes in $f$ itself.

This assumption is empirically testable. We assess stationarity using two complementary approaches. First, we apply a Chow test for structural breaks on the Elastic Net score decomposition, partitioning the 2021–2024 observation window into six-month segments and testing whether the coefficient vector $\hat{\boldsymbol{\beta}}$ is stable across segments. We detect no statistically significant structural break (p > 0.12 across all segment pairs) within the training period, supporting the stationarity assumption for that window. Second, we monitor score distribution shift across our live user population on a rolling 30-day basis using a Kolmogorov–Smirnov test on the marginal score distribution. A statistically significant distributional shift (KS statistic > 0.05, p < 0.01) is treated as a signal of a potential bureau model update, triggering a re-estimation of the sensitivity coefficients on the most recent 90-day data window. This mechanism has been triggered once during the observation period, in Q2 2023, consistent with a known CIBIL scoring model refresh that was subsequently confirmed through industry channels. Outside of that episode, the stationarity assumption holds across the full dataset.
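The rolling drift monitor can be sketched with SciPy's two-sample KS test; the thresholds mirror the text (KS statistic > 0.05 at p < 0.01), while the score distributions themselves are simulated:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(5)

def bureau_drift_alarm(baseline_scores, recent_scores, alpha=0.01, min_ks=0.05):
    """Flag a potential bureau model update when the rolling score
    distribution shifts significantly from the baseline window."""
    stat, p = ks_2samp(baseline_scores, recent_scores)
    return (stat > min_ks) and (p < alpha), stat, p

baseline = rng.normal(loc=720, scale=60, size=5000).clip(300, 900)
stable = rng.normal(loc=720, scale=60, size=5000).clip(300, 900)   # same regime
shifted = rng.normal(loc=745, scale=60, size=5000).clip(300, 900)  # a model refresh

alarm_stable, _, _ = bureau_drift_alarm(baseline, stable)
alarm_shift, _, _ = bureau_drift_alarm(baseline, shifted)
```

Requiring both a minimum KS statistic and a small p-value guards against the large-sample problem where trivially small shifts become "significant" on population-scale data.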

Methodology:

Score decomposition: We fit a regularised linear approximation using Elastic Net regression on the full longitudinal dataset: $\hat{S}_t = \beta_0 + \sum_{i=1}^{d} \beta_i X_{it} + \epsilon_t$

This recovers global sensitivity weights $\hat{\beta}_i$ — an approximation of the bureau's marginal scoring response to each feature.
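A sketch of the decomposition on synthetic data: an Elastic Net fit on standardised features recovers signed sensitivity weights for the (simulated) influential attributes. The feature semantics and coefficient magnitudes are illustrative, not estimates from the paper's dataset.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
n, d = 5000, 30
X = rng.normal(size=(n, d))
# Toy "bureau" response: feature 0 (say, utilisation) hurts the score,
# feature 1 (say, account age) helps, and the rest are inert.
S = 700 - 45 * X[:, 0] + 20 * X[:, 1] + rng.normal(scale=10, size=n)

Xs = StandardScaler().fit_transform(X)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(Xs, S)
beta = enet.coef_                       # global sensitivity weights beta_i

top = np.argsort(-np.abs(beta))[:2]
print("most sensitive features:", top, "coefficients:", beta[top].round(1))
```

The L1 component zeroes out the inert features, so the surviving coefficients read directly as marginal score sensitivities in standardised units.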

Non-linear sensitivity curves: For each feature $X_i$, we compute SHAP dependence plots and partial dependence plots using the LightGBM model, recovering non-linear response curves. Utilisation, for instance, does not penalise linearly — the score impact accelerates sharply above ~70% utilisation.

Event-study estimation: For discrete credit events (DPD occurrence, new loan, hard inquiry), we run event-study regressions: $\Delta S_{\tau} = \alpha + \sum_{j=-3}^{+6} \gamma_j \cdot \mathbf{1}[\text{event at } t-j] + \mathbf{X}_t' \delta + \epsilon_t$

where $\tau$ indexes months relative to the event. The $\hat{\gamma}_j$ coefficients recover the temporal impulse-response curve for each event type — how the score responds in the months before and after an event occurs.
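The event-study regression can be illustrated on simulated panel data: OLS on lag dummies around a single synthetic event per user recovers a known impulse-response curve. The γ values below are illustrative, not the paper's estimates, and the sketch omits the $\mathbf{X}_t' \delta$ controls and pre-event lags for brevity.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_users = 36, 400

# A stylised impulse response: the score drops ~18 points the month of a
# hard inquiry, then recovers over the following six months.
true_gamma = np.array([-18.0, -12.0, -8.0, -5.0, -3.0, -1.0, 0.0])

rows, y = [], []
for _ in range(n_users):
    event_t = rng.integers(8, T - 8)               # one synthetic event per user
    for t in range(7, T):
        lags = np.array([1.0 if t - event_t == j else 0.0 for j in range(7)])
        rows.append(lags)
        y.append(true_gamma @ lags + rng.normal(scale=3.0))  # monthly score change

# OLS with an intercept: the lag-dummy coefficients are the gamma_j estimates.
D = np.column_stack([np.ones(len(y)), np.array(rows)])
coef, *_ = np.linalg.lstsq(D, np.array(y), rcond=None)
gamma_hat = coef[1:]
print("recovered impulse response:", gamma_hat.round(1))
```

With one clean event per user the dummies are nearly orthogonal, so each $\hat{\gamma}_j$ is effectively a within-month average of score changes at that event distance.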

Caveat: The sensitivity estimates produced by the Bureau Sensitivity Module are empirical approximations derived entirely from observed consumer data. They do not reproduce, reverse-engineer from, or infer any proprietary bureau algorithm or internal documentation. All consumers whose data contributed to these estimates consented to its use for this purpose.


6. Action Simulation Engine

Built on Components 1–3, the Action Simulation Engine answers counterfactual queries of the form:

$\hat{\Delta S}(A, \mathbf{X}_t) = \hat{S}_{t+1}^{do(A)} - \hat{S}_t$

The engine is personalised: because Causal Forest estimates are individual-level, a consumer with 80% utilisation and no recent delinquencies receives a different utilisation-reduction impact estimate than a consumer with 50% utilisation and a recent DPD. The population-level figures below represent the median and interquartile range of the Causal Forest CATE distribution across all users in the held-out test set for whom the relevant action was observed. Ranges therefore reflect genuine heterogeneity in treatment effects across the population — not uncertainty in a single point estimate. Each range has been validated against observed outcomes in the matched follow-up cohort: for each action, we compare the engine's predicted $\Delta S$ against the actual observed $\Delta S$ among test-set users who took that action. Simulation calibration quality is reported as relative MAE — the simulation MAE as a fraction of the overall score prediction MAE — to avoid disclosing absolute internal benchmarks.

Selected simulation results:

| Action | Predicted $\Delta S$ (IQR) | Typical Response Lag | Relative Simulation MAE |
|---|---|---|---|
| Open new unsecured personal loan | −28 to −41 | Immediate (0–30 days) | 0.43× base MAE |
| Open new secured loan (gold/home) | −5 to +15 | 30–90 days | 0.55× base MAE |
| Reduce credit card utilisation 70% → 20% | +35 to +55 | 25–40 days | 0.36× base MAE |
| Avoid all hard inquiries for 60 days | +5 to +18 | 30–60 days | 0.30× base MAE |
| Resolve DPD 1–30 (dispute or settlement) | +50 to +80 | 60–120 days | 0.66× base MAE |
| Close oldest credit card | −10 to −25 | 30–60 days | 0.47× base MAE |
| Add co-applicant with strong bureau profile | +8 to +22 | 30–90 days | 0.50× base MAE |

All simulation MAEs are below the base score prediction MAE (relative MAE < 1.0), indicating that the Action Simulation Engine's predictions are at least as well-calibrated as the base score prediction model across all action types.


7. Experiments

7.1 Experimental Setup

Dataset split:

  • Training: 70% of users (385,000) — chronologically earliest observations
  • Validation: 15% of users (82,500) — used for hyperparameter tuning
  • Test: 15% of users (82,500) — held out entirely; last 3 months of each user's history

Temporal integrity is strictly maintained: no observation from after the test period start date is included in training or validation. The split ensures that the model is evaluated on genuinely unseen futures, not interpolated history.
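A hedged sketch of the leakage-safe split: the field names and the exact user-level validation carve-out below are illustrative simplifications of the 70/15/15 design, but the invariant is the one stated above — nothing dated on or after the test cutoff reaches training or validation.

```python
def temporal_split(observations, test_start, val_frac=0.15):
    """Split a panel of observations so that no post-cutoff data leaks
    into training or validation. Observations are dicts with (at least)
    'user' and 'month' fields — an illustrative schema."""
    trainval = [o for o in observations if o["month"] < test_start]
    test = [o for o in observations if o["month"] >= test_start]
    # carve a user-level validation set out of the pre-cutoff data
    users = sorted({o["user"] for o in trainval})
    n_val = int(len(users) * val_frac)
    val_users = set(users[-n_val:]) if n_val else set()
    train = [o for o in trainval if o["user"] not in val_users]
    val = [o for o in trainval if o["user"] in val_users]
    return train, val, test
```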

7.2 Baselines

| ID | Baseline | Description |
|---|---|---|
| B1 | Linear Regression | OLS on bureau features only |
| B2 | Logistic Classifier | Direction prediction (up/down/flat) |
| B3 | FICO Heuristics | Rule-based factor weights from published FICO documentation |
| B4 | AR(3) Trend Extrapolation | Autoregressive model on 3-month score history |
| B5 | XGBoost (static) | Gradient boosting on $\mathbf{X}_t$ without sequence features |

7.3 Evaluation Metrics

  • RMSE: Root Mean Square Error on $\hat{S}_{t+1}$
  • MAE: Mean Absolute Error on $\hat{S}_{t+1}$
  • DA: Directional Accuracy — $P(\text{sign}(\hat{S}_{t+1} - S_t) = \text{sign}(S_{t+1} - S_t))$
  • Causal Corr.: Pearson correlation between predicted and observed $\Delta S$ following observed real-world actions
  • AUUC: Area Under Uplift Curve for binary action decisions
  • Qini: Qini coefficient for uplift model ranking quality
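The first three metrics can be computed directly from the prediction, the realised next-month score, and the prior-month score:

```python
import math

def score_metrics(pred, actual, prev):
    """RMSE, MAE, and directional accuracy as defined above.
    pred = S-hat_{t+1}, actual = S_{t+1}, prev = S_t."""
    n = len(pred)
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)
    mae = sum(abs(p - a) for p, a in zip(pred, actual)) / n
    sign = lambda x: (x > 0) - (x < 0)
    # DA: did the forecast get the direction of movement right?
    da = sum(sign(p - s) == sign(a - s)
             for p, a, s in zip(pred, actual, prev)) / n
    return rmse, mae, da
```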

7.4 Results

Score Prediction Performance:

Detailed benchmark figures are withheld in accordance with commercial confidentiality obligations. Relative performance improvements are reported below. All confidence intervals are bootstrapped over 1,000 resamples of the test set (n=82,500). 95% CIs reported in parentheses.

Before interpreting relative RMSE in context, it is worth characterising the distribution of score movement in our dataset. Across all 10M+ sample-months, the month-over-month score change distribution is as follows: approximately 61% of observations show movement of ±10 points or less (near-flat); 24% show movement between ±10 and ±40 points (moderate); and 15% show movement exceeding ±40 points (high-movement events, typically driven by DPD occurrences, new loan openings, or significant utilisation changes). BASIC's RMSE reduction is most pronounced on high-movement events — precisely the events consumers most need to anticipate — and is not driven by predicting stasis on near-flat observations.

| Model | RMSE Reduction vs. B1 ↑ | MAE Reduction vs. B1 ↑ | DA ↑ |
|---|---|---|---|
| B1: Linear Regression | — (baseline) | — (baseline) | 61.2% (60.4–62.1%) |
| B2: Logistic Classifier | — | — | 72.4% (71.6–73.2%) |
| B3: FICO Heuristics | 14.0% | 14.2% | 68.9% (68.1–69.8%) |
| B4: AR(3) Trend | 24.8% | 24.0% | 74.1% (73.3–74.9%) |
| B5: XGBoost Static | 44.6% | 48.8% | 87.3% (86.7–87.9%) |
| BASIC (Full) | 57.3% | 63.0% | 94.0% (93.6–94.5%) |

The improvement from BASIC over B5 (XGBoost Static) is statistically significant across all metrics (p < 0.001, paired bootstrap test). Non-overlapping confidence intervals across BASIC and all baselines confirm that performance gains are not attributable to sampling variance.

Causal Impact Estimation:

Validating causal estimates in a purely observational setting requires care. We do not claim that a correlation between predicted and observed score movements is equivalent to a causal validation — it is a predictive consistency check. Proper causal validation would require a randomised controlled trial, which is not available in this study. Instead, we report three complementary metrics that together assess the quality of the causal estimates under observational conditions.

Predictive consistency measures whether the model's predicted $\Delta S$ for a given action aligns with the actually observed $\Delta S$ among users who took that action — computed on a matched cohort where propensity scores are used to balance treated and untreated users on observable characteristics. This is not a causal estimate; it is a check on whether the model's counterfactual predictions are consistent with what happens in practice among comparable users.

AUUC and Qini evaluate the model's ability to correctly rank users by their expected treatment benefit — a metric less sensitive to the absolute magnitude of causal estimates and more focused on whether the model correctly identifies who benefits most from a given action.
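A simplified, illustrative computation of the Qini coefficient: rank users by predicted uplift, accumulate the incremental treated-minus-(scaled)-control outcomes, and subtract the area of the random-targeting diagonal. Production uplift metrics differ in normalisation details; this is a sketch of the ranking idea, not our exact implementation.

```python
def qini_coefficient(scores, treated, outcome):
    """Qini sketch: area between the cumulative uplift curve (users
    sorted by predicted uplift) and the random-targeting straight line."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    nt = nc = yt = yc = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            nt += 1; yt += outcome[i]
        else:
            nc += 1; yc += outcome[i]
        # incremental outcomes: treated total minus control total scaled
        # to the number of treated users seen so far
        curve.append(yt - yc * nt / nc if nc else float(yt))
    area = sum(curve) / len(curve)
    random_area = curve[-1] / 2.0
    return area - random_area
```

A model that ranks the responsive user first scores higher than one that ranks them last, which is the property AUUC and Qini are designed to reward.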

| Method | Predictive Consistency (matched cohort) ↑ | AUUC ↑ | Qini ↑ |
|---|---|---|---|
| Simple difference-in-means | 0.31 | 0.54 | 0.21 |
| Propensity-weighted OLS | 0.48 | 0.61 | 0.34 |
| Causal Forest (BASIC) | 0.71 | 0.79 | 0.58 |
| Double ML (BASIC) | 0.68 | 0.76 | 0.55 |

The Causal Forest achieves the strongest performance across all three metrics. The 0.71 predictive consistency score reflects agreement between predicted and observed score movements in a propensity-matched sample — not a direct causal estimate, but a meaningful signal that the model's action-impact predictions are well-calibrated against real-world outcomes. We further note that the gap between Causal Forest (0.71) and propensity-weighted OLS (0.48) is consistent with the Causal Forest's ability to capture heterogeneous treatment effects that a linear model cannot — a result aligned with Gulen et al. (2024), who demonstrate Causal Forest's robustness advantage over OLS in observational financial data. A formal causal validation via A/B testing infrastructure is planned as part of BASIC's next development phase and will be reported in a follow-up study.

User Improvement Outcomes (90-day follow-up cohort, n=28,400):

The follow-up cohort (n=28,400) represents users who: (a) received at least one BASIC improvement recommendation during the study period; (b) had a verified bureau score pull at both the start and end of the 90-day window; and (c) had sufficient data continuity to confirm action compliance. This constitutes approximately 5.2% of the full dataset, which warrants careful interpretation.

To assess representativeness, we compare the cohort against the full population across four dimensions: bureau score at baseline (cohort mean 641 vs. population mean 638, t-test p = 0.31 — not significantly different), utilisation ratio (cohort mean 58% vs. population mean 61%, p = 0.08), thin-file proportion (cohort 19% vs. population 23%, p < 0.01 — slight underrepresentation of thin-file users in the cohort), and geographic distribution (metro/non-metro split: cohort 52%/48% vs. population 49%/51%, p = 0.14). The cohort is broadly representative on most dimensions, with the notable exception of thin-file users — who are slightly underrepresented, likely because thin-file users have lower bureau pull frequency and therefore lower rates of completing both endpoint measurements. Results for this cohort should therefore be interpreted as slightly optimistic for the thin-file segment specifically; we report thin-file-specific outcomes separately in Section 9.3.

Additionally, we acknowledge a potential engagement bias: users who followed BASIC recommendations are, by definition, more engaged with the platform than average users. Engagement itself may correlate with better financial discipline independent of BASIC's guidance. We partially address this through the matched control group design — control users are matched on baseline score, utilisation, income segment, and engagement frequency — but cannot fully rule out residual engagement confounding. The 2.4× improvement rate should be read as an upper-bound estimate of BASIC's causal contribution to score improvement, with the true effect likely somewhat lower.

95% CIs computed via bootstrapped resampling (1,000 iterations) over the follow-up cohort.
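The percentile bootstrap used for these intervals can be sketched as follows (the statistic is passed in as a function, e.g. the median improvement):

```python
import random
from statistics import median

def bootstrap_ci(values, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample the cohort with replacement,
    recompute the statistic, and take the alpha/2 and 1 - alpha/2
    empirical quantiles of the resampled statistics."""
    rng = random.Random(seed)
    stats = sorted(
        stat([rng.choice(values) for _ in values]) for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, `bootstrap_ci(score_changes, median)` yields the interval reported for median score improvement.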

| Metric | Value | 95% CI |
|---|---|---|
| Median score improvement (recommendation-followers) | +52 points | (49–55) |
| 90th percentile improvement | +120 points | (114–126) |
| Improvement rate vs. matched control group | 2.4× faster | (2.1×–2.7×) |
| % reaching target score within 90 days | 41% | (39.4–42.6%) |
| % with no improvement or decline (recommendation-followers) | 8.3% | (7.6–9.0%) |

Empirically Recovered Bureau Sensitivities (from event-study regressions):

| Credit Attribute | Sensitivity $\hat{\beta}_i$ | Direction |
|---|---|---|
| Utilisation ratio | −0.71 | Negative |
| DPD 1–30 (any occurrence) | −4.22 | Strongly negative |
| Hard inquiry (single) | −0.53 | Negative |
| Mean account age (months) | +0.18 | Positive |
| Secured loan presence | +0.34 | Positive |
| New unsecured loan (binary) | −1.84 | Negative |
| On-time payment streak (months) | +0.29 | Positive |

7.5 Ablation Study

To isolate the contribution of each data modality and architectural component:

| Configuration | RMSE Reduction vs. Bureau-only ↑ | 95% CI | DA ↑ | 95% CI |
|---|---|---|---|---|
| Bureau features only | — (baseline) | — | 86.1% | (85.5–86.8%) |
| + Banking cashflow features | 10.9% | (9.3–12.5%) | 89.4% | (88.8–90.0%) |
| + Behavioural telemetry features | 17.9% | (16.1–19.7%) | 91.2% | (90.6–91.8%) |
| + Sequence module (LSTM) | 25.9% | (24.0–27.8%) | 93.1% | (92.6–93.6%) |
| + Sequence module (Transformer) | 29.8% | (27.9–31.7%) | 94.0% | (93.6–94.5%) |

Every modality contributes. Banking cashflow features provide the largest single uplift after the bureau feature baseline, validating the intuition that financial behaviour beyond bureau tradelines carries genuine signal about upcoming score movement. The Transformer outperforms the LSTM by 1.2 RMSE points, consistent with its advantage on longer-range temporal dependencies in credit histories.


8. India-Specific Credit Market Analysis

8.1 The Scale of India's Credit Access Problem

India's credit landscape is at a genuine inflection point. As of 2024, roughly 500 million individuals are credit-eligible by income but lack adequate bureau history (TransUnion CIBIL, 2023). Approximately 160 million have bureau scores below 650 — the threshold below which mainstream bank lending becomes difficult or impossible to access (Experian India, 2024). Formal retail credit penetration stands at about 19% of GDP, compared to 50–70% in developed markets (RBI, 2024).

These numbers describe a structural mismatch between economic potential and formal credit access. The gig economy — estimated at 80 million workers — is among the most acutely affected: irregular income patterns simply don't score well under bureau models designed for formal-sector employment, even when the underlying repayment discipline is strong.

8.2 India-Specific Patterns Modelled by BASIC

Several credit behaviours exhibit patterns specific to the Indian market that BASIC explicitly models.

Gold loans. India's organised gold loan market exceeds ₹7 trillion (ICRA, 2024) and represents a meaningful secured credit instrument for consumers who might otherwise be limited to unsecured personal loans. Gold loans, when managed well, contribute positively to credit mix and score trajectory. BASIC captures the specific score response curve associated with gold loan origination and repayment — a pattern absent from Western training datasets and one that requires India-specific modelling to get right.

NBFC vs. bank reporting. Non-Banking Financial Companies report to bureaus with variable lag and consistency compared to scheduled commercial banks. Controlling for all other credit attributes, BASIC's sensitivity analysis detects a systematic 8–15 point score differential among users whose credit portfolio is predominantly NBFC-reported. This is practically significant: two consumers with identical repayment records may carry different bureau scores simply because of their lender type.

Bureau heterogeneity. CIBIL, Experian, Equifax, and CRIF assign meaningfully different scores to the same individual at the same point in time. In our dataset, the mean cross-bureau score range for the same user is 31 points; at the 90th percentile it exceeds 68 points. BASIC trains separate sensitivity models per bureau where bureau identity is known in the data.

Informal-sector income. Approximately 40% of our users have income patterns inconsistent with formal-sector salary — irregular credits, heavy cash usage, multiple small inflow sources. Standard FOIR calculations are unreliable for this segment. BASIC's cashflow module uses an adaptive income estimation approach that infers "effective income" from the full inflow distribution rather than relying on a single identifiable salary credit.
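One way such an "effective income" estimate might look: a trimmed central tendency over the full monthly inflow distribution, so that a single windfall month does not inflate the figure. The trim fraction and the choice of a trimmed mean here are illustrative, not the production model.

```python
def effective_income(monthly_inflows, trim=0.2):
    """Trimmed-mean sketch of effective income for irregular earners:
    drop the extreme tails of the monthly inflow distribution and
    average the remaining core, rather than hunting for one salary
    credit. trim=0.2 discards 10% of months at each tail."""
    s = sorted(monthly_inflows)
    k = int(len(s) * trim / 2)
    core = s[k:len(s) - k] if len(s) - 2 * k > 0 else s
    return sum(core) / len(core)
```

On a year with one ₹1L windfall month, the trimmed estimate sits well below the naive mean, which is the desired robustness.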

8.3 Geographic Variation: Tier 2 and Tier 3 Users

Users from non-metro cities in our dataset exhibit meaningfully different credit profiles and improvement dynamics:

  • Higher NBFC reliance (37% of credit portfolio vs. 18% in metros)
  • Lower average credit limit (₹1.2L vs. ₹4.7L in metros)
  • Higher utilisation ratios (71% vs. 54% in metros)
  • Stronger score improvement response to utilisation reduction actions (+42 points median vs. +31 in metros)

The last finding suggests a ceiling effect: metro users are already closer to optimal utilisation, leaving less room for score improvement from this action class. Non-metro users, starting from higher utilisation baselines, realise larger absolute gains from the same intervention. This has a straightforward implication for credit access: BASIC's improvement pathways are likely to be most impactful precisely for the populations most underserved by existing credit infrastructure.


9. Ethical AI and DPDP Compliance

9.1 Data Governance

BASIC operates under a data governance framework aligned with India's Digital Personal Data Protection Act, 2023 (DPDP). Key provisions: all data collection requires explicit, granular user consent; data is collected solely for the stated purpose of credit improvement advisory — it is not used for lending decisions, sold to third parties, or shared with lenders without separate explicit consent; users may withdraw consent and request deletion at any time; data retention is limited to 36 months of active user history.

9.2 Constraints on Model Use

BASIC is deployed exclusively as a consumer advisory tool. It is expressly prohibited from use in lending decisions or credit underwriting, insurance risk pricing, employment screening, or any automated decision-making process with material consequences for user rights. This constraint is not merely a policy statement — it is enforced architecturally. The model's output layer produces score forecasts and improvement recommendations; there is no pathway in the production system for BASIC outputs to flow into a lending decision engine.

The reasoning is straightforward: a model trained to help consumers improve their scores should not simultaneously be used to judge them. The two objectives are in tension, and conflating them would undermine user trust in the improvement advisory function.

9.3 Bias Auditing

Quarterly bias audits assess prediction quality and recommendation acceptance rates across gender, geography (metro vs. non-metro, state-level), income segment (< ₹3L, ₹3–10L, ₹10L+ annual), and bureau history length (thin-file vs. established). Our fairness threshold requires each subgroup's mean absolute prediction error to fall within ±15% of the overall mean absolute error. All subgroups currently pass this threshold.
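The audit check reduces to a per-subgroup threshold test. A sketch, with illustrative subgroup labels:

```python
def audit_subgroups(errors_by_group, tolerance=0.15):
    """Fairness check as described above: each subgroup's mean absolute
    prediction error must sit within +/- tolerance of the overall mean
    error. Returns the subgroups that breach the threshold."""
    all_errors = [e for errs in errors_by_group.values() for e in errs]
    overall = sum(all_errors) / len(all_errors)
    failures = []
    for group, errs in errors_by_group.items():
        group_mae = sum(errs) / len(errs)
        if abs(group_mae - overall) > tolerance * overall:
            failures.append(group)
    return failures
```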

One finding worth flagging: thin-file users initially showed prediction error approximately 22% above average, driven by the imputation challenges described in Section 4.4. The archetype-based initialisation approach reduced this gap to 9% — within threshold, though the residual difference continues to motivate active research into thin-file modelling specifically.

9.4 Legal Position on Bureau Reverse Engineering

The legal and ethical basis of the Bureau Sensitivity Module warrants explicit clarification. All data used in BASIC's training — bureau score snapshots and associated credit attribute vectors — is the legal property of the consumers who share it with us under explicit consent for this purpose. We do not access bureau internal systems, documentation, source code, or proprietary data. The recovered sensitivities are empirical approximations of observable input-output relationships, not reproductions of any bureau's internal model. This approach is consistent with a long tradition of empirical financial research studying FICO and other scoring systems through observation of their outputs (Thomas, 2000; FICO, 2020).


10. Limitations

Reporting lag opacity. Lenders report credit events to bureaus with variable, often undisclosed delays. The gap between when a consumer makes a payment and when that payment appears in their bureau score can range from a few days to over 45 days depending on the lender. This introduces noise in the observed score-attribute relationship that BASIC cannot fully model without direct lender cooperation.

Bureau model versioning. Bureaus update their scoring algorithms periodically without public announcement. As described in Section 5.4, we address this through a rolling KS-based distribution shift monitor that triggers sensitivity re-estimation when a structural break is detected. This mechanism successfully identified a Q2 2023 CIBIL model refresh in our data. However, the lag between a bureau model change and our detection of it — which depends on sufficient new user observations accumulating post-change — represents an irreducible limitation. During this detection window, sensitivity estimates from the Bureau Sensitivity Module may carry elevated uncertainty. Future work will explore faster change-point detection methods to reduce this lag.
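The rolling monitor described above amounts to a two-sample Kolmogorov–Smirnov comparison between a baseline window and a recent window of observations; the alert threshold below is illustrative, as the production level is tuned on historical refresh events.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the maximum gap between the two
    empirical CDFs, evaluated at every observed value."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def shift_detected(baseline, recent, threshold=0.1):
    """Trigger sensitivity re-estimation when the KS gap between the
    baseline and recent windows crosses the (illustrative) threshold."""
    return ks_statistic(baseline, recent) > threshold
```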

Behavioural signal confounding. App-based telemetry is informative but noisy. A user who checks their score frequently might be financially anxious, or they might simply be engaged with the product. We treat behavioural signals with appropriate epistemic humility in the causal modelling pipeline — they are included as features but not as treatment variables in the Causal Forest.

Unconfoundedness assumption. Causal Forest estimates rest on the assumption that all relevant confounders are captured in $\mathbf{X}_t$. Unobserved confounders — a job loss not yet reflected in cashflow data, a divorce in progress, an undisclosed loan — could bias treatment effect estimates. We cannot rule this out.

Cross-bureau generalisation. Separate sensitivity models per bureau improve accuracy but require sufficient per-bureau sample size. For less commonly used bureaus in our dataset, sensitivity estimates carry wider confidence intervals and should be interpreted with greater caution.


11. Future Work

Several directions seem worth pursuing.

Real-time score forecasting using continuous bank statement streaming via the Account Aggregator framework (RBI, 2021) would move BASIC from monthly predictions to near-daily updates — closing the gap between consumer actions and feedback considerably.

Agentic credit management — autonomous AI systems that proactively identify and act on improvement opportunities on behalf of consenting users — represents the natural evolution of the recommendation layer. The technical foundations are largely in place; the open questions are primarily around consent frameworks and appropriate levels of autonomy.

Federated learning for bureau collaboration could dramatically improve sensitivity recovery by enabling model training across multiple data custodians without requiring PII to leave individual institutions. This would require regulatory and commercial agreement but seems technically feasible.

Trajectory optimisation as a sequential decision problem. The current improvement pathway generator uses a greedy approach — recommending the single highest-impact action at each time step. Formulating CSIP as a constrained Markov Decision Process would enable multi-step trajectory planning under budget and timeline constraints, potentially unlocking significantly better outcomes for consumers with complex, interacting credit problems.
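For contrast with the proposed MDP formulation, the current greedy generator can be sketched as the loop below. The action names and the diminishing-returns model are hypothetical; `predict_gain` stands in for a call to the Action Simulation Engine.

```python
def greedy_pathway(actions, predict_gain, horizon=3):
    """Greedy pathway generation: at each step, recommend the single
    action with the highest predicted score gain given the current
    state, apply it, and repeat. An MDP planner would instead optimise
    the whole trajectory jointly under budget/timeline constraints."""
    state, path = {}, []
    for _ in range(horizon):
        best = max(actions, key=lambda a: predict_gain(state, a))
        gain = predict_gain(state, best)
        if gain <= 0:
            break                                  # nothing helps anymore
        path.append((best, gain))
        state[best] = state.get(best, 0) + 1       # track repeated actions
    return path
```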


12. Conclusion

The premise of this work is simple: millions of Indians want to improve their credit scores, and the existing system gives them almost nothing to work with. Bureau scores tell consumers they've been judged; they don't tell them how to improve.

BASIC is our attempt to change that. It is a multi-modal machine learning framework that integrates bureau tradeline data, banking cashflow signals, and behavioural telemetry to predict credit score movement, recover latent bureau sensitivities, estimate the causal impact of specific credit actions, and generate personalised improvement pathways. Trained on 550,000 users over four years, validated at the population scale, and deployed under a strict consumer-first ethical framework, it represents — to our knowledge — the first scientific framework for this problem in India.

The empirical results are encouraging: RMSE of 21.9, 94% directional accuracy, causal correlation of 0.71, and a median improvement of +52 points for users who followed recommendations. But the more significant contribution may be the framing itself. Credit improvement is a scientifically tractable problem. It has structure, it can be modelled, its drivers can be quantified. The field we are calling Behavioural Credit Analytics — concerned with how financial behaviour drives score movement and how AI can translate that understanding into consumer benefit — is, as far as we can tell, largely open. We hope this work invites others into it.


References

Athey, S., & Imbens, G. W. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

Bahnsen, A. C., Aouada, D., Stojanovic, A., & Ottersten, B. (2016). Feature engineering strategies for credit card fraud detection. Expert Systems with Applications, 51, 134–142.

Bao, W., Yue, J., & Rao, Y. (2017). A deep learning framework for financial time series using stacked autoencoders and long-short term memory. PLOS ONE, 12(7).

Blattner, L., Nelson, S., & Perignon, C. (2022). How costly is noise? Data and disparities in consumer credit. SSRN Working Paper 4228268.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of KDD 2016, 785–794.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. Proceedings of EMNLP 2014.

Dastile, X., Celik, T., & Potsane, M. (2020). Statistical and machine learning models in credit scoring. Applied Soft Computing, 91.

Duffie, D., Saita, L., & Wang, K. (2007). Multi-period corporate default prediction with stochastic covariates. Journal of Financial Economics, 83(3), 635–665.

Experian India. (2024). Credit Health Report: India 2024.

FICO. (2020). Understanding FICO Scores. Fair Isaac Corporation.

Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., & Walther, A. (2022). Predictably unequal? The effects of machine learning on credit markets. Journal of Finance, 77(1), 5–47.

Gulen, H., Jens, C., & Page, T. B. (2024). Balancing external vs. internal validity: An application of causal forest in finance. Management Science, forthcoming.

Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A, 160(3), 523–541.

Hitsch, G. J., & Misra, S. (2018). Heterogeneous treatment effects and optimal targeting policy evaluation. SSRN Working Paper 3111957.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

ICRA. (2024). Gold Loan Sector Update: India. ICRA Analytics.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., & Liu, T. Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS) 2017.

Kvamme, H., Sellereite, N., Aas, K., & Sjursen, S. (2018). Predicting mortgage default using convolutional neural networks. Expert Systems with Applications, 102, 207–217.

Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS) 2017.

Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.

RBI. (2021). Master Direction — Non-Banking Financial Company — Account Aggregator (Reserve Bank) Directions, 2016 (Updated 2021). Reserve Bank of India.

RBI. (2024). Financial Stability Report, June 2024. Reserve Bank of India.

Thomas, L. C. (2000). A survey of credit and behavioural scoring. International Journal of Forecasting, 16(2), 149–172.

TransUnion CIBIL. (2023). Credit Market Indicator Report Q3 2023.

Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473–2480.


Correspondence: research@taralabs.ai Working Paper — 2025. Not peer reviewed. © 2025 Tara Labs AI. All rights reserved.