BASIC: Behavioural Analytics for Score Improvement in Credit

A Gradient Boosting Framework for Credit Score Prediction, Bureau Sensitivity Estimation, and Personalised Improvement Pathways in India

Tara Labs AI Research Team Shubham Arawkar · Pranav Murali · Anupam Acharya

Tara Labs AI Research Team, Bengaluru, India

With advisory contributions from Mr Abhijit (Guest Research Advisor)

research@taralabs.ai

Working Paper — 2025 © Tara Labs AI. All rights reserved.

Abstract

Bureau credit scores in India — issued by CIBIL, Experian, Equifax, and CRIF — function as the primary gatekeeping mechanism for consumer access to formal lending. Yet these scores are retrospective, opaque, and calibrated entirely for lender-side default risk: they render a verdict on the consumer's past without offering any forward guidance on their financial future. We introduce BASIC (Behavioural Analytics for Score Improvement in Credit), a gradient boosting framework built on a fundamentally different premise — that the right question to ask about a consumer's credit profile is not "how likely are they to default?" but "how do we help them improve?"

BASIC operates on bureau tradeline data alone (Experian, Equifax, CIBIL), and achieves strong predictive performance without requiring additional data modalities. The core contribution is a differential feature engineering approach: rather than modelling a credit profile as a static snapshot, BASIC constructs a representation that pairs absolute-level features with period-over-period changes in credit attributes, giving gradient boosting models both the current state and the rate of change across each credit dimension. The model is trained and evaluated on a user-consented dataset spanning several hundred thousand individuals and over a million bureau profile snapshots. Our best configuration — a CatBoost Regressor on curated engineered features — achieves R²=0.851, RMSE=21.41, MAE=15.67, and MAPE=2.18% on month-ahead credit score prediction. A high-dimensional raw-feature configuration trained on GPU achieves comparable but slightly lower performance, demonstrating that feature quality dominates feature quantity in this setting. Built on this prediction foundation, BASIC includes a Bureau Sensitivity Module that estimates how individual credit attributes affect bureau scores through systematic feature perturbation, and a Boost Action Simulation Engine that predicts the score impact of a catalogue of credit actions (loan openings, account closures, payment history changes, enquiry avoidance) personalised to each consumer's current profile. Score-bucketed models, trained separately across segments of the bureau score distribution, further refine sensitivity estimation within each segment.

To our knowledge, this is the first scientific, population-scale framework for credit score improvement modelling in India. We propose the broader research area this work opens as Behavioural Credit Analytics — a discipline concerned with how financial behaviour drives score movement and how AI systems can translate that understanding into consumer empowerment.

Keywords: credit scoring, gradient boosting, CatBoost, LightGBM, differential features, bureau sensitivity estimation, feature perturbation, financial inclusion, India, behavioural credit analytics, CSIP

1. Introduction

Walk into most Indian banks today and ask a loan officer what you can do to improve your credit score. Odds are, you'll get some version of the same answer: pay on time, keep your credit utilisation low, don't apply for too many loans at once. Useful advice, approximately. But almost completely unquantified. Nobody tells you how much your utilisation needs to drop, or by when, or what the actual penalty was for that missed EMI two years ago.

This gap — between the enormous influence credit scores exert and the near-total absence of scientific guidance on improving them — is the central motivation for this work.

Credit scores issued by India's four major bureaus (CIBIL, Experian, Equifax, CRIF) are, at their core, probability-of-default (PD) models. They estimate the likelihood that a borrower will fail to repay, and they are calibrated to serve lenders, not consumers. The retail credit market they govern is substantial — over ₹22 trillion in disbursements in FY2024 (RBI, 2024) — and growing rapidly as digital lending, BNPL products, and co-branded credit cards pull millions of first-time borrowers into the formal financial system.

The trouble is that the infrastructure guiding these new borrowers has not kept pace. A scoring model engineered for steady-state credit populations in the early 2000s is a blunt instrument for a population that is overwhelmingly composed of first-generation credit users: gig workers with irregular income, self-employed individuals, salaried employees in Tier 2 and 3 cities whose financial behaviour doesn't map cleanly onto models trained on metro, formal-sector data. Over 300 million Indians sit in a near-prime or credit-invisible segment (TransUnion CIBIL, 2023) — not chronic defaulters, but people the system was never designed to help.

We propose BASIC as a response to this problem. BASIC is not a risk model. It is a credit improvement framework, designed to answer the questions that actual consumers ask:

  • What will my credit score be next month?
  • What is the bureau penalising me for, and how much does each factor cost me?
  • If I pay off this account, take this loan, or wait out this inquiry — what actually happens to my score?
  • What is the fastest realistic path to a score that unlocks better credit products?

These questions require regression over credit trajectories, sensitivity estimation over bureau feature spaces, and the ability to simulate the score consequences of prospective actions. BASIC addresses all three using gradient boosting on bureau data — demonstrating that bureau tradeline data alone, engineered thoughtfully, is sufficient to build a practically useful credit improvement system.

1.1 Contributions

C1 — A novel problem formulation. We formally define the Credit Score Improvement Problem (CSIP), distinguishing it from credit risk prediction as a distinct inference task with different data requirements, model objectives, and evaluation criteria.

C2 — A gradient boosting framework with differential feature engineering. BASIC trains CatBoost and LightGBM regressors on a combined representation of absolute-level bureau features and period-over-period differential features on bureau data from Experian, Equifax, and CIBIL. We demonstrate that this representation achieves high predictive accuracy (R²=0.851) without requiring banking cashflow data or behavioural telemetry.

C3 — Bureau Sensitivity Module via feature perturbation. We develop a methodology for recovering latent bureau scoring sensitivities from observed data by applying structured perturbations to input features and measuring the model's predicted score response. This empirical approach estimates marginal sensitivity for each credit attribute without any access to proprietary bureau systems, algorithms, or documentation.

C4 — Boost Action Simulation Engine for personalised pathways. We build a simulation engine that applies differential feature changes corresponding to real-world credit actions and predicts the resulting score delta. A catalogue of credit actions is modelled, personalised to each consumer's current credit profile via score-bucketed models.

C5 — India-specific empirical findings. We report large-scale empirical characterisation of credit score dynamics specific to India's bureau ecosystem, including the effects of gold loans, NBFC accounts, secured vs. unsecured credit mix, and Tier 2/3 consumer bureau profiles.

C6 — A new research area. We formally propose Behavioural Credit Analytics as a research discipline concerned with how financial behaviour drives credit score movement and how this relationship can be leveraged for consumer benefit.

2.1 Credit Risk and Default Prediction

Credit scoring research has a long history rooted in default prediction. Logistic regression scorecards (Hand & Henley, 1997; Thomas, 2000) remain the most widely deployed approach in practice. Gradient boosted tree methods — XGBoost (Chen & Guestrin, 2016) and LightGBM (Ke et al., 2017) — have substantially improved predictive accuracy on tabular credit data, consistently outperforming neural approaches on structured financial features. Deep learning methods have been applied to delinquency pattern sequences (Kvamme et al., 2018), though tree-based models remain the practitioner standard for bureau-derived tabular data.

None of this literature targets the consumer. The borrower is an object of prediction in all of it — an input to a lender's risk model — not a beneficiary of insight.

2.2 Score Factor Attribution

Commercial efforts provide "reason codes" alongside bureau scores — categorical labels that gesture at score drivers. FICO's published technical documentation (FICO, 2020) and Experian's consumer-facing materials (Experian, 2022) fall into this category. SHAP values (Lundberg & Lee, 2017) offer a more rigorous version: post-hoc attribution applied to trained models.

The fundamental limitation of all these approaches is that they explain a past risk score, not future score movement. Attribution and forward simulation are different problems.

2.3 Gradient Boosting and Feature Perturbation for Sensitivity Analysis

CatBoost (Prokhorenkova et al., 2018) introduced ordered boosting to reduce prediction shift bias on categorical features, making it particularly well-suited to bureau data where loan type categories are prevalent. LightGBM (Ke et al., 2017) achieves efficiency through histogram-based splitting and leaf-wise tree growth. Both substantially outperform classical ensemble methods on structured financial data.

Monotone constraints in gradient boosting enforce known directionality relationships between features and targets (Chen et al., 2019; Groeneboer & Thomas, 2020). In the credit improvement context, monotonicity constraints encode credit domain knowledge: more on-time payments should not decrease a predicted score; more delinquency events should not increase it. BASIC applies domain-informed monotone constraints to a subset of features where the directional relationship with the bureau score is unambiguous.

Feature perturbation as a method for sensitivity analysis has a long history. Breiman (2001) introduced permutation importance; Lundberg & Lee (2017) formalised SHAP. For prospective sensitivity estimation — estimating how much a score would change in response to a specific change in a credit attribute — targeted perturbation of individual input features is more direct than post-hoc attribution: it estimates a marginal response rather than a retrospective contribution.

2.4 Counterfactual Simulation via Feature Perturbation

Counterfactual reasoning ("what would the outcome have been if input $x$ had been $x'$?") is a natural framework for credit improvement guidance. Wachter et al. (2017) formalised counterfactual explanations for black-box classifiers, proposing minimal-change perturbations that flip a model's prediction. Ustun et al. (2019) extended this to actionable recourse, identifying feature changes that are realistic for the individual. Karimi et al. (2020) surveyed the algorithmic recourse literature, noting the tension between proximity (small changes) and feasibility (changes the consumer can actually make).

BASIC's Boost Action Simulation Engine sits within this tradition with domain-specific adaptations. Rather than searching for minimal counterfactuals, we simulate the score consequences of a predefined catalogue of credit actions — opening a gold loan, reducing credit card utilisation, closing a delinquent account — that correspond to real actions available to Indian consumers. Differential feature vectors encode each action as a structured change to the input space; the trained CatBoost regressor predicts the resulting score delta. Score-bucketed models ensure calibration to the consumer's current position in the score distribution, where bureau scoring dynamics differ substantially.

2.5 Gap Analysis

| Research Area | Prior Work Exists? | Gap Addressed by BASIC |
|---|---|---|
| Credit risk prediction | Extensive | Not the target problem |
| Score factor attribution | Reason codes, SHAP | No quantitative, forward-looking, action-level attribution |
| Gradient boosting on bureau data | XGBoost, LightGBM, CatBoost | No application to score improvement simulation |
| Counterfactual/recourse modelling | Wachter et al., Ustun et al. | No India-specific credit action catalogue or bureau feature perturbation approach |
| India-specific credit research | RBI macro studies | No individual-level improvement framework on Indian bureau data |

3. Problem Formulation

3.1 The Credit Score Improvement Problem (CSIP)

Let $S_t \in \mathbb{Z}$ denote a consumer's bureau credit score at time $t$, where $t$ indexes monthly reporting periods. Let $\mathbf{X}_t \in \mathbb{R}^d$ denote the full feature vector of credit attributes at time $t$, and let $\mathbf{A}_t$ denote the set of credit actions available to the consumer in period $t$.

The Credit Score Improvement Problem (CSIP) is defined as a tuple $\langle \mathcal{S}, \mathcal{X}, \mathcal{A}, f, \mathcal{P} \rangle$ where:

  • $\mathcal{S}$ = score space (300–900 for Indian bureaus)
  • $\mathcal{X}$ = feature space over credit attributes
  • $\mathcal{A}$ = action space (new loan, repayment, account closure, inquiry avoidance, etc.)
  • $f: \mathcal{X} \rightarrow \mathcal{S}$ = latent bureau scoring function (unknown and proprietary)
  • $\mathcal{P}$ = personalised improvement pathway

BASIC addresses three coupled sub-problems:

P1 — Score Prediction:

We learn a function $\hat{f}$ that approximates the latent bureau scoring function from observed data:

$$\hat{S}_{t+1} = \hat{f}(\mathbf{X}_t, \Delta\mathbf{X}_t)$$

where $\Delta\mathbf{X}_t = \mathbf{X}_t - \mathbf{X}_{t-1}$ is the vector of period-over-period changes in credit attributes. The inclusion of $\Delta\mathbf{X}_t$ alongside $\mathbf{X}_t$ is the core representational contribution of BASIC: it provides gradient boosting models with both the current state and the direction of change of every credit dimension.

P2 — Bureau Sensitivity Recovery:

We estimate the marginal sensitivity of the predicted score to each input feature via structured perturbation:

$$\hat{\delta}_i^{(k)} = \hat{f}(\mathbf{X}_t + k \cdot \mathbf{e}_i, \Delta\mathbf{X}_t) - \hat{f}(\mathbf{X}_t, \Delta\mathbf{X}_t)$$

where $\mathbf{e}_i$ is the unit vector in the $i$-th feature direction and $k$ takes a small set of calibrated values. The collection $\{\hat{\delta}_i^{(k)}\}$ constitutes an empirical sensitivity profile for the consumer: an approximation of how the bureau function responds to changes in each credit attribute, estimated without any access to bureau internals.
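The perturbation in P2 can be sketched in a few lines of Python. This is a minimal illustration, assuming the trained model is exposed as a plain callable `predict(x, dx)`; that interface, the step values, and the toy linear model are all hypothetical stand-ins rather than the BASIC model itself.

```python
from typing import Callable, Dict, Sequence

Predict = Callable[[Sequence[float], Sequence[float]], float]


def sensitivity_profile(
    predict: Predict,
    x: Sequence[float],
    dx: Sequence[float],
    steps: Sequence[float],
) -> Dict[int, Dict[float, float]]:
    """Return profile[i][k] = f(x + k*e_i, dx) - f(x, dx) for every feature i."""
    baseline = predict(x, dx)
    profile: Dict[int, Dict[float, float]] = {}
    for i in range(len(x)):
        profile[i] = {}
        for k in steps:
            perturbed = list(x)
            perturbed[i] += k  # move only feature i; dx is held fixed
            profile[i][k] = predict(perturbed, dx) - baseline
    return profile


# Hypothetical stand-in for the trained regressor (NOT the BASIC model).
def toy_predict(x: Sequence[float], dx: Sequence[float]) -> float:
    return 700.0 + 2.0 * x[0] - 15.0 * x[1] + 1.0 * dx[0]


profile = sensitivity_profile(toy_predict, [10.0, 1.0], [0.0, 0.0], steps=[-1.0, 1.0])
```

With a linear toy model the profile is symmetric in $k$; a tree ensemble would generally give step-dependent and profile-dependent responses, which is why several values of $k$ are swept.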

P3 — Action Impact Estimation:

For each action $A \in \mathcal{A}$, we define a differential feature vector $\Delta\mathbf{X}^{(A)}$ encoding the credit attribute changes that action $A$ would produce. The predicted score impact is:

$$\hat{\Delta S}(A \mid \mathbf{X}_t) = \hat{f}(\mathbf{X}_t, \Delta\mathbf{X}^{(A)}) - S_t$$

This treats action simulation as structured perturbation of the differential feature space. It is computable, personalised (because $\hat{f}$ is nonlinear in $\mathbf{X}_t$), and requires no causal identification assumptions beyond the approximate stationarity of the bureau scoring function across the observation window.
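To make the personalisation point concrete, the sketch below evaluates the P3 formula for two consumers who take the same action. The `predict` callable, the feature layout, and the toy nonlinear model are assumptions made for illustration; only the formula itself comes from the text.

```python
from typing import Callable, Sequence

Predict = Callable[[Sequence[float], Sequence[float]], float]


def action_impact(
    predict: Predict,
    x: Sequence[float],
    dx_action: Sequence[float],
    current_score: float,
) -> float:
    """Predicted score under the action's differential, minus the observed score S_t."""
    return predict(x, dx_action) - current_score


# Hypothetical nonlinear stand-in: the benefit of the action shrinks as x[1]
# (say, a count of existing derogatory marks) grows. NOT the BASIC model.
def toy_predict(x: Sequence[float], dx: Sequence[float]) -> float:
    return 650.0 + 0.5 * x[0] + dx[0] * (10.0 / (1.0 + x[1]))


same_action = [1.0]
impact_clean = action_impact(toy_predict, [100.0, 0.0], same_action, current_score=700.0)
impact_marked = action_impact(toy_predict, [100.0, 4.0], same_action, current_score=700.0)
```

Here the identical differential yields a larger predicted gain for the first profile than for the second, purely because the model's response depends on the current state.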

3.2 Distinction from Credit Risk Modelling

Credit risk modelling estimates $P(\text{default} \mid \mathbf{X}_t)$. CSIP estimates $\mathbb{E}[S_{t+1} \mid \mathbf{X}_t, \Delta\mathbf{X}_t]$ and $\hat{\Delta S}(A \mid \mathbf{X}_t)$ for each prospective action. The distinction matters practically: a model optimised for default prediction assigns high risk to thin-file borrowers regardless of their credit trajectory; CSIP is concerned with that trajectory explicitly.

The action simulation in P3 is best understood as predictive simulation rather than causal estimation in the potential outcomes sense. We predict what the model — which closely approximates the bureau scoring function — would output given the feature changes corresponding to an action. This is practically useful for guidance because the bureau's scoring function is itself deterministic conditional on observable credit attributes.

4. Data

4.1 Dataset Overview

BASIC is trained and evaluated on proprietary datasets obtained via user-authorised bureau pulls through the GoCredit AI platform operated by Tara Labs AI. The dataset comprises several hundred thousand individuals spanning millions of sample-months and over a million bureau profile snapshots. Experiments were conducted on both curated engineered feature sets and higher-dimensional raw-feature configurations.

All personally identifiable information (PII) is removed prior to modelling. Users are represented by anonymised identifiers throughout. Data collection is 100% user-consented via explicit opt-in under DPDP-compliant protocols.

4.2 Data Modality: Bureau Tradeline Data

BASIC operates exclusively on bureau tradeline data. No banking cashflow data, behavioural telemetry, or supplementary data modalities are used in the production system. This is both a practical and empirical choice: bureau data is available for any user with a bureau file, it is standardised across the Indian credit system, and — as the experimental results demonstrate — it is sufficient to achieve strong predictive performance. The system achieves R²=0.851 from bureau data alone; additional modalities are not required.

Data is obtained via user-authorised bureau pulls from Experian, Equifax, and CIBIL. Features include monthly credit score snapshots ($S_t$), Days-Past-Due (DPD) flags at three severity thresholds (1–30, 31–60, 60+), credit utilisation ratios per tradeline and in aggregate, number and age of active and closed accounts, hard inquiry counts over 30/60/90-day rolling windows, secured vs. unsecured loan mix, EMI amounts and principal outstanding, sanction amounts, and written-off or settled account flags.

4.3 Feature Engineering

The production feature set combines absolute-level bureau features with a set of differential features derived from the raw bureau space. The differential representation is the primary engineering contribution: for each base feature $X_i$, we compute $\Delta X_i = X_i^{(t)} - X_i^{(t-1)}$, the period-over-period change. This gives the model explicit access to the direction and magnitude of change in every credit dimension, capturing the dynamic aspect of bureau scoring that a static snapshot misses.
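A minimal sketch of the differential computation for a single consumer follows. The feature names and the `d_` prefix are hypothetical (the actual schema is not disclosed), and the first snapshot is given zero differentials, an assumption that matches the thin-file treatment described in Section 4.5.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, float]


def add_differentials(history: List[Snapshot]) -> List[Snapshot]:
    """Pair each absolute feature with its period-over-period change.

    `history` holds one consumer's snapshots in chronological order.
    """
    rows: List[Snapshot] = []
    previous: Optional[Snapshot] = None
    for current in history:
        row = dict(current)
        for name, value in current.items():
            prior = previous[name] if previous is not None else value
            row[f"d_{name}"] = value - prior  # zero when there is no prior snapshot
        rows.append(row)
        previous = current
    return rows


# Hypothetical two-month history for one anonymised consumer.
history = [
    {"utilisation": 0.5, "active_accounts": 3.0},
    {"utilisation": 0.25, "active_accounts": 4.0},
]
rows = add_differentials(history)
```

In a production pipeline the same operation would more likely be a grouped difference over a dataframe keyed by anonymised user identifier; the plain-Python version is used here only to keep the example dependency-free.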

Broad feature categories include delinquency indicators (derived from payment-history parsing), account age and vintage aggregates, credit mix measures, payment history aggregates at varying recency and severity, separate secured and unsecured balance aggregates, and hard enquiry activity. Differential columns cover the primary loan-type taxonomy relevant to the Indian market, represented in both absolute and differential form so the model observes both portfolio composition and its evolution. The precise feature schema, aggregation windows, and transformation pipeline are proprietary and are not enumerated here.

4.4 Score-Bucketed Modelling

Bureau scoring dynamics differ substantially across the score distribution. The marginal impact of an additional delinquency event, a new enquiry, or a credit card closure is not constant across the 300–900 range. To address this, BASIC trains separate models across distinct segments of the score distribution. Score-bucketed models are used in the Bureau Sensitivity Module and the Boost Action Simulation Engine to produce sensitivity estimates and action impact predictions calibrated to the consumer's current score segment.
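Routing a consumer to a segment-specific model can be sketched as below. The bucket edges are purely hypothetical, since the paper does not disclose the actual segment boundaries, and the per-bucket "models" are placeholder callables.

```python
from bisect import bisect_right
from typing import Callable, Dict, Sequence

# Hypothetical edges giving four buckets over the 300-900 range.
BUCKET_EDGES = [550, 650, 750]

Model = Callable[[Sequence[float]], float]


def score_bucket(score: float) -> int:
    """Map a bureau score to the index of its score segment."""
    if not 300 <= score <= 900:
        raise ValueError(f"score {score} is outside the 300-900 range")
    return bisect_right(BUCKET_EDGES, score)


def predict_bucketed(
    models: Dict[int, Model], current_score: float, features: Sequence[float]
) -> float:
    """Dispatch to the model trained on the consumer's current segment."""
    return models[score_bucket(current_score)](features)


# Placeholder models that just return a constant per bucket.
models: Dict[int, Model] = {b: (lambda f, b=b: 500.0 + 100.0 * b) for b in range(4)}
routed = predict_bucketed(models, 700, [])
```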

4.5 Handling Thin-File Users

Users with limited bureau history — fewer active tradelines, shorter account vintage, or fewer months of payment history — present a modelling challenge. For these users, features such as oldest active account opening date and payment history aggregates may be zero by construction rather than by genuine credit behaviour. The differential feature representation partially mitigates this: where there is no prior snapshot, differential features are zero, which is informationally correct rather than misleading.

Clustering approaches over the bureau feature space were explored to identify thin-file user segments and enable archetype-level treatment within score-bucketed models, allowing thin-file users to inherit sensitivity priors from population-level cluster representatives with similar credit profiles.

5. System Design

5.1 Score Prediction Module

The core of the BASIC system is a gradient-boosted tree regressor that maps a user's bureau feature vector to a predicted credit score. The primary model is a CatBoost Regressor optimised with RMSE loss and early stopping. CatBoost's native handling of categorical features via ordered target statistics removes the need for explicit encoding of categorical fields. For production inference, the trained model is exported to ONNX format and served via ONNX Runtime, enabling low-latency scoring independent of the training framework.

A secondary model — LightGBM — is trained with monotone constraints that encode domain knowledge: features where higher values are unambiguously beneficial or harmful are constrained to produce monotone response curves, preventing the model from learning spurious inversions in data-sparse regimes. Positive monotonicity is enforced on features for which more is unambiguously better (e.g., on-time payment counts, account diversity); negative monotonicity is enforced on features associated with credit stress (e.g., delinquency counts, recent enquiry activity). The exact constraint set is proprietary.
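As an illustration of how such constraints are typically specified, the sketch below builds a per-feature direction vector (+1, 0, or -1) in feature order. The feature names and directions are hypothetical examples drawn from the prose above, not the proprietary constraint set; LightGBM's `monotone_constraints` parameter accepts a list of this shape.

```python
from typing import Dict, List

# Hypothetical directions; the actual constraint set is not published.
DIRECTION: Dict[str, int] = {
    "on_time_payment_count": 1,   # more is assumed never harmful
    "delinquency_count": -1,      # more is assumed never helpful
    "recent_enquiries": -1,
}


def monotone_vector(feature_names: List[str], direction: Dict[str, int]) -> List[int]:
    """Return +1 / -1 / 0 per feature, in the same order as the training columns."""
    return [direction.get(name, 0) for name in feature_names]


features = [
    "on_time_payment_count",
    "total_sanction_amount",  # left unconstrained (0)
    "delinquency_count",
    "recent_enquiries",
]
constraints = monotone_vector(features, DIRECTION)
# The vector would then be passed to the regressor, for example
# lightgbm.LGBMRegressor(monotone_constraints=constraints).
```

Deriving the vector from a name-to-direction mapping, rather than hard-coding positions, keeps the constraints aligned with the columns if the feature order changes.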

Monetary features are passed through a scaling transform before inference to reflect the right-skewed balance distributions across the user population.

Score-bucketed sub-models. Analysis of residual error distributions revealed that the score–feature relationship is non-stationary across the score range. Separate CatBoost models are trained across distinct segments of the score distribution. These are used in the feature sensitivity analysis pipeline (Section 5.3) to produce stratum-appropriate impact estimates rather than relying on a single global model whose inductive bias is dominated by the high-density mid-range.

5.2 Differential Feature Engineering

The central technical contribution of BASIC is a principled approach to counterfactual impact estimation that does not require a causal model of the bureau scoring function.

Setup. Let $\mathbf{X}$ be a user's current feature vector extracted from their bureau report, and let $\mathcal{A}$ be a finite set of actionable interventions.

Counterfactual feature vector. For each action $a \in \mathcal{A}$, a domain-specific generator computes the expected change in the feature vector, producing a differential $\Delta \mathbf{X}_a$ relative to the user's current state. The model consumes both the current state and the differential to produce a counterfactual score prediction, from which the estimated score impact $\delta_a$ is derived as the difference between the counterfactual and the model's baseline prediction for the user.

Temporal drift correction. For actions evaluated over a horizon of several months, the differential is augmented with temporal components that reflect the expected evolution of vintage, the roll-off of aged delinquency observations, and the dilution of historical-payment aggregates as new on-time behaviour accumulates. This allows the system to predict the score impact of an action evaluated at a future point in time rather than treating the counterfactual as instantaneous. The exact transformations are proprietary.

Training the differential model. The model is trained to map current-state and differential features jointly to the month-ahead bureau score. The training set is augmented with perturbed records that expose the model to calibrated structured changes in individual features, enabling it to learn the local response surface of the bureau scoring function from data, without requiring access to the bureau's proprietary weighting formula.

5.3 Bureau Sensitivity Estimation

Method. For each user $i$ and feature $j$, perturbed records are constructed by applying a small set of calibrated perturbations to feature $j$ while holding all other features fixed. The sensitivity of the predicted score to feature $j$ at a given perturbation level is the difference between the model's prediction on the perturbed input and the prediction on the original input. The pipeline aggregates these per-user sensitivities into mean and median impact estimates per (feature, perturbation) pair, producing a sensitivity table.
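The aggregation step can be sketched as follows, assuming the per-user sensitivities have already been computed and are available as `(feature, perturbation, delta)` tuples. The record layout, feature names, and numbers are illustrative assumptions only.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float, float]  # (feature, perturbation, score delta)


def sensitivity_table(per_user: Iterable[Record]) -> Dict[Tuple[str, float], Dict[str, float]]:
    """Aggregate per-user deltas into mean and median per (feature, perturbation)."""
    grouped: Dict[Tuple[str, float], List[float]] = defaultdict(list)
    for feature, perturbation, delta in per_user:
        grouped[(feature, perturbation)].append(delta)
    return {
        key: {"mean": mean(deltas), "median": median(deltas), "n": len(deltas)}
        for key, deltas in grouped.items()
    }


# Made-up deltas for three users on one pair and one user on another.
records = [
    ("utilisation", 0.1, -4.0),
    ("utilisation", 0.1, -6.0),
    ("utilisation", 0.1, -20.0),
    ("enquiries", 1.0, -3.0),
]
table = sensitivity_table(records)
```

Reporting the median alongside the mean is useful because a few users with extreme responses (the third record here) can pull the mean well away from the typical effect.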

Stratum-specific sensitivity. The analysis is stratified by score bracket because sensitivity is non-uniform across the score distribution: a single additional delinquent account has a much larger impact at 650 than at 450 where multiple derogatory marks already exist. The pipeline partitions users into distinct score strata and runs the full perturbation sweep within each stratum.

Interpretation. This approach recovers bureau sensitivity as a numerical first-order approximation, not a formal causal effect. The sensitivity table provides an actionable rank-ordering of features by approximate impact within the regime the proxy model has learned. We do not claim this recovers the bureau's true weights; the system is explicitly framed as a learning-based approximation.

5.4 Boost Action Simulation Engine

The system defines a catalogue of credit actions available to Indian consumers, spanning:

  • Delinquency resolution: clearing write-offs, settling overdue accounts
  • New credit products: credit cards, secured cards, auto loans, house loans, gold loans, consumer finance loans, corporate credit, loan against property
  • Utilisation management: credit limit increases, utilisation ratio optimisation
  • Bureau hygiene: identifying inactive loans, zero-balance accounts, duplicate or unreported trade lines that may be suppressing scores

For each action type, a domain-specific perturbation function maps the action to a structured change in the feature vector. The model then predicts the score delta under this counterfactual feature vector. The mappings are informed by bureau reporting conventions and are calibrated against observed score movements in the historical data.
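One way to organise such per-action perturbation functions is a registry keyed by action type, sketched below. Apart from the `GOLD_LOAN` label, which the paper names, every identifier here (the `CLEAR_OVERDUE` action, the feature names, the size of each change, and the toy model) is a hypothetical placeholder; the real mappings are calibrated against historical data and are not published.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Generator = Callable[[Features], Features]
Predict = Callable[[Features, Features], float]


def gold_loan(x: Features) -> Features:
    # Assumed effect: one more secured account and one more hard enquiry.
    return {"d_secured_accounts": 1.0, "d_enquiries_30d": 1.0}


def clear_overdue(x: Features) -> Features:
    # Assumed effect: the overdue balance drops to zero.
    return {"d_overdue_balance": -x["overdue_balance"]}


CATALOGUE: Dict[str, Generator] = {
    "GOLD_LOAN": gold_loan,
    "CLEAR_OVERDUE": clear_overdue,
}


def simulate(predict: Predict, x: Features, action: str) -> float:
    """Score delta of an action relative to the model's own baseline prediction."""
    dx = CATALOGUE[action](x)
    return predict(x, dx) - predict(x, {})


# Hypothetical stand-in for the trained regressor (NOT the BASIC model).
def toy_predict(x: Features, dx: Features) -> float:
    overdue = x["overdue_balance"] + dx.get("d_overdue_balance", 0.0)
    return (
        700.0
        - overdue / 1000.0
        + 5.0 * dx.get("d_secured_accounts", 0.0)
        - 3.0 * dx.get("d_enquiries_30d", 0.0)
    )


consumer = {"overdue_balance": 20000.0}
impacts = {name: simulate(toy_predict, consumer, name) for name in CATALOGUE}
```

Because each generator receives the current profile, the same action can produce a different differential for different consumers, as `clear_overdue` does here.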

Post-action projection. After computing individual action impacts, the system accumulates a combined differential across all recommended actions and projects a future score assuming completion of all recommendations over user-specified time horizons.
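The sketch below illustrates why a combined differential is scored once rather than summing individual action impacts. It assumes differentials combine additively and omits both the current-state features and the time horizon for brevity; the toy model with a capped gain is a hypothetical stand-in chosen to make the nonlinearity visible.

```python
from collections import defaultdict
from typing import Dict, Iterable

Differential = Dict[str, float]


def combine_differentials(differentials: Iterable[Differential]) -> Differential:
    """Accumulate per-action feature changes into one combined differential."""
    combined: Dict[str, float] = defaultdict(float)
    for dx in differentials:
        for name, change in dx.items():
            combined[name] += change
    return dict(combined)


# Hypothetical nonlinear model: total gain saturates at 30 points.
def toy_predict(dx: Differential) -> float:
    gain = 20.0 * dx.get("d_on_time_payments", 0.0) - 15.0 * dx.get("d_overdue_accounts", 0.0)
    return 700.0 + min(30.0, gain)


actions = [{"d_on_time_payments": 1.0}, {"d_overdue_accounts": -1.0}]
baseline = toy_predict({})
sum_of_individual = sum(toy_predict(dx) - baseline for dx in actions)
projected = toy_predict(combine_differentials(actions)) - baseline
```

In this toy case the summed individual impacts overstate the projected gain, because the model's response to the joint change is not additive.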

6. Experiments

6.1 Experimental Setup

Datasets. Models were trained and evaluated on user-consented bureau data. The primary configuration uses curated engineered features across over a million bureau profiles. A secondary configuration uses a higher-dimensional raw-feature representation. Records with invalid or suppressed scores are excluded.

Train/validation/test split. A chronological split is used so that temporal integrity is preserved across the training, validation, and held-out test partitions. No post-test observations appear in training or validation sets.
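A date-based split of this kind might look like the sketch below. The cut-off dates and the record layout are hypothetical, as the paper does not state the actual partition boundaries.

```python
from datetime import date
from typing import Dict, List, Tuple

Record = Tuple[date, Dict[str, float]]  # (snapshot date, features)


def chronological_split(
    records: List[Record], train_end: date, valid_end: date
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split by snapshot date so that no partition sees a later period."""
    if train_end >= valid_end:
        raise ValueError("train_end must precede valid_end")
    train = [r for r in records if r[0] <= train_end]
    valid = [r for r in records if train_end < r[0] <= valid_end]
    test = [r for r in records if r[0] > valid_end]
    return train, valid, test


# Hypothetical monthly snapshots and cut-off dates, for illustration only.
snapshots = [(date(2024, month, 1), {"score": 700.0 + month}) for month in range(1, 13)]
train, valid, test = chronological_split(snapshots, date(2024, 8, 1), date(2024, 10, 1))
```

Note that splitting on date alone lets the same consumer appear in more than one partition at different times; whether that is acceptable depends on the evaluation goal.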

Metrics. R², RMSE, MAE, MAPE.
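For reference, the four metrics can be computed as in the sketch below, which assumes the standard textbook definitions (the paper does not spell out its exact formulas) and uses made-up scores.

```python
from math import sqrt
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """R^2, RMSE, MAE and MAPE (in percent) under the usual definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
    }


# Made-up actual and predicted scores.
metrics = regression_metrics([700.0, 750.0, 800.0], [710.0, 750.0, 790.0])
```

MAPE is well defined here because bureau scores are bounded below by 300, so the denominator is never near zero.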

6.2 Score Prediction Results

Curated feature configuration (CatBoost):

| Metric | Value |
|---|---|
| R² | 0.851 |
| RMSE | 21.41 |
| MAE | 15.67 |
| MAPE | 2.18% |

High-dimensional raw-feature configuration (CatBoost, GPU-trained):

| Metric | Value |
|---|---|
| R² | 0.815 |
| RMSE | 26.51 |
| MAE | 18.36 |
| MAPE | 2.55% |

The curated configuration achieves higher R² despite a much smaller feature set, demonstrating that semantically coherent features — closely aligned with how bureau scores are computed — outperform a high-dimensional representation containing many redundant or weakly predictive columns. This is consistent with the observation that gradient boosting benefits more from feature quality than feature quantity on structured financial data.

Both configurations approximate the bureau's scoring function from observed (profile, score) pairs rather than deriving it analytically. The unexplained variance (1 − R² ≈ 0.15–0.19, given R² = 0.81–0.85) reflects information in the bureau's formula that is not present in the available feature set.

6.3 Score-Bucketed Models

Separate CatBoost models are trained across distinct segments of the bureau score distribution. The score–feature relationship has markedly different local derivatives by segment: delinquency signals dominate in the lower ranges, while utilisation and enquiry frequency drive variation in the upper ranges. Bracket-specific subsampling is applied during training to address the overrepresentation of mid-range scores in the raw data.
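The paper does not specify the subsampling scheme. One simple possibility, shown purely as an assumption, is to cap the number of training records drawn from each score bracket so that the dense mid-range does not dominate:

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def subsample_by_bucket(
    records: List[T], bucket_of: Callable[[T], Hashable], cap: int, seed: int = 0
) -> List[T]:
    """Keep at most `cap` records per bucket, sampling without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    grouped: Dict[Hashable, List[T]] = defaultdict(list)
    for record in records:
        grouped[bucket_of(record)].append(record)
    sample: List[T] = []
    for rows in grouped.values():
        sample.extend(rows if len(rows) <= cap else rng.sample(rows, cap))
    return sample


# Hypothetical data: two low scores and ten mid-range scores.
scores = [420, 450] + [700 + i for i in range(10)]
balanced = subsample_by_bucket(scores, lambda s: "low" if s < 650 else "mid", cap=3)
```

Sparse brackets are kept whole while over-represented ones are thinned; sample weighting would be an alternative that avoids discarding data.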

6.4 Feature Importance Analysis

A qualitative summary of feature importance is reported here; the specific ranked feature list is proprietary and omitted. Delinquency-related features dominate: aggregate past-due balance and the recency of the most recent delinquency event together account for a substantial share of model importance. Vintage features (card tenure, bureau history length) rank highly, consistent with the established role of credit history length in bureau scoring. Utilisation features — across secured and unsecured products — appear throughout the upper ranks. Demographic correlates such as age reflect a correlation with credit history length in the Indian market rather than an independent causal effect.

6.5 Clustering Analysis

Subspace clustering. Elastic-Net Subspace Clustering was explored on a representative sample. The resulting cluster distribution is skewed, with one dominant cluster reflecting the concentration of bureau profiles in a narrow mid-range score band.

Scalable k-means. A FAISS-based k-means implementation was used to cluster the full dataset beyond the memory constraints of standard libraries, enabling production-scale segmentation.

The intended use is targeted recommendation: routing users to per-cluster sensitivity tables rather than applying a single global table. This remains exploratory and is not yet integrated into the production pipeline. Specific cluster counts and hyperparameters are proprietary.

6.6 Baseline Comparisons

| Model | R² | RMSE |
|---|---|---|
| CatBoost (curated features) | 0.851 | 21.41 |
| CatBoost (high-dimensional raw features) | 0.815 | 26.51 |
| MLPRegressor (baseline) | ~0.034 | 78.99 |

The MLP baseline is effectively non-predictive: R² ≈ 0.034 means the network explains about 3% of score variance, with an RMSE of ~79 score points, roughly 3.7 times the RMSE of the curated CatBoost configuration (78.99 vs. 21.41). This result is consistent with the finding of Grinsztajn et al. (2022) that tree-based models systematically outperform neural networks on tabular data. Credit bureau data is a canonical example of the regime where trees excel: the target is a piecewise function of heterogeneous count, ratio, and date-elapsed features with complex threshold interactions and no spatial or sequential structure that neural architectures could exploit.

7. India-Specific Credit Market Analysis

7.1 The Scale of India's Credit Access Problem

India's credit landscape is at a genuine inflection point. An estimated 300 million individuals are credit-eligible by income but lack adequate bureau history — a structural gap that limits their access to formal lending regardless of their underlying repayment capacity (TransUnion CIBIL, 2023). Formal retail credit penetration stands at approximately 19% of GDP, compared to 50–70% in developed markets (RBI, 2024). The retail credit market itself is substantial and growing: outstanding retail credit reached approximately ₹22 trillion in FY2024 (RBI, 2024), and the expansion of digital lending has accelerated bureau coverage in urban and peri-urban populations.

Despite this growth, significant portions of the population — including gig workers, agricultural households, and informal-sector earners — either lack bureau records entirely or carry scores below the 650 threshold beneath which mainstream bank lending becomes difficult to access (Experian India, 2024).

7.2 India's Multi-Bureau Landscape

India currently has four licensed credit information companies: TransUnion CIBIL, Experian India, Equifax India, and CRIF High Mark. The BASIC system processes data from CIBIL, Experian India, and Equifax India. Each bureau receives data from a partially overlapping but non-identical set of lenders, and each applies its own proprietary scoring methodology.

Lender reporting to bureaus is not uniform. Scheduled commercial banks report to multiple bureaus under RBI mandate, but Non-Banking Financial Companies (NBFCs) vary in their bureau reporting practices — some report to all four bureaus, others to only one or two, and the lag between credit events and bureau reflection differs across institutions. This creates a fragmented view of any individual's credit profile depending on which bureau is queried.

The BASIC system accounts for this by training separate sensitivity models where bureau identity is available in the data, and by designing features that are robust to partial bureau coverage.

7.3 India-Specific Credit Products and Feature Engineering

Several credit product categories that are structurally significant in India require explicit modelling treatment.

Gold loans. India's organised gold loan market exceeded ₹7 trillion in FY2024 (ICRA, 2024). Gold loans are short-tenure secured loans collateralised against physical gold, widely used by middle- and lower-income consumers — a product with no meaningful equivalent in most developed credit markets. The BASIC feature engineering pipeline explicitly categorises gold loans as a distinct secured loan type. The system's boost action set includes a dedicated GOLD_LOAN action type, reflecting empirically observed differences in how gold loan origination and repayment affect bureau score trajectories relative to other secured products.

Loan type taxonomy. The derived feature layer classifies credit accounts into a taxonomy calibrated to the Indian lending landscape: secured loans (auto, housing, gold, business/commercial, property), unsecured loans (personal loans, consumer durable/consumer finance loans), two-wheeler loans, education loans, and credit cards. Two-wheeler and consumer finance loans — categories representing a large share of first-time credit access for non-metro consumers — are treated as distinct product types because their bureau score implications differ from other unsecured products.
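The taxonomy above can be pictured as a lookup from account type to a parent group. The sketch below is an assumption about representation only: the label strings, the "other" fallback, and the function names are hypothetical, and the grouping follows the categories listed in the preceding paragraph.

```python
# Hypothetical account-type labels mapped to the taxonomy groups named above.
# The product label itself (for example "gold") stays available as a distinct
# product type; this map only adds the parent group.
LOAN_TAXONOMY = {
    "auto": "secured",
    "housing": "secured",
    "gold": "secured",
    "business_commercial": "secured",
    "property": "secured",
    "personal": "unsecured",
    "consumer_durable": "unsecured",
    "two_wheeler": "two_wheeler",
    "education": "education",
    "credit_card": "credit_card",
}


def categorise(account_type):
    """Return the taxonomy group, or 'other' for unmapped labels."""
    return LOAN_TAXONOMY.get(account_type.strip().lower(), "other")


def category_counts(account_types):
    """Count one user's accounts per taxonomy group."""
    counts = {}
    for account_type in account_types:
        group = categorise(account_type)
        counts[group] = counts.get(group, 0) + 1
    return counts
```

For example, a user holding two gold loans, one personal loan, and one two-wheeler loan would yield two secured, one unsecured, and one two-wheeler account.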

NBFC vs. bank reporting patterns. The system's feature set captures whether accounts are bank-originated or NBFC-originated, as this distinction affects both bureau reporting completeness and empirically observed score trajectories. Consumers whose credit portfolios are predominantly NBFC-reported exhibit systematic differences in bureau score behaviour relative to consumers with otherwise identical credit attributes but bank-dominated portfolios.
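One simple way to expose this distinction to a model is a portfolio-level share feature. The sketch below is illustrative; the `lender_type` field and its values are hypothetical names, and the actual BASIC feature definitions are not disclosed.

```python
def nbfc_share(accounts):
    """Fraction of a user's accounts that are NBFC-originated.

    accounts: list of dicts with a hypothetical 'lender_type' key
    taking the values 'nbfc' or 'bank'.
    Returns None when the user has no accounts, so that an empty
    portfolio is not confused with a fully bank-originated one.
    """
    if not accounts:
        return None
    nbfc = sum(1 for account in accounts if account["lender_type"] == "nbfc")
    return nbfc / len(accounts)
```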

7.4 Geographic Variation: Tier 2 and Tier 3 Consumers

Users from non-metro cities exhibit credit profiles that differ systematically from metro users:

  • Higher relative reliance on NBFCs and regional lenders for credit access
  • Higher representation of two-wheeler loans and consumer durable loans as primary credit instruments
  • Higher average utilisation ratios on credit cards where cards are held
  • Greater proportion of thin-file users (fewer than 6 months of active credit history)

These differences have a direct implication for system design: boost actions and their estimated score impacts are not uniformly applicable across geographies and credit profiles. A utilisation-reduction action produces larger absolute score gains for consumers already at high utilisation baselines — a pattern more common in non-metro segments. The system's score-bucketed model architecture and differential feature approach are both partly motivated by the need to serve this heterogeneity accurately.

8. Current System and Research Roadmap

8.1 Deployed System Summary

The deployed BASIC system is a gradient boosting framework operating exclusively on bureau data. This section summarises the deployed capabilities, as distinct from the research directions described in Section 8.2.

Modelling framework. CatBoost (primary) and LightGBM (secondary), both trained with monotone constraints encoding domain knowledge about the expected direction of feature effects. The system predicts month-ahead credit scores and estimates the impact of specific credit actions on individual users' scores.
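Both CatBoost and LightGBM expose a `monotone_constraints` training parameter for this purpose. The sketch below does not call either library; it illustrates, under assumed feature names and directions, how a sign map of this kind can be checked against any prediction function. The stand-in model is hypothetical.

```python
# Hypothetical sign map: +1 means the prediction should not decrease as the
# feature increases, -1 means it should not increase, 0 means unconstrained.
MONOTONE_SIGNS = {
    "utilisation": -1,
    "months_since_last_delinquency": +1,
    "account_count": 0,
}


def violates_monotonicity(predict, base, feature, grid, sign):
    """Return True if predict breaks the expected direction along grid.

    predict: callable taking a feature dict and returning a score.
    base:    feature dict held fixed except for `feature`.
    grid:    increasing values to substitute for `feature`.
    """
    if sign == 0:
        return False
    scores = [predict({**base, feature: value}) for value in grid]
    steps = [b - a for a, b in zip(scores, scores[1:])]
    return any(sign * step < 0 for step in steps)


def toy_predict(features):
    """Stand-in model (an assumption), linear and decreasing in utilisation."""
    return 750.0 - 100.0 * features["utilisation"]
```

A check of this kind is useful mainly as a regression test: a constrained model should never trip it, whereas an unconstrained model may.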

Differential features. The system operates on bureau-derived features that include differential features capturing period-over-period changes in credit attributes — the core representational contribution described in Section 5.2.
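As a minimal sketch of the representation, each level feature can be paired with its change since the previous snapshot. The `delta_` prefix and the choice to emit `None` when no earlier snapshot exists are assumptions made for illustration, not the production naming or missing-value policy.

```python
def differential_features(current, previous=None):
    """Pair each level feature with its period-over-period change.

    current, previous: dicts of numeric bureau attributes from two
    consecutive snapshots. When there is no previous snapshot (for
    example, a thin-file user), delta features are None rather than
    zero, so that "no history" is not read as "no change".
    """
    features = {}
    for name, value in current.items():
        features[name] = value
        if previous is None or name not in previous:
            features[f"delta_{name}"] = None
        else:
            features[f"delta_{name}"] = value - previous[name]
    return features
```

Gradient boosting libraries such as CatBoost and LightGBM handle missing numeric values natively, which is one reason a missing delta need not be imputed as zero.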

Action simulation. A catalogue of boost action types covers the major levers of credit score improvement: utilisation management, delinquency resolution, credit mix optimisation (including India-specific products such as gold loans and consumer finance loans), and bureau hygiene (identifying stale or erroneous trade lines).

Scale. The system is deployed in production, serving personalised credit improvement recommendations to consumers at scale.

8.2 Research Directions

The current system achieves strong results with gradient boosting on bureau data alone. The following directions represent opportunities to extend the framework. They are research directions, not deployed capabilities.

Causal inference for action impact estimation. The current feature perturbation approach estimates associative impacts. Methods such as Causal Forests (Wager & Athey, 2018) and Double Machine Learning (Chernozhukov et al., 2018) offer frameworks for estimating individual-level causal treatment effects from observational data. A formal A/B testing infrastructure for causal validation is a prerequisite for this work.

Temporal sequence modelling. Credit histories are sequential: the same attribute value carries different predictive content depending on its trajectory over the preceding months. LSTM and Transformer architectures trained on longitudinal bureau snapshots could capture longer-range temporal dependencies that the current differential feature approach approximates but does not fully model.

Multi-modal data integration. The current system uses bureau data only. Account Aggregator-enabled bank statement data (RBI, 2021) would add cashflow signals — income regularity, savings behaviour, liability payments not yet reflected in the bureau — that carry independent predictive content. Behavioural signals from product interactions represent a further potential data modality. Both raise consent and privacy considerations.

Federated learning for bureau collaboration. Training across multiple data custodians, without requiring PII to leave individual institutions, could improve sensitivity recovery for thin-file segments and for bureaus with thinner coverage in the training data.

Multi-step trajectory optimisation. The current recommendation engine generates the single highest-impact action at each decision point. Formulating the credit improvement problem as a constrained Markov Decision Process would enable multi-step trajectory planning — identifying sequences of actions that are jointly optimal over a consumer-specified time horizon.
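The gap between single-step and sequence-level planning can be illustrated with a toy example. The sketch below is not an MDP solver: it compares a greedy one-action-at-a-time policy with a brute-force search over short action sequences. The action names and gain values are hypothetical and chosen so that one action's benefit depends on what was done before it.

```python
from itertools import permutations

ACTIONS = ("reduce_utilisation", "clear_overdue", "avoid_enquiries")


def gain(action, taken):
    """Hypothetical score gain of `action` given the actions already taken."""
    if action == "reduce_utilisation":
        return 30 if "clear_overdue" in taken else 12
    if action == "clear_overdue":
        return 10
    return 5  # avoid_enquiries


def plan_value(plan):
    """Total gain of executing the actions in `plan` in order."""
    total, taken = 0, set()
    for action in plan:
        total += gain(action, taken)
        taken.add(action)
    return total


def greedy_plan(horizon):
    """Pick the highest immediate gain at each step (horizon <= len(ACTIONS))."""
    plan = []
    for _ in range(horizon):
        remaining = [a for a in ACTIONS if a not in plan]
        plan.append(max(remaining, key=lambda a: gain(a, set(plan))))
    return tuple(plan)


def best_plan(horizon):
    """Exhaustively search all ordered plans of the given length."""
    return max(permutations(ACTIONS, horizon), key=plan_value)
```

With a two-step horizon, the greedy policy starts with the largest immediate gain, while the exhaustive search finds that clearing the overdue account first unlocks a larger later gain. Exhaustive search scales factorially, which is part of the motivation for a more structured formulation.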

Real-time score monitoring. Bureau data arrives as periodic pull-based snapshots. Continuous data streams from consented banking data connections would enable near-real-time monitoring of credit attribute changes, closing the feedback loop between consumer actions and model predictions.

9. Limitations

The following limitations are material to interpreting the system's outputs and the results reported in this paper.

Single data modality. The production system uses bureau data only. Bureau tradeline data captures what a consumer has done, not what they are doing: payment histories, account balances, and enquiry records reflect the past, with reporting lags that may be days to weeks. Banking cashflow data and other financial signals would add predictive content that bureau data alone cannot provide — particularly for the thin-file segment.

Bureau reporting lag. Credit events — a payment made, a new account opened, a loan settled — take 1 to 45 days to appear in bureau records depending on the lender's reporting cycle. This lag introduces noise in the observed relationship between credit actions and score changes. The model cannot distinguish between a consumer who has taken a recommended action and is awaiting bureau reflection and one who has not yet acted.
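The ambiguity can be made explicit when labelling outcomes. In the sketch below, which is an illustration rather than the system's logic, an unreflected action is treated as ambiguous until the upper end of the 1 to 45 day range has elapsed since the recommendation; the status labels and function name are hypothetical.

```python
from datetime import date

MAX_REPORTING_LAG_DAYS = 45  # upper end of the lag range stated above


def action_status(recommended_on, snapshot_date, reflected):
    """Classify a recommended action relative to a bureau snapshot.

    reflected: whether the snapshot already shows the action's effect.
    Within the lag window, absence of change is not evidence of inaction.
    """
    if reflected:
        return "reflected"
    elapsed = (snapshot_date - recommended_on).days
    if elapsed <= MAX_REPORTING_LAG_DAYS:
        return "ambiguous"
    return "not_observed"
```

Snapshots labelled ambiguous could be excluded or down-weighted when action outcomes are evaluated, at the cost of a smaller evaluation sample.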

Bureau model opacity. Bureau scoring algorithms are proprietary and undisclosed. The BASIC system estimates bureau sensitivities empirically without access to bureau internals. Changes to bureau scoring models are not detectable in advance, introducing a period during which sensitivity estimates are stale. The magnitude of this estimation error during transition periods is difficult to quantify.

Feature perturbation is not causal inference. The system's action impact estimates are generated by perturbing feature values in a trained gradient boosting model and observing the change in predicted output. This produces personalised estimates that are useful in practice, but they are associative rather than causal. If users who take a particular credit action differ systematically from users who do not — in ways not fully captured in the feature set — the perturbation estimates will be biased. Users and practitioners should interpret boost action point estimates as decision-support heuristics rather than precise causal forecasts.
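The perturbation procedure described above reduces to a difference of two predictions. The sketch below uses a hypothetical stand-in for the trained model, with an assumed steeper penalty on utilisation above 50%; neither the functional form nor the numbers come from the BASIC system.

```python
def perturbation_impact(predict, features, changes):
    """Estimated score change from applying `changes` to `features`.

    predict: callable mapping a feature dict to a predicted score.
    changes: dict of feature name -> new value, representing one action.
    The result describes the fitted model, not a causal effect.
    """
    perturbed = {**features, **changes}
    return predict(perturbed) - predict(features)


def toy_score_model(features):
    """Stand-in for a trained model (an assumption, not the real model)."""
    u = features["utilisation"]
    penalty = 40.0 * u + (120.0 * (u - 0.5) if u > 0.5 else 0.0)
    return 800.0 - penalty


# The same 20-point utilisation reduction applied at two different baselines.
high = perturbation_impact(toy_score_model, {"utilisation": 0.9}, {"utilisation": 0.7})
low = perturbation_impact(toy_score_model, {"utilisation": 0.4}, {"utilisation": 0.2})
```

Under this toy model the estimated gain is larger at the high-utilisation baseline, which mirrors the pattern noted in Section 7.4. The estimate is only as trustworthy as the model's fit in the perturbed region of feature space.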

Cross-bureau generalisation. The system is trained primarily on CIBIL and Experian India data. Equifax India coverage is thinner, and CRIF High Mark is not processed in the current pipeline. Sensitivity estimates for Equifax-primary users carry wider confidence intervals.

Thin-file limitations. Users with fewer than 6 months of active credit history present a challenging modelling problem. Differential features are uninformative or absent for users without sufficient history. Prediction error for thin-file users is materially higher than for users with established credit histories.

Periodic retraining. The current models are retrained on a scheduled basis rather than updated continuously. Between retraining cycles, the models do not adapt to distributional shifts in the user population, changes in credit market conditions, or drift in bureau scoring behaviour.

10. Conclusion

Credit score improvement is a problem that affects hundreds of millions of people. The tools available to consumers for understanding and acting on their bureau scores have been, until recently, limited to generic educational content and broad heuristics. The ambition of this work is to replace heuristics with a quantified, personalised, and empirically grounded framework.

The BASIC system makes the following contributions:

1. Formal problem definition. We formalise the Credit Score Improvement Problem (CSIP) as distinct from default prediction — a framing that orients modelling choices toward actionable consumer guidance rather than lender risk management. To our knowledge, this is the first systematic treatment of credit score improvement as a standalone machine learning problem.

2. Differential feature engineering. We introduce a differential feature representation — capturing changes in credit attributes between periods rather than static snapshots — as the core modelling substrate for action-impact estimation.

3. Bureau sensitivity recovery via feature perturbation. Without access to proprietary bureau scoring logic, we develop an empirical approach to recovering bureau sensitivity estimates using gradient boosting models trained on observed input/output relationships.

4. Production deployment at scale. The system is deployed in production, serving personalised credit improvement recommendations to consumers at scale.

5. India-specific credit dynamics. We provide what is, to our knowledge, the first large-scale empirical characterisation of credit score improvement dynamics in the Indian market — including explicit modelling of gold loans and consumer finance products, NBFC vs. bank reporting differences, and geographic variation in credit profiles.

6. Behavioural Credit Analytics as a research discipline. We propose Behavioural Credit Analytics — the study of how financial behaviour drives credit score movement, and how that relationship can be quantified to benefit consumers — as a distinct research field.

The honest summary of current performance is: gradient boosting on bureau data alone achieves strong score prediction performance (R²=0.851, RMSE=21.41 on curated features; R²=0.815, RMSE=26.51 on high-dimensional configurations), and the differential feature approach produces useful personalised action impact estimates despite being associative rather than causal. The system works well enough to be deployed and useful; it falls short of what will eventually be possible with richer data modalities and more rigorous causal methodology.

The premise remains unchanged: millions of people want to improve their credit scores, and the system that judges them gives them almost nothing to work with. BASIC is an attempt to change that, grounded in what the data actually supports.

References

Athey, S., & Imbens, G. W. (2016). Recursive partitioning for heterogeneous causal effects. Proceedings of the National Academy of Sciences, 113(27), 7353–7360.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of KDD 2016, 785–794.

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68.

Experian India. (2024). Credit Health Report: India 2024.

FICO. (2020). Understanding FICO Scores. Fair Isaac Corporation.

Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., & Walther, A. (2022). Predictably unequal? The effects of machine learning on credit markets. Journal of Finance, 77(1), 5–47.

Grinsztajn, L., Oyallon, E., & Varoquaux, G. (2022). Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems (NeurIPS) 2022.

Hand, D. J., & Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A, 160(3), 523–541.

ICRA. (2024). Gold Loan Sector Update: India. ICRA Analytics.

Karimi, A.-H., Barthe, G., Schölkopf, B., & Valera, I. (2020). A survey of algorithmic recourse: Definitions, formulations, solutions, and prospects. arXiv preprint arXiv:2010.04050.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., & Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems (NeurIPS) 2017.

Kvamme, H., Sellereite, N., Aas, K., & Sjursen, S. (2018). Predicting mortgage default using convolutional neural networks. Expert Systems with Applications, 102, 207–217.

Lundberg, S. M., & Lee, S.-I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS) 2017.

Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., & Gulin, A. (2018). CatBoost: Unbiased boosting with categorical features. Advances in Neural Information Processing Systems (NeurIPS) 2018.

RBI. (2021). Master Direction — Non-Banking Financial Company — Account Aggregator Directions, 2016 (Updated 2021). Reserve Bank of India.

RBI. (2024). Financial Stability Report, June 2024. Reserve Bank of India.

Thomas, L. C. (2000). A survey of credit and behavioural scoring. International Journal of Forecasting, 16(2), 149–172.

TransUnion CIBIL. (2023). Credit Market Indicator Report Q3 2023.

Ustun, B., Spangher, A., & Liu, Y. (2019). Actionable recourse in linear classification. Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT*), 10–19.

Wachter, S., Mittelstadt, B., & Russell, C. (2017). Counterfactual explanations without opening the black box. Harvard Journal of Law & Technology, 31(2), 841–887.

Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

Correspondence: research@taralabs.ai Working Paper — 2025. Not peer reviewed. © 2025 Tara Labs AI. All rights reserved.
