IFRS 9 PD Modeling Framework via Cohort Approach

This repository implements a cohort-based Probability of Default (PD) model aligned with IFRS 9 Expected Credit Loss (ECL) requirements. The model estimates lifetime marginal PDs by tracking default behavior across origination cohorts and time since origination. It is designed for use in Stage 1, Stage 2, and Stage 3 (impaired, PD = 100%) calculations and supports transparent, auditable credit risk modeling suitable for regulatory and financial reporting.

Overview

This project implements a cohort PD model designed to support IFRS 9 Expected Credit Loss (ECL) calculation. The model estimates cumulative and marginal lifetime PDs by tracking default behavior over time for exposures grouped by origination cohort. By observing how default rates evolve with time since origination, the approach provides a transparent and interpretable framework that aligns well with IFRS 9 requirements for forward-looking credit risk estimation.

The implementation emphasizes:

Cohort-level transparency for auditability and model governance
Vectorized numerical computation for efficiency and scalability
Flexible aggregation across segments using exposure-based or observation-based weights

The resulting PD term structures can be directly used in Stage 1 and Stage 2 ECL Calculation. The project is intended to serve as a practical reference implementation for credit risk practitioners, model developers, and validators, rather than a black-box model. All calculations are made explicit, facilitating validation, backtesting, and model explainability.

Project Structure

pd_cohort_model/
├── models/          # Trained models and parameters (.pkl)
│   ├── actual_cumulative_odr.pkl
│   ├── chain_cumulative_odr.pkl
│   ├── w_avg_chain_cumulative_odr.pkl
│   ├── w_avg_gamma_cumulative_odr.pkl
│   ├── w_avg_gamma_parameters.pkl
│   ├── unbias_cumulative_odr.pkl
│   ├── fwl_model.pkl
│   └── pit_cumulative_lifetime_pd.pkl  
├── notebooks/
│   ├── 01_data_preparation.ipynb
│   ├── 02_chaid_segmentation.ipynb
│   ├── 03_base_cohort.ipynb
│   ├── 04_fwl_model.ipynb
│   └── 05_lifetime_calibration.ipynb
├── src/
│   ├── data_prep.py
│   ├── segment_support.py
│   ├── base_builder.py
│   ├── regression_model.py
│   ├── curve_calibration.py
│   ├── stats_testing.py
│   └── plot_function.py
├── data/
│   ├── processed/
│   │   ├── train_data.parquet          # Not tracked by git
│   │   ├── cohort_count.parquet
│   │   ├── unbias_odr.parquet
│   │   ├── monthly_odr.parquet
│   │   └── mev_transformed.parquet
│   └── raw/
│       ├── usedcar_transaction_score.parquet          # Not tracked by git
│       └── mev_data.csv
├── requirements.txt
└── README.md

Project Details

0. Model Segmentation

Note: The model segmentation is proposed for illustration only. The full analysis will be performed in another repository.

MOB	DPD	B-Score	Segment
<=6	0		segment_0
<=6	1-30		segment_1
<=6	31-60		segment_2
<=6	60-90		segment_3
>6		B1	segment_4
>6		B2	segment_5
>6		B3	segment_6
>6		B4	segment_7
>6		B5	segment_8
>6		B6	segment_9
>6		B7	segment_10
>6		B8	segment_11

1. Unbias Model

1.1 Cohort Building

Cohort-Based Default Measurement: Exposures are segmented into homogeneous cohorts defined at the observation point (e.g., vintage, product type, risk band). For each cohort, default events are tracked over time since the observation point to construct marginal and cumulative default "triangles", which form the empirical basis for lifetime PD estimation under an IFRS 9 consistent default definition.
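The cohort tracking described above can be sketched as follows. This is a minimal illustration, not the repository's code: the function name, the `(cohort, months_to_default)` record layout, and the toy data are all hypothetical, and the sketch ignores censoring (periods that a young cohort has not yet been observed for).

```python
from collections import defaultdict

def build_cumulative_triangle(records, max_horizon):
    """Build cumulative default rates by cohort and months since observation.

    records: iterable of (cohort, months_to_default) tuples, where
             months_to_default is None for accounts that never defaulted.
    Returns {cohort: [cumulative default rate at horizon 1..max_horizon]}.
    """
    counts = defaultdict(int)                          # accounts per cohort
    defaults = defaultdict(lambda: [0] * max_horizon)  # marginal default counts
    for cohort, months_to_default in records:
        counts[cohort] += 1
        if months_to_default is not None and 1 <= months_to_default <= max_horizon:
            defaults[cohort][months_to_default - 1] += 1

    triangle = {}
    for cohort, n in counts.items():
        running, row = 0, []
        for d in defaults[cohort]:
            running += d              # accumulate marginal defaults
            row.append(running / n)   # cumulative default rate
        triangle[cohort] = row
    return triangle

# Hypothetical toy data: (cohort label, months until default or None)
records = [
    ("2023-01", 2), ("2023-01", None), ("2023-01", 3), ("2023-01", None),
    ("2023-02", 1), ("2023-02", None),
]
triangle = build_cumulative_triangle(records, max_horizon=3)
```

In a real run-off triangle, the cells beyond each cohort's observed window would be masked rather than carried forward, which is what the chain-ladder step then fills.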

1.2 Chain-Ladder

Chain-Ladder Pattern Projection: Observed cohort default triangles are run-off triangles: the more recent the cohort, the shorter the window available for tracking defaults. To fill the triangles, a chain-ladder (development factor) methodology is used to extrapolate the incomplete default histories. This produces a projected ultimate cumulative lifetime PD for each cohort, enabling consistent estimation even for recently originated (immature) portfolios.
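A minimal sketch of the development-factor idea is shown below. It assumes the triangle is stored as rows of cumulative default rates with `None` for unobserved periods, and it uses a ratio-of-sums factor computed on rates; the repository may weight by exposure or account counts instead, so the function and data here are illustrative only.

```python
def chain_ladder_fill(triangle):
    """Fill None cells of a cumulative run-off triangle.

    triangle: list of rows (oldest cohort first); each row holds cumulative
              default rates per development period, with None where the
              period has not yet been observed.
    Returns (filled triangle, development factors).
    """
    n_periods = len(triangle[0])
    # Development factor from period j to j + 1 as a ratio of sums, using
    # only cohorts that are observed in both periods.
    factors = []
    for j in range(n_periods - 1):
        pairs = [(r[j], r[j + 1]) for r in triangle
                 if r[j] is not None and r[j + 1] is not None]
        num = sum(later for _, later in pairs)
        den = sum(earlier for earlier, _ in pairs)
        factors.append(num / den if den > 0 else 1.0)

    filled = []
    for row in triangle:
        new_row = list(row)
        for j in range(1, n_periods):
            if new_row[j] is None:
                # Project forward from the last known (or projected) value.
                new_row[j] = new_row[j - 1] * factors[j - 1]
        filled.append(new_row)
    return filled, factors

# Hypothetical cumulative default rates for three cohorts
triangle = [
    [0.010, 0.020, 0.030],
    [0.012, 0.024, None],
    [0.008, None, None],
]
filled, factors = chain_ladder_fill(triangle)
```

With this toy input the two factors are about 2.0 and 1.5, so the youngest cohort is projected from 0.008 to roughly 0.016 and then 0.024.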

1.3 Gamma Fitting

Note: The Gamma distribution can be replaced by other statistical distributions, such as the Weibull distribution, or by a mathematical formula such as Nelson-Siegel, provided the result is transformed to the correct basis. This repository uses the Gamma distribution.

Parametric Model via Gamma Distribution: A Gamma distribution is fitted to the projected (extended) cumulative PD term structure for each cohort to remove sampling noise, enforce monotonicity, and obtain a smooth, stable PD curve. In this step, segments from the cohort build may be grouped into a pool when they cannot produce a stable curve on their own.

[Gamma distribution parameters]
Pool 0: Segment - ('segment_0',)
    Alpha: 1.7143     Beta: 1.0866     Constant: 0.0902
Pool 1: Segment - ('segment_1', 'segment_2', 'segment_3')
    Alpha: 0.8335     Beta: 1.2917     Constant: 0.3896
Pool 2: Segment - ('segment_4',)
    Alpha: 2.9373     Beta: 0.7116     Constant: 0.0129
Pool 3: Segment - ('segment_5',)
    Alpha: 3.1115     Beta: 0.6368     Constant: 0.0120
Pool 4: Segment - ('segment_6',)
    Alpha: 3.2286     Beta: 0.5826     Constant: 0.0220
Pool 5: Segment - ('segment_7', 'segment_8')
    Alpha: 2.4024     Beta: 0.6291     Constant: 0.0717
Pool 6: Segment - ('segment_9',)
    Alpha: 1.2083     Beta: 0.9439     Constant: 0.2240
Pool 7: Segment - ('segment_10', 'segment_11')
    Alpha: 0.6687     Beta: 1.1028     Constant: 0.4238
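The README does not state the functional form behind Alpha, Beta, and Constant. One plausible reading, used here purely as an assumption, is a scaled Gamma CDF: cumulative PD(t) = Constant x F_Gamma(t; shape = Alpha, scale = Beta). The sketch below evaluates such a curve with the Pool 0 parameters using only the standard library; the time unit and the shape/scale interpretation are also assumptions.

```python
import math

def gamma_cdf(x, shape, scale):
    """Regularized lower incomplete gamma P(shape, x / scale) via its power series."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape          # n = 0 term of the series
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (shape + n)  # next series term
        total += term
        n += 1
    log_prefactor = shape * math.log(z) - z - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))

def cumulative_pd(t, alpha, beta, constant):
    # ASSUMPTION: curve = constant * GammaCDF(t; shape=alpha, scale=beta).
    # The repository may parameterize the curve differently.
    return constant * gamma_cdf(t, alpha, beta)

# Pool 0 parameters from the listing above; t = 1..7 is an assumed time grid.
curve = [cumulative_pd(t, 1.7143, 1.0866, 0.0902) for t in range(1, 8)]
```

Under this reading the curve is monotonically non-decreasing by construction and approaches Constant as t grows, which is why Constant can be read as an ultimate lifetime PD level.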

The Kolmogorov-Smirnov (K-S) test is used to assess how well a dataset fits a specified theoretical distribution. During model development, two separate K-S tests were performed to evaluate the fit of the Gamma function to the lifetime PD experience at PD pool level.

[KS Test]
Pool 0: Segment - ('segment_0',)
n: 7 KS-Stat: 0.1429 D-Critical: 0.483 Result: Pass
Pool 1: Segment - ('segment_1', 'segment_2', 'segment_3')
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 2: Segment - ('segment_4',)
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 3: Segment - ('segment_5',)
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 4: Segment - ('segment_6',)
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 5: Segment - ('segment_7', 'segment_8')
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 6: Segment - ('segment_9',)
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass
Pool 7: Segment - ('segment_10', 'segment_11')
n: 6 KS-Stat: 0.1667 D-Critical: 0.519 Result: Pass

1.4 Unbias Calibration

Unbias Calibration: The smoothed (Gamma) PDs are calibrated to align with the long-run (TTC) Observed Default Rate (ODR). This upholds the unbiasedness principle of IFRS 9, under which the estimated PD should contain no structural optimism or conservatism. The calibration assumes that the ratio of the odds at month m (or year y) to the odds at 12 months (or 1 year) is the same at segment level as at lifetime pool level, so the curve keeps the same shape. The equation below gives the unbias calibration in odds form:

$$ Unbias\ lifetime\ ODR = \frac{ \text{ODR}_{\text{Unbias}} \cdot \frac{\text{ODR}_{TTC}}{\text{ODR}_{Target}} }{ \text{ODR}_{\text{Unbias}} \cdot \frac{\text{ODR}_{TTC}}{\text{ODR}_{Target}} + \left(1-\text{ODR}_{\text{Unbias}}\right) \cdot \frac{1-\text{ODR}_{TTC}}{1-\text{ODR}_{Target}} } $$

where;

${\text{ODR}}_{\text{Unbias}}$ is the 12-month ODR of each segment;
${\text{ODR}}_{\text{Target}}$ is the ODR of the corresponding lifetime pool at month 12 (monthly basis) or year 1 (yearly basis);
${\text{ODR}}_{\text{TTC}}$ is the TTC PD of the corresponding lifetime pool at month m or year y
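The equation above is equivalent to multiplying the segment's 12-month odds by the ratio of the pool's odds at month m to the pool's odds at month 12. A minimal sketch follows; the function name, pool curve, and segment ODR are hypothetical values chosen for illustration.

```python
def unbias_calibrate(odr_unbias, odr_ttc, odr_target):
    """Odds-scaling calibration following the unbias equation above."""
    up = odr_unbias * odr_ttc / odr_target
    down = (1.0 - odr_unbias) * (1.0 - odr_ttc) / (1.0 - odr_target)
    return up / (up + down)

# Hypothetical pool curve: TTC cumulative PD keyed by month
pool_curve = {6: 0.02, 12: 0.04, 24: 0.07}
segment_12m_odr = 0.03  # hypothetical 12-month ODR of one segment

calibrated = {
    month: unbias_calibrate(segment_12m_odr, pd_ttc, pool_curve[12])
    for month, pd_ttc in pool_curve.items()
}
```

At month 12 the TTC and target values coincide, so the calibrated value equals the segment's own 12-month ODR; other months inherit the pool's shape in odds space.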

2. Forward-looking Model

2.1 Observed Default Rates (ODR)

12-month Observed Default Rates (ODR) are used in a linear regression framework to quantify and analyze the relationship between ODR and the macroeconomic variables. The 12-month observation window smooths short-term volatility and captures underlying credit risk dynamics, providing a more stable and representative measure of default behavior for assessing macroeconomic sensitivity.

The historical ODR(s) are transformed using a logit function. The logit function converts continuous variables bounded between 0 and 1 into an unbounded (infinite) scale. This transformation is commonly applied to default rates to expand the range of the dependent variable, thereby enhancing its sensitivity and responsiveness in linear regression modeling. As a result, linear regression is preferred over logistic regression, as it allows for a broader set of established statistical tests to assess the model’s technical robustness and overall goodness of fit.

2.2 Macroeconomics Variables Transformation

A set of 24 macroeconomic variables (MEV) is used in the forward-looking model. The table below lists each variable together with the expected intuitive direction of its correlation with default rates for the portfolio. The intuitive direction reflects the anticipated relationship between changes in macroeconomic conditions and changes in default rates. For instance, an increase in the unemployment rate is expected to lead to higher default rates, implying a positive correlation.

No.	Macroeconomics variables	MEV	Sign with default	Reasons	Data type
1	Gross Domestic Product	GDP	Negative	GDP is one of the primary indicators of the health of a country's economy, so an increase in GDP is expected to decrease the default rate.	Flow
2	Foreign Direct Investment	FDI	Negative	High net FDI indicates good economic conditions in Thailand, as open economies with good growth prospects attract large amounts of FDI. Net FDI is therefore expected to be negatively related to the default rate.	Flow
3	Household Debt	HHD	Positive	Debt affects a borrower's ability to repay loans. Higher household debt indicates that borrowers are less likely to be able to make repayments.	Flow
4	Corporate Debt	COPD	Negative	High corporate debt indicates a healthy bond market, which in turn reflects a good economy. The relationship with the default rate is therefore negative.	Flow
5	Government Debt	GOVD	Positive	Government debt pushes inflation higher, and the default rate rises through the effects of inflation.	Flow
6	Government Expenditure	GOVE	Negative	Rising government expenditure indicates stronger economic growth, so a negative relationship between government expenditure and the default rate is expected.	Flow
7	Imports	IMP	Negative	A rising level of imports indicates robust domestic demand and a growing economy. Stronger economic activity is negatively related to the default rate.	Flow
8	Exports	EXP	Negative	Higher exports stimulate economic growth by increasing aggregate demand, which decreases the default rate as economic conditions improve.	Flow
9	Policy Interest Rate	PIR	Positive	The interest rate at which a depository institution lends funds to another depository institution (short-term), or the interest rate the central bank charges a financial institution to borrow money overnight. As the overnight policy rate increases, lending costs increase and the default rate increases.	Rate
10	Minimum Loan Rate	MLR	Positive	As the lending rate increases, the total cost of borrowing increases, leading to a higher default rate.	Rate
11	Nominal Effective Exchange Rate Index	NEER	Positive	A high NEER indicates a stronger currency, which hurts exports and increases the default rate.	Index
12	Real Effective Exchange Rate Index	REER	Positive	A high REER indicates a stronger currency, which hurts exports and increases the default rate.	Index
13	Wage	WAGE	Negative	Higher wages indicate that borrowers are more likely to repay their loans; the increased wealth decreases the default rate.	Price
14	Unemployment Rate	UNEM	Positive	Unemployment leads to forgone investment in economic growth, as it represents the cost to society of not running production at full capacity. The relationship between business activity and the unemployment rate is negative, so the unemployment rate is assumed to be positively related to the default rate.	Rate
15	Consumer Confidence Index	CCI	Negative	An increase in the Consumer Confidence Index indicates greater optimism about the state of the economy, which consumers express through their spending and saving. In a better economy, a decreasing default rate is expected.	Index
16	Private Investment Index	PII	Negative	Higher investment indicates good economic performance, so the default rate should decrease.	Index
17	Business Sentiment Index	BSI	Negative	A higher index indicates economic growth; borrowers are more likely to pay off their debt because of increased wealth, which decreases the default rate.	Index
18	Number of foreign tourists visiting Thailand	TOUR	Negative	A high number of tourists visiting Thailand indicates economic growth, and businesses are more likely to earn income, which decreases the default rate.	Flow
19	Oil Price	BROLP	Positive	Oil prices affect the prices of many consumer goods. A rising oil price increases borrowers' cost of living and makes business costs more expensive, so default is positively related to the oil price through reduced wealth.	Price
20	Industrial Production Index	INDPRO	Negative	A high Industrial Production Index indicates strong economic performance, so the default rate should decrease.	Index
21	Capacity Utilization Rate	CAPU	Negative	A high Capacity Utilization Rate indicates strong economic performance, so the default rate should decrease.	Index
22	Broad Money	BROMO	Negative	High broad money indicates strong economic performance, so the default rate should decrease.	Flow
23	Foreign Reserve	FRES	Negative	A large foreign reserve indicates strong economic performance, so the default rate should decrease.	Flow
24	Labour Index	LAB	Negative	A high labour index indicates a strengthening labour market, with increased employment or wages. As a result, the default rate is expected to decrease.	Index

The MEV time series may not have a direct relationship with the dependent variable. Therefore, several transformations or alternative specifications may need to be considered to identify a meaningful relationship.

Transformation	Formula
Year-on-Year Change (Rate)	$MEV_{t} - MEV_{t-12}$
Year-on-Year Change (Non-Rate)	$(MEV_{t} - MEV_{t-12}) / MEV_{t-12}$
Natural log transformation	$LN(MEV_{t})$
Moving average	$(MEV_{t} + MEV_{t-1} + ... + MEV_{t-n+1}) / n$
Leading indicator	$Lag_{k}(MEV_{t}) = MEV_{t-k}$
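The transformations in the table can be sketched on a plain monthly list as below. The helper names are hypothetical, `None` marks months where the transformation is undefined, and reading a name such as UNEM_MA9M_LAG3M as "9-month moving average, then lagged 3 months" is an assumption about the repository's naming convention.

```python
import math

def yoy_diff(x, lag=12):
    """Year-on-year change for rate-type series: x[t] - x[t - lag]."""
    return [None] * lag + [x[t] - x[t - lag] for t in range(lag, len(x))]

def yoy_pct(x, lag=12):
    """Year-on-year relative change for non-rate series."""
    return [None] * lag + [(x[t] - x[t - lag]) / x[t - lag]
                           for t in range(lag, len(x))]

def log_transform(x):
    """Natural log of each observation (requires positive values)."""
    return [math.log(v) for v in x]

def moving_average(x, n):
    """Trailing n-month average ending at month t."""
    return [None] * (n - 1) + [sum(x[t - n + 1:t + 1]) / n
                               for t in range(n - 1, len(x))]

def lag(x, k):
    """Shift the series so that position t holds the value from t - k."""
    return [None] * k + list(x[:len(x) - k])

# Hypothetical monthly series
series = [100.0 + i for i in range(24)]
ma3_lag2 = lag(moving_average(series, 3), 2)  # e.g. a "MA3M_LAG2M" variant
```

Composite variables are built by chaining the helpers, applying the moving average first so that the lag step only has to shift the leading `None` values.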

2.3 Univariate Analysis

After the MEVs are transformed, preliminary assessments are conducted to narrow down the candidate variables prior to multivariate analysis. A simple (single-variable) linear regression is run for each MEV, and the variable is retained if it meets the following criteria:

The p-value is significant at the 5% level; and
The R-Square is higher than 50%; and
It demonstrates an intuitive relationship with the dependent variable, where the expected direction of the relationship is predefined.

Note: MEVs that exhibit either increasing or decreasing trends are permitted to proceed to the subsequent analysis steps.

[Univariate analysis]
=== Result ===
Number of passed variables: 214
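The screening logic can be sketched as a simple regression of logit(ODR) on one transformed MEV. The sketch below checks only the R-Square and sign criteria; the p-value criterion is omitted because it needs a t-distribution, which the standard library does not provide (a statistics package would normally supply it). Function names and data are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def simple_ols(x, y):
    """Return slope, intercept, and R-squared of y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def passes_screen(mev, odr, expected_sign, min_r2=0.5):
    """Check the R-squared threshold and the predefined sign ("Positive"/"Negative")."""
    slope, _, r2 = simple_ols(mev, [logit(p) for p in odr])
    sign_ok = slope > 0 if expected_sign == "Positive" else slope < 0
    return sign_ok and r2 > min_r2

# Hypothetical unemployment series and 12-month ODRs
unem = [1.0, 1.2, 1.5, 1.9, 2.4]
odr = [0.010, 0.012, 0.015, 0.019, 0.025]
keep = passes_screen(unem, odr, "Positive")
```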

2.4 Multivariate Analysis

One of the commonly used industry methods to assess multicollinearity is variable clustering, which partitions a set of variables into non-overlapping clusters. This technique is implemented using the varclushi-opt library in Python. The objective is to form clusters in which variables are highly correlated with one another while exhibiting low correlation with variables in other clusters.

Further details on varclushi-opt can be found in the varclushi_opt project.

Cluster analysis is performed on the MEVs, and a variable is retained if it meets the following criteria:

It has one of the two lowest R-Square ratios in its cluster
It has one of the two highest R-Square values in its cluster

For model development purposes, all possible combinations of factors across clusters will be evaluated. This exhaustive assessment ensures that the full space of candidate models is explored, thereby increasing the likelihood of identifying the optimal model specification. When generating factor combinations, the following constraints are imposed to control multicollinearity and preserve interpretability:

A single combination must not include more than one variable from the same cluster.
A single combination must not include more than one variable originating from the same pre-transformation group.

The number of variables included in each model may be adjusted based on empirical results or business considerations. However, the number of factors is capped at three, as including more than three variables typically increases the risk of multicollinearity and can adversely impact model stability and performance.

=== Result ===
Number of passed variables: 33

[Possible combinations of 1 variable(s)]
    Number of combinations: 33
[Possible combinations of 2 variable(s)]
    Number of combinations: 468
[Possible combinations of 3 variable(s)]
    Number of combinations: 3731

Total combination for regression model: 4232
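The two constraints on factor combinations can be sketched with `itertools.combinations`. The cluster ids below are hypothetical (the README does not list cluster assignments), and the base MEV of each candidate is supplied explicitly rather than parsed from the name.

```python
from itertools import combinations

def valid_combinations(candidates, max_size=3):
    """Enumerate variable sets of size 1..max_size.

    candidates: {variable_name: (cluster_id, base_mev)}
    A set is valid when no two variables share a cluster or a base MEV.
    """
    names = sorted(candidates)
    result = []
    for k in range(1, max_size + 1):
        for combo in combinations(names, k):
            clusters = {candidates[v][0] for v in combo}
            bases = {candidates[v][1] for v in combo}
            if len(clusters) == k and len(bases) == k:
                result.append(combo)
    return result

# Hypothetical cluster assignments for four candidate variables
candidates = {
    "UNEM_MA9M_LAG3M": (0, "UNEM"),
    "UNEM_MA6M": (1, "UNEM"),
    "PIR_MA6M": (1, "PIR"),
    "BROLP_MA9M_LAG9M": (2, "BROLP"),
}
combos = valid_combinations(candidates)
```

For this toy input there are 4 single-variable sets, 4 valid pairs, and 1 valid triple, so 9 candidate specifications in total.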

2.5 Multiple Linear Regression

Multiple linear regression is widely used in the industry for predictive modeling. This regression approach aims to estimate the relationship between a dependent variable and one or more independent variables, which in this context are macroeconomic variables. When a model includes two or more independent variables, it is referred to as a multiple linear regression model. The table below summarizes the diagnostic assumption tests used to conclude that the estimator is BLUE (Best Linear Unbiased Estimator):

Category	Description	Common tests	Risk (if not satisfied)	Passed criteria
p-value significance	The p-values of the coefficients are less than or equal to 10%	OLS Regression	Coefficients are not statistically significant	< 0.05
Multicollinearity	Independent variables are not strongly linearly related	Variance inflation factor (VIF)	Imprecise and/or ill-defined coefficient estimates	< 10
Residual normality	Residuals follow a normal distribution	Anderson-Darling test	Inaccurate p-values	> 0.05
Residual homoscedasticity	Variance of residuals is independent of the fitted value	White test	Inaccurate p-values	> 0.05
Residual autocorrelation	Residuals are not autocorrelated	Durbin-Watson	Inaccurate p-values; in particular, positive autocorrelation overstates significance of variables	Between 1 and 3
Residual stationarity	(Co-integration) implying residuals are stationary, i.e. display constant mean and variance over time	Augmented Dickey-Fuller test	Inaccurate p-values and misleading R-Square	< 0.1
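As one worked instance of the checks in the table, the Durbin-Watson statistic can be computed directly from the regression residuals. The sketch below is a plain-Python illustration with hypothetical residuals; the other tests in the table require distribution tables or a statistics package and are not shown.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

def passes_dw(residuals, lower=1.0, upper=3.0):
    """Apply the 'between 1 and 3' pass criterion from the table."""
    return lower < durbin_watson(residuals) < upper

# Hypothetical residuals from a fitted regression
residuals = [0.2, -0.1, 0.3, -0.2, -0.1, 0.1]
dw = durbin_watson(residuals)
ok = passes_dw(residuals)
```

Values near 2 indicate no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.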

2.6 Model Back-testing

3. Lifetime Model

3.1 Cohort Curve Construction

Unbias Cohort Curves: The cumulative PD curves derived from the unbias model are prepared for calibration. They must be transformed into weighted-average conditional PDs before lifetime calibration.

3.2 Forward-looking Information

PD Prediction: Forward-looking information is incorporated through the outputs of the forward-looking PD model. The macroeconomic variables are used as inputs for forecasting time-varying PD, allowing historical cohort PDs to be adjusted to reflect expected future economic conditions across the projection horizon.

Final MEV(s): ['BROLP_MA9M_LAG9M', 'UNEM_MA9M_LAG3M', 'PIR_MA6M']
Forecasting PD
Year 1: 2.92%
Year 2: 1.45%
Year 3: 1.83%

3.3 Calibration by Logit Approach

Logit Approach: The calibration uses a logit approach to ensure numerical stability and proportional adjustment across time and segmentation. Cohort curves and the predicted PDs are transformed into logit space, where calibration factors are estimated to align adjusted PD curves with observed default outcomes while preserving smoothness, monotonicity, and lifetime plausibility.

The calibration is performed in the logit space of the conditional PD, whereas the base cohort curves are computed on a cumulative basis. The following formulas convert the curves for calibration and convert the result back to the final basis.

$$ Marginal\ PD_{t} = Cumulative\ PD_{t} - Cumulative\ PD_{t - 1} $$

$$ Conditional\ PD_{t} = \frac{{Marginal\ PD}_{t}}{1 - {Cumulative\ PD}_{t - 1}} $$

$$ Logit(Calibration\ PD_{Portfolio,\ t}) = Logit(Conditional\ PD_{t}) + Logit(FWL\ PD) - Logit(TTC\ PD) $$

$$ Logit(Calibration\ PD_{Segment,\ t}) = Logit(Conditional\ PD_{t}) + Logit(FWL\ PD) - Logit(TTC\ PD) + Delta $$

where;

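$t$ is a time period within the lifetime horizon;
$Delta$ is the segment-level adjustment obtained from the optimization in Section 3.4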
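The conversion chain (cumulative to conditional, shift in logit space, back to cumulative) can be sketched as below. The base curve and the TTC PD are hypothetical; the forecast PDs reuse the Year 1 to Year 3 values from Section 3.2. Applying one TTC PD to every period and pairing each period with its own forecast PD are assumptions of this sketch.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def to_conditional(cumulative):
    """Cumulative PD per period -> conditional PD per period."""
    conditional, prev = [], 0.0
    for c in cumulative:
        marginal = c - prev                        # marginal PD in this period
        conditional.append(marginal / (1.0 - prev))  # given survival so far
        prev = c
    return conditional

def to_cumulative(conditional):
    """Conditional PD per period -> cumulative PD per period."""
    survival, out = 1.0, []
    for h in conditional:
        survival *= 1.0 - h
        out.append(1.0 - survival)
    return out

def calibrate(cumulative_base, fwl_pd, ttc_pd, delta=0.0):
    """Shift each conditional PD by logit(FWL) - logit(TTC) + delta in logit space."""
    cond = to_conditional(cumulative_base)
    adjusted = [inv_logit(logit(h) + logit(f) - logit(ttc_pd) + delta)
                for h, f in zip(cond, fwl_pd)]
    return to_cumulative(adjusted)

base_curve = [0.020, 0.035, 0.045]   # hypothetical cumulative base curve
fwl = [0.0292, 0.0145, 0.0183]       # Year 1-3 forecasts from Section 3.2
pit_curve = calibrate(base_curve, fwl, ttc_pd=0.02)  # ttc_pd is hypothetical
```

Working on conditional PDs keeps every adjusted value inside (0, 1), so the rebuilt cumulative curve stays monotonic regardless of the size of the shift.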

3.4 Optimization PiT PD

Delta for PiT PD: The optimization minimizes the overall deviation between the adjusted average PD curve at portfolio level and the segment-level PD curves. In other words, the PD forecast from the forward-looking model should represent the overall risk of the portfolio, because that model was developed at portfolio level. Calibrating each segment separately could cause deviations from the portfolio risk. Optimizing delta ensures consistency across segments, avoids distortion of portfolio-level risk, and results in lifetime PD term structures that are forward-looking, unbiased, and suitable for IFRS 9 reporting and risk management use.
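One simple way to realize this idea, shown here only as an assumed formulation, is to solve for a single delta that makes the weighted average of the segment PDs (after the logit shift) equal to the portfolio-level PD for a given period. Because the weighted average increases monotonically with delta, bisection is sufficient. The segment PDs, weights, and target below are hypothetical, and the repository's actual objective function may differ.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def weighted_pd(segment_pds, weights, delta):
    """Weighted average of segment PDs after adding delta in logit space."""
    total = sum(weights)
    return sum(w * inv_logit(logit(p) + delta)
               for p, w in zip(segment_pds, weights)) / total

def solve_delta(segment_pds, weights, target_pd, lo=-10.0, hi=10.0, tol=1e-12):
    """Bisection on delta so that the weighted segment PD matches target_pd."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weighted_pd(segment_pds, weights, mid) < target_pd:
            lo = mid   # average too low: delta must increase
        else:
            hi = mid   # average too high (or equal): delta must decrease
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical segment-level PDs, exposure weights, and portfolio-level target
segment_pds = [0.01, 0.03, 0.10]
weights = [0.5, 0.3, 0.2]
delta = solve_delta(segment_pds, weights, target_pd=0.04)
```

Here the unadjusted weighted average is 0.034, below the 0.04 target, so the solved delta is positive.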
