Score cut-offs can blow up in your face

Posted to LinkedIn at https://www.linkedin.com/today/post/article/20140609165942-5425117-score-cut-offs-can-blow-up-in-your-face
Risk scores are extremely powerful tools in determining the final disposition of credit applications. Typically, scores are used in consumer lending, but they can be used in a commercial environment as well (e.g. the SME segment).
Most scores include variables covering application details, bureau variables (including a generic bureau-derived score – e.g. a FICO Score or VantageScore, or in India the CIBIL TransUnion Score) and internal bank variables if the customer already has a relationship with the bank. In the absence of a specialized application score, the generic bureau score can also be used to grade applications.
Operationally, the scores can be used to give Yes/No decisions on customer applications – though in some scenarios, scores on the margins can be referred for manual review, or decisions taken on partial exposures.
Most banks/financial institutions will calibrate the scores using extensive analysis to identify the odds or bad rate at each score band. A business-specific bad rate definition can be used here – e.g. 2 or 3 missed payments in the next 12 months (i.e. the loan going bad within a fixed time period post sanction). This calibration is done by retrospective analysis of past applications and their performance post sanction (the assumption being that past patterns will propagate into the future without too much variance, macro or otherwise). Based on the retrospective analysis, a score cut-off is identified which allows the bank to target a specific bad rate. The score cut-off also forces a rejection rate on incoming applications.
In order to illustrate the impact of score cut-offs on bad rates, I am going to assume the score has been calibrated to grade incoming applications on a normal distribution with a mean score of 600 and a standard deviation of 50 points. Additionally, the score has been equalized at an anchor of 600 with a PDO (points to double the odds) of 25 points. (There is no fixed rule that a scorecard needs to be centred at the mean/median point – this is done for illustration purposes only.) Odds at the centre are calibrated at 70 to 1, i.e. roughly 1.4% of customers with a score of 600 will go delinquent on their loan.
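To make the calibration concrete, here is a minimal sketch (not from the original post) of what the anchor, odds and PDO imply for the expected bad rate at each score level; the bad rate convention of roughly 1/odds, capped at 99%, is an assumption chosen to mirror the tables that follow:

data score_calibration;
  anchor = 600;          /* anchor score                        */
  anchor_odds = 70;      /* good:bad odds at the anchor         */
  pdo = 25;              /* points to double the odds           */
  do score = 450 to 750 by 25;
    odds = anchor_odds * 2**((score - anchor) / pdo);
    bad_rate = min(1 / odds, 0.99);   /* assumed convention, capped at 99% */
    output;
  end;
  format bad_rate percent8.2;
run;

At 600 this gives roughly 1.4% (1/70), doubling to about 2.9% at 575 and 5.7% at 550.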
The graphic and table below give the score distribution of 10,000 applicants and the bad rate by score band.

[Figure 1: Score distribution of 10,000 applicants (normal, mean 600, SD 50) and bad rate by score band]

The table below gives the bad rate at each score cut-off for the same population:

Score Cut-off % Bad Rate
No cut-off 2.6%
450+ 2.5%
475+ 2.3%
500+ 1.9%
525+ 1.5%
550+ 1.1%
575+ 0.7%
600+ 0.4%
625+ 0.2%
650+ 0.1%
675+ 0.1%
700+ 0.0%
725+ 0.0%
750+ 0.0%

The table essentially tells us that, out of the 10,000-odd customers, the expected bad rate if the bank approves everybody is 2.6% – i.e. with no cut-off, we get an approval rate of 100% and a bad rate of 2.6%.

As is evident, there is a trade-off between the approval rate and the expected bad rate – in order to reach our target bad rate of 1.5%, the table can be used to identify 525 as a potential score cut-off.

That is, banks can continue to approve applications in score bands which, in isolation, may be considered high risk – pooled with a larger number of customers in higher bands, the overall portfolio bad rate is still maintained. And why would a bank lend to customers in, say, the 550-599 band when it clearly has an elevated bad rate? There can be a multitude of reasons – capturing market share, approval rate pressures, sales targets – you name it. After all, sub-prime customers are the most profitable, as long as we can predict the bad rates and have a pool of good customers to balance them out. Sub-prime customers are, in theory, charged a higher interest rate which is supposed to take care of the extra risk the bank is taking.

So now, by enforcing a cut-off of 525 on incoming applications (rather than, say, 575 – the point at which each individual band’s bad rate falls below the 1.5% target), we get an approval rate of approximately 93%, calculated as the area under the curve of a normal distribution with known mean and standard deviation. In other words, approximately 7% of incoming applications will be deemed high risk and rejected, and the approved population will have a target bad rate of 1.5%. With a 93% approval rate, both the risk and sales teams are happy! Or are they?
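As a rough sketch of this calculation (and not the bank’s or the author’s actual code), the approval rate and portfolio bad rate at each cut-off can be approximated by weighting each 25-point band’s bad rate by the share of applicants falling in that band under the assumed normal distribution; the banding convention and the 1/odds bad rate are assumptions carried over from the sketch above:

%macro cutoff_table(mean=600, sd=50, anchor=600, anchor_odds=70, pdo=25);
  data cutoff_table;
    do cutoff = 450 to 750 by 25;
      approved = 0;
      bads = 0;
      /* walk the 25-point score bands, weighting each by the share of
         applicants it holds under the assumed normal distribution */
      do band_low = 250 to 850 by 25;
        band_top = band_low + 25;
        p_band = cdf('NORMAL', band_top, &mean, &sd)
               - cdf('NORMAL', band_low, &mean, &sd);
        odds = &anchor_odds * 2**((band_top - &anchor) / &pdo);
        bad_rate = min(1 / odds, 0.99);
        if band_low >= cutoff then do;
          approved = approved + p_band;
          bads = bads + p_band * bad_rate;
        end;
      end;
      approval_rate = approved;              /* share of applicants approved       */
      portfolio_bad_rate = bads / approved;  /* expected bad rate of approved pool */
      output;
    end;
    keep cutoff approval_rate portfolio_bad_rate;
    format approval_rate portfolio_bad_rate percent8.1;
  run;

  proc print data=cutoff_table noobs;
  run;
%mend cutoff_table;

/* original assumptions: mean 600, SD 50, odds of 70 to 1 at 600, PDO 25 */
%cutoff_table(mean=600, sd=50, anchor_odds=70);

Run with the original assumptions, this lands close to the figures quoted above – roughly 93% approval and a bad rate of about 1.5% at a 525 cut-off – with small differences coming from rounding.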

Let the Bad times roll

 

One major weakness of using score cut-offs is the long list of assumptions inherent in the score building and deployment process. Even slight deviations from these assumptions can have a disproportionate impact on the risk exposure of the bank.

One of the most critical assumptions concerns the probability distribution of the applications. Score cut-offs are calculated by studying past distributions (they need not be normal), as in the example being discussed – based on the chart above, a cut-off of 525 gives an approval rate of 93% and a bad rate of 1.5%. If the distribution remains stable, the cut-off gives a predictable, controllable bad rate, and the bank can confidently lend to subprime customers as well – cornering market share along with a much healthier interest spread – while relying on its ‘Million Dollar Statistical Model’.

However, take the scenario of a worsening macro-economic situation (not unlike that witnessed in 2008), or a new sourcing channel opening up. A distribution shift can happen for any number of reasons – and even slight deviations can have a large impact.

For example, let’s assume the distribution of incoming applications shifts left to a mean of 580 (from 600 previously). With the standard deviation and PDO remaining constant, the figure and table below give the impact on the bad rates at the different cut-offs:

[Figure 2: Incoming application score distribution shifted to a mean of 580, compared with the original distribution (mean 600)]

The above figure shows the new application distribution as compared to the original.

Assuming the score anchor and PDO remain unchanged, the new incoming application distribution shifts the score-based cut-offs. Previously, a score cut-off of 525 gave a bad rate of 1.5%. When the applicant mean shifts from 600 to 580, the same cut-off of 525 now gives a bad rate of 2.1% (an increase of more than 30% in the bad rate!). And that’s not all – the approval rate has now fallen to 86%, i.e. a rejection rate of 14%.
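Under the same sketch assumptions, this shifted scenario amounts to re-running the earlier macro with the new mean and everything else unchanged:

%cutoff_table(mean=580, sd=50, anchor_odds=70);

Its output should track the table below, up to rounding.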

 

Score Cut-off % Bad Rate (New Dist.)
No cut-off 4.7%
450+ 4.2%
475+ 3.6%
500+ 2.9%
525+ 2.1%
550+ 1.3%
575+ 0.8%
600+ 0.5%
625+ 0.3%
650+ 0.1%
675+ 0.1%
700+ 0.0%
725+ 0.0%
750+ 0.0%

The reason is that, with the distribution shifting slightly to the left, the percentage of applicants in the higher score bands goes down. These customers were supposed to drive the portfolio bad rate down – but now the percentage of customers sourced in the not-so-good score bands shoots up (but hey, we didn’t compromise on the score cut-off, did we?).

The sales team is now hopping mad, with the rejection rate having more than doubled; the risk team is under pressure too – even after rejecting so many applications, the bad rates are shooting up!

A cursory look at the re-calculated bad rates on the updated distribution shows that the score cut-off needs to be revised from 525 to 550 to maintain the same bad rate as before. The approval rate then needs to be 72%!

This illustrates how even a small shift in the incoming population requires the risk team to quickly revise the score cut-off, bringing the approval rate down from 93% to 72%, just to maintain the target bad rate.

What this essentially means is that the risk exposure of the bank has suddenly shot up: the subprime customers are no longer priced correctly under this model, because the interest rate calculation did not take this particular scenario into consideration. The bank continues to source on the new distribution, confident that the score will continue to perform (which it does – just not as assumed).

It may not end here. When macro-economic parameters deteriorate, the worsening credit quality of incoming customers discussed above is one symptom; the other impact is on the credit score itself. Scores built or calibrated in ‘good times’ will almost certainly begin to wander when the ‘bad times’ come in. The score odds are not set in stone and do change based on how the industry is performing.

In ‘bad times’, deterioration of the odds at each score interval can be expected, as many banks found out in 2008 (FICO faced some heat for this). However, the basic purpose of the score still holds irrespective – which is to rank-order customers from highest risk to lowest risk. Where macro-economic parameters impact individual behaviour, any score needs to be recalibrated to capture the new behaviour; the basic presumption of past behaviour propagating into the future is invalidated, as behaviour is now changing rapidly.

For our example, where the score was anchored at 600 with odds of 70 to 1 and a PDO of 25, let’s assume a deterioration of the odds to 60 to 1 with the PDO unchanged. The new interval bad rate table is below (capped at 99% for the lowest interval):

 

Score Band % Bad Rate (New Odds) % Bad Rate (Original Odds)
<450 99.0% 91.4%
450-474 53.3% 45.7%
475-499 26.7% 22.9%
500-524 13.3% 11.4%
525-549 6.7% 5.7%
550-574 3.3% 2.9%
575-599 1.7% 1.4%
600-624 0.8% 0.7%
625-649 0.4% 0.4%
650-674 0.2% 0.2%
675-699 0.1% 0.1%
700-724 0.1% 0.0%
725-749 0.0% 0.0%
750+ 0.0% 0.0%

The difference may not look very high, but let’s explore what happens when we combine this new data with our updated probability distribution for the cut-off bad rates.
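Sticking with the same sketch assumptions, the combined effect of the population shift and the weaker odds corresponds to changing both parameters of the earlier macro at once:

%cutoff_table(mean=580, sd=50, anchor_odds=60);

Its output should land close to the ‘New Dist.’ column below.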

Score Cut-off % Bad Rate (New Dist.) % Bad Rate (Old Dist.)
No cut-off 5.4% 2.6%
450+ 4.9% 2.5%
475+ 4.2% 2.3%
500+ 3.3% 1.9%
525+ 2.4% 1.5%
550+ 1.6% 1.1%
575+ 1.0% 0.7%
600+ 0.5% 0.4%
625+ 0.3% 0.2%
650+ 0.2% 0.1%
675+ 0.1% 0.1%
700+ 0.0% 0.0%
725+ 0.0% 0.0%
750+ 0.0% 0.0%

Based on the previous cut-off of 525, post an odds and a population shift, the actual bad rate faced by the bank is 2.4% instead of the expected 1.5% – i.e. the bad rate has suddenly spiked by 60%.

To compensate for this, the score cut-off now needs to be revised to north of 550, with an approval rate even lower than the 72% required when only the population had shifted.

Both factors – a population shift along with an odds change – can deliver a double whammy to the risk team of any bank. There are practical problems a risk team will face in convincing the sales head that the approval rate needs to be cut from 93% to less than 70% because of the small matter of the score mean shifting by 20 points (on a score scale which ranges from 400 to 800) and the odds shifting from 70 to 1 to 60 to 1.

 

While the scores continue to do their job of ranking customers, pure reliance on cut-offs by banks can be suicidal and invalidate the scorecard needlessly. Like any other tool, a scorecard is only as good as the risk manager behind it. If a risk manager does not have the authority or freedom – as in this example – to cut approval rates from 93% down to 70%, cut-offs will simply not work. In fact, quite the opposite: enforcing a score cut-off can be spectacularly counterproductive.

While the illustration discussed above is fairly simplistic, with assumptions that are unlikely to present themselves so neatly in the real world, the scenario discussed has unfortunately replicated itself in many banks and lenders throughout the world.

Raghuram Rajan (ex-IMF chief economist and current RBI governor) talks in his book ‘Fault Lines’ about a conference where he addressed a group of risk managers (a while before 2008 happened) on tail risk and its possible impact. The talk was not well received by the audience, and someone later pulled him aside and told him that the risk managers who could understand and push what he was saying inside their banks had long since been fired for being Cassandras. The whole point of tail risk is that while the probability of the event is low, when it does happen it wipes out all the profits accumulated over the so-called good times. The idea that the risk taken on by lenders exposing themselves to subprime can be fully priced and modelled is inherently faulty; while using scores you may be able to generate handsome profits over years and years, tail risk is actually much higher than what our models estimate.

Data Analysis ToolPak – Karl Pearson Correlation Matrix

In this video segment, I talk about enabling the Data Analysis ToolPak add-in in Excel. This is a powerful and rarely explored feature of MS Excel which can do a lot. In this series of video demonstrations, I will be exploring these features.

The first of these is creating a correlation matrix in Excel. The file used can be downloaded here –> car_sales.
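For readers following the SAS series, a rough equivalent of the ToolPak output is PROC CORR; the sketch below assumes the car_sales file has been imported as WORK.CAR_SALES and that price, horsepower and mpg are among its numeric columns (hypothetical variable names):

proc corr data=work.car_sales pearson;
  var price horsepower mpg;   /* Pearson correlation matrix of these variables */
run;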

 

SAS Programming – Day 10

Arrays in SAS

  • SAS arrays are another way to temporarily group and refer to SAS variables – an array provides a single name to reference a group of variables
  • The ARRAY statement begins with the keyword ARRAY, followed by the array name and N, the number of elements in the array
  • The _temporary_ option is used to create a temporary array (see the sketch below)
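A minimal sketch of both forms – a named array over existing variables and a _temporary_ array – with made-up dataset and variable names:

data scores_recoded;
  set scores;                            /* assumed input dataset with q1-q5    */
  array q{5} q1-q5;                      /* array over five existing variables  */
  array wt{5} _temporary_ (1 2 3 4 5);   /* temporary array with initial values */
  do i = 1 to 5;
    q{i} = q{i} * wt{i};                 /* reference elements by index         */
  end;
  drop i;
run;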

Base SAS Programming – Day 9

http://youtu.be/3MuoAMW9tt4

Looping in SAS

  1. Functions in SAS:  Continued
    • Text functions:
      •  Compress – Returns a character string with specified characters removed from the original string
      • Index – Returns the starting position of a substring within a character string
    • Use of upcase, lowcase and propcase functions in string comparison
    • Math / Stat functions: Like Int, Round, Sum, Mean etc
    • Difference between Mean value (or any aggregate function) of Proc Means / Summary and Mean function
  2. Loops in SAS : Do Loops
    • Loops are used to repeat a set of statements a specified number of times (or while/until a condition holds) within a data step
    • Types of Loops:
      • Do Loop
      • Do While
      • Do Until
    • The iterative DO loop increments by 1 by default
    • BY can be used to increment by any value other than 1 (see the sketch below)
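A minimal sketch of the text functions from item 1 and the three loop forms from item 2 (all values are arbitrary):

data loop_demo;
  /* text functions: compress, index and upcase */
  clean = compress('SAS 9.4', ' .');              /* 'SAS94'                      */
  pos   = index('Learn Analytics', 'Analytics');  /* 7                            */
  same  = (upcase('sas') = 'SAS');                /* 1 - case-insensitive compare */

  /* iterative DO loop with a BY increment of 2 */
  total = 0;
  do i = 1 to 10 by 2;
    total = total + i;
  end;

  /* DO WHILE: condition checked at the top of the loop */
  balance = 100;
  do while (balance < 200);
    balance = balance * 1.1;
  end;

  /* DO UNTIL: condition checked at the bottom, so the body runs at least once */
  n = 0;
  do until (n >= 5);
    n + 1;
  end;
run;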

Base SAS Programming – Day 8

Functions in SAS (Continued)

  1. Text Functions
    • Catx – to concatenate characters / strings with any delimiter. Cat is also a function used to concatenate characters / strings
    • Trim – to remove trailing blanks in a string
    • Tranwrd – Replaces all occurrences of a substring in a character string
    • Translate – Replaces specific characters in a character expression.
    • Do check out Compress and other Text functions like Upcase, Lowcase and Propcase
  2. Date and Time Functions
    • Day – Returns the Day from a SAS date value
    • Month – Returns the Month from a SAS date value
    • Year – Returns the year from a SAS date value
    • Week – Returns the week number from a SAS date value (Try weekday function)
    • Mdy – Builds a SAS date value from month, day and year values
    • Today – Returns current system date
    • Datdif – Difference between 2 dates in days
    • Yrdif – Difference between 2 dates in years
    • INTCK – Returns the number of interval boundaries of a given kind that lie between 2 dates.
    • INTNX – Increments a date value by a given time interval, and returns a date (a short sketch of these functions follows below)
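A minimal sketch of the functions listed above in use (all literal values are made up for illustration):

data function_demo;
  /* text functions */
  full_name = catx(' ', 'John', 'Smith');          /* 'John Smith'                 */
  greeting  = trim('Hello   ') || ' World';        /* trailing blanks removed      */
  swapped   = tranwrd('Base SAS', 'Base', 'Adv');  /* 'Adv SAS'                    */
  coded     = translate('ABC', '123', 'ABC');      /* 'A'->'1', 'B'->'2', 'C'->'3' */

  /* date functions */
  dt        = mdy(6, 9, 2014);                     /* SAS date for 09JUN2014       */
  d         = day(dt);
  m         = month(dt);
  y         = year(dt);
  wk        = week(dt);
  days_gap  = datdif(dt, today(), 'act/act');      /* days between two dates       */
  years_gap = yrdif(dt, today(), 'act/act');       /* years between two dates      */
  months_ck = intck('month', dt, today());         /* month boundaries crossed     */
  next_qtr  = intnx('qtr', dt, 1);                 /* advance one quarter          */
  format dt next_qtr date9.;
run;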

Base SAS Programming – Day 1

The Base SAS video series begins with the assumption that the student viewer has no background in SAS programming, and in fact very limited or no prior exposure to any kind of programming at all. The series comprises 9 video lectures (averaging an hour and a half each), plus additional videos covering advanced topics like PROC SQL and SAS macros.

Day 1 topics are as below (a short sketch follows the list):

  1. Intro to Libraries, Data step and Proc step in SAS
  2. Data step example:
    • Creating a sample dataset using Datalines / Cards statement
    • Understanding data types in SAS
    • Informat and Format
    • Label
  3.  Proc step example: Overview
    • Proc Contents – to know the dataset structure, with list of variables and number of observations
      • Varnum option – to list the variables of a dataset in creation order (otherwise the list is in alphabetic order)
    • Proc Print – to view the data in a dataset.
      • Label option and Var statement
    • Proc Means – to extract basic statistics of the numeric variables in a dataset, like N, Mean, Standard Deviation, Minimum and Maximum (the defaults)
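A minimal Day 1 style sketch covering the three PROC steps above, on a small made-up dataset created with DATALINES:

data work.applicants;
  input name $ score income;
  label score  = 'Application Score'
        income = 'Monthly Income';
  datalines;
Asha 612 52000
Ravi 588 41000
Meera 655 67000
;
run;

proc contents data=work.applicants varnum;   /* structure, variables in creation order */
run;

proc print data=work.applicants label;       /* view the data, using labels            */
  var name score income;
run;

proc means data=work.applicants;             /* N, mean, std dev, min, max by default  */
  var score income;
run;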

Time Series using Holt’s Linear Exponential Smoothing (Seasonal Variation)

In this video, we explain how to implement exponential smoothing in Excel itself to generate a forecast.

We begin by explaining the decomposition of time series into 4 components

  • Trend (Long Term Progression of the Series)
  • Seasonality
  • Cyclic
  • Irregular/Noise

We then demonstrate the use of moving averages and single exponential smoothing to extract the trend from the series. By subtracting the trend from the original signal, we can extract the seasonal variation around it.

Further, we demonstrate Holt’s technique for double exponential smoothing on a series with a linear upward trend, and how it can be used for forecasting. Then, using the length of the season, we average out the seasonal fluctuation around the trend (thereby trying to eliminate the irregular component) and combine the forecast trend and seasonal fluctuation to get an integrated forecast.
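As a rough sketch of the Holt step in SAS terms (the Excel and SPSS walk-through is in the video), assuming an input dataset WORK.SERIES with a numeric variable y in time order, and arbitrarily chosen smoothing weights:

data work.holt;
  set work.series;
  retain level trend;
  alpha = 0.3;                                     /* level smoothing weight              */
  beta  = 0.1;                                     /* trend smoothing weight              */
  if _n_ = 1 then do;
    level = y;                                     /* initialise level at the first value */
    trend = 0;                                     /* and the trend at zero               */
  end;
  else do;
    prev_level = level;
    level = alpha * y + (1 - alpha) * (level + trend);
    trend = beta * (level - prev_level) + (1 - beta) * trend;
  end;
  one_step_forecast = level + trend;               /* forecast for the next period        */
run;

If SAS/ETS is available, PROC ESM with METHOD=LINEAR (or METHOD=WINTERS for the seasonal version) automates the same family of smoothers.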

All of the above has been demonstrated using MS Excel and simple formulae, and then we proceed to demonstrate the use of IBM SPSS to do the same.

The worksheet with the implementation can be downloaded from here.

Online Batch on SAS Programming (Base and Advanced)

 

 

Learn-Analytics is starting an online batch on SAS Programming (Base and Advanced) on Saturday, Jan 14th. Classes are scheduled at 2000 IST (1430 GMT, 0930 Eastern), 3 hours a day. For those wishing to register for the training, the first two classes (6 hours of training) will be free to attend, enabling participants to evaluate the trainer as well as the delivery mechanism.

The training will be delivered over WebEx; the instructor will take participants through hands-on sessions using datasets and case studies, with exercises at the end of each session. Recordings of the sessions will be made available to all participants for a period of 3 months after the training.

For the detailed module design and topics covered, click here. Interested candidates can drop us an email at info@learnanalytics.in or fill in the contact form here, and we will forward the WebEx invitation link for the free evaluation.