Score cut-offs can blow up in your face

Posted to LinkedIn at https://www.linkedin.com/today/post/article/20140609165942-5425117-score-cut-offs-can-blow-up-in-your-face

Risk scores are extremely powerful tools for determining the final disposition of credit applications. Typically, scores are used in consumer lending, but they can be applied in a commercial environment as well (the SME segment).

Most scores include variables covering application details, bureau variables (including a generic bureau-derived score – e.g. a FICO Score, a VantageScore, or, in India, the CIBIL TransUnion Score) and internal bank variables if the customer already has a relationship with the bank. In the absence of a specialised application score, the generic bureau score can also be used to grade applications.

Operationally, scores can be used to give Yes/No decisions on customer applications – though in some scenarios, applications on the margin can be referred for manual review or approved for partial exposure.

Most banks and financial institutions calibrate their scores through extensive analysis to identify the odds or bad rate at each score band. A business-specific bad-rate definition can be used here – e.g. 2 or 3 missed payments in the next 12 months (i.e. the loan going bad within a fixed period after sanction). This calibration is done by retrospective analysis of past applications and their performance post sanction (the assumption being that past patterns will propagate into the future without too much variance, macroeconomic or otherwise). Based on the retro analysis, a score cut-off is identified that allows the bank to target a specific bad rate. The score cut-off also forces a rejection rate on incoming applications.

To illustrate the impact of score cut-offs on bad rates, I will assume the score has been calibrated so that incoming applications are normally distributed with a mean score of 600 and a standard deviation of 50 points. Additionally, the score is anchored at 600 with a PDO (points to double the odds) of 25. (There is no rule that a scorecard must be centred at the mean/median – this is done for illustration only.) Odds at the anchor are calibrated at 70 to 1, i.e. roughly 1.4% of customers with a score of 600 will go delinquent on their loan.
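
As a rough sketch of what this calibration implies (treating the bad rate as 1/(1 + odds), with the good:bad odds doubling every 25 points from 70 to 1 at a score of 600), the score-to-bad-rate mapping can be written in a few lines of R:

    # Minimal sketch: expected bad rate implied by a score, assuming good:bad odds
    # of 70 to 1 at an anchor score of 600 and 25 points to double the odds (PDO).
    bad_rate_at_score <- function(score, anchor = 600, anchor_odds = 70, pdo = 25) {
      odds <- anchor_odds * 2 ^ ((score - anchor) / pdo)   # good:bad odds at this score
      1 / (1 + odds)                                       # implied bad rate
    }

    round(100 * bad_rate_at_score(c(500, 550, 600, 650, 700)), 2)
    # lower scores imply sharply higher bad rates; a score of 600 maps to roughly 1.4%
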
The graphic and table below give the score distribution of 10,000 applicants and the bad rate by score band.

[Figure 1: Score distribution of 10,000 applicants and bad rate by score band]

The table below gives the bad rate by score cut-off for the same population –>

Score Cut-off    Bad Rate
No cut-off       2.6%
450+             2.5%
475+             2.3%
500+             1.9%
525+             1.5%
550+             1.1%
575+             0.7%
600+             0.4%
625+             0.2%
650+             0.1%
675+             0.1%
700+             0.0%
725+             0.0%
750+             0.0%

The table essentially tells us that, across the 10,000-odd customers, the expected bad rate if the bank approves everybody is 2.6% – i.e. with no cut-off we get an approval rate of 100% and a bad rate of 2.6%.

As is evident, there is a trade-off between the approval rate and the expected bad rate – to reach a target bad rate of 1.5%, the table identifies 525 as a potential score cut-off.
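
This trade-off can be reproduced approximately with a short R sketch under the stated assumptions: scores normal with mean 600 and SD 50, applicants grouped into 25-point bands, and each band assigned the per-band bad rate quoted in this post (the band-level table appears further below). The approval rate at a cut-off is just the share of the distribution above it, and the portfolio bad rate is the band-weighted average above it; figures differ slightly from the table because of rounding:

    # Approval rate and expected bad rate above a score cut-off, using 25-point
    # score bands of a normal(600, 50) applicant distribution and the per-band
    # bad rates quoted in this post.
    lower    <- c(-Inf, seq(450, 750, by = 25))                # band lower edges
    upper    <- c(seq(450, 750, by = 25), Inf)                 # band upper edges
    band_bad <- c(91.4, 45.7, 22.9, 11.4, 5.7, 2.9, 1.4, 0.7,
                  0.4, 0.2, 0.1, 0.0, 0.0, 0.0) / 100          # per-band bad rates
    share    <- pnorm(upper, 600, 50) - pnorm(lower, 600, 50)  # applicants per band

    cutoff_stats <- function(cutoff) {
      keep <- lower >= cutoff                                  # bands at or above the cut-off
      c(approval = sum(share[keep]),
        bad_rate = sum(share[keep] * band_bad[keep]) / sum(share[keep]))
    }

    round(100 * cutoff_stats(-Inf), 1)   # approve everyone
    round(100 * cutoff_stats(525), 1)    # enforce a 525 cut-off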

In other words, banks can continue to approve applications in score bands which in isolation may be considered high risk, because, pooled with a larger number of customers in higher bands, they still maintain the overall portfolio bad rate. Why would a bank lend to customers in, say, the 550-599 band when it clearly has an elevated bad rate? There can be a multitude of reasons – capturing market share, approval-rate pressures, sales targets – you name it. After all, sub-prime customers are the most profitable, as long as we can predict the bad rates and have a pool of good customers to balance them out. Sub-prime customers are theoretically charged a higher interest rate, which is supposed to take care of the extra risk the bank is taking.

So, by enforcing a cut-off of 525 on incoming applications (rather than a more conservative cut-off such as 575), we get an approval rate of approximately 93% (calculated as the area under a normal distribution with known mean and standard deviation). In other words, approximately 7% of incoming applications will be deemed high risk and rejected, and the approved population will have the target bad rate of 1.5%. With a 93% approval rate, both the risk and sales teams are happy! Or are they?

Let the bad times roll

One major weakness of using score cut-offs is the long list of assumptions inherent in the score-building and deployment process. Even slight deviations from these assumptions can have a disproportionate impact on the bank's risk exposure.

One of the most critical assumptions concerns the probability distribution of incoming applications. Score cut-offs are calculated by studying past distributions (which need not be normal); in the example being discussed, a cut-off of 525 gives an approval rate of 93% and a bad rate of 1.5%. If the distribution remains stable, the cut-off delivers a predictable, controllable bad rate, and the bank can confidently lend to sub-prime customers as well – cornering market share and a much healthier interest spread while relying on its ‘Million Dollar Statistical Model’.

However, consider a worsening macro-economic situation (not unlike that witnessed in 2008), or a new sourcing channel opening up. A distribution shift can happen for any number of reasons – and even slight deviations can have a large impact.

For example, let’s assume the distribution of incoming applications shifts left to a mean of 580 (from 600 previously). With the standard deviation and PDO unchanged, the chart and table below give the impact on bad rates at different cut-offs –>

[Figure 2: Shifted application distribution (mean 580) compared with the original distribution (mean 600)]

The above figure shows the new application distribution as compared to the original.

Assuming the score anchor and PDO remain unchanged, the new incoming application distribution changes the numbers behind the score cut-offs. Previously a cut-off of 525 gave a bad rate of 1.5%; with the applicant mean at 580 instead of 600, the same cut-off of 525 now gives a bad rate of 2.1% – an increase of more than 30% in the bad rate! And that’s not all – the approval rate has fallen to 86%, i.e. a rejection rate of 14%.

Score Cut-off    Bad Rate (New Dist.)
No cut-off       4.7%
450+             4.2%
475+             3.6%
500+             2.9%
525+             2.1%
550+             1.3%
575+             0.8%
600+             0.5%
625+             0.3%
650+             0.1%
675+             0.1%
700+             0.0%
725+             0.0%
750+             0.0%

The reason is that, with the distribution shifting slightly to the left, the percentage of applicants in the higher score bands goes down. These customers were supposed to drive the portfolio bad rate down – but now the percentage of customers sourced in the not-so-good score bands shoots up (but hey, we didn’t compromise on the score cut-off, did we?).

The sales team is now hopping mad, with the rejection rate having more than doubled; the risk team is under pressure because, even after rejecting so many applications, the bad rate is shooting up!

A cursory look at the recalculated bad rates on the updated distribution shows that the score cut-off needs to be revised from 525 to 550 to maintain the same bad rate as before – which takes the approval rate down to roughly 72%!

This illustrates how a small shift in the incoming population forces the risk team to quickly revise the score cut-off, bringing the approval rate down from 93% to 72% just to maintain the target bad rate.
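
The re-tuning step can be sketched with the same band-based approach (per-band bad rates as quoted in this post, applicant scores assumed normal with SD 50; figures differ slightly from the tables because of rounding):

    # Same cut-off, different applicant distribution: compare the approved pool
    # before and after the mean shifts from 600 to 580.
    lower    <- c(-Inf, seq(450, 750, by = 25))
    upper    <- c(seq(450, 750, by = 25), Inf)
    band_bad <- c(91.4, 45.7, 22.9, 11.4, 5.7, 2.9, 1.4, 0.7,
                  0.4, 0.2, 0.1, 0.0, 0.0, 0.0) / 100

    pool_stats <- function(cutoff, mean, sd = 50) {
      share <- pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
      keep  <- lower >= cutoff
      c(approval = sum(share[keep]),
        bad_rate = sum(share[keep] * band_bad[keep]) / sum(share[keep]))
    }

    round(100 * pool_stats(525, mean = 600), 1)   # original distribution
    round(100 * pool_stats(525, mean = 580), 1)   # same cut-off after the shift
    round(100 * pool_stats(550, mean = 580), 1)   # revised cut-off after the shift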

What this essentially means is that the bank's risk exposure has suddenly shot up: the sub-prime customers are no longer priced correctly by this model, because the interest-rate calculation did not take this scenario into consideration. The bank continues to source on the new distribution, confident that the score will keep performing (which it is – just not as assumed).

It may not end here. When macro-economic parameters deteriorate, the worsening credit quality of incoming customers discussed above is one symptom; the other impact is on the credit score itself. Scores built or calibrated in ‘good times’ will almost certainly begin to wander when the ‘bad times’ arrive. The score odds are not set in stone and do change with how the industry is performing.

In ‘bad times’, deterioration of the odds themselves at each score interval can be expected, as many banks found out in 2008 (FICO faced some heat for this). The basic purpose of the score still holds regardless – to rank-order customers from highest risk to lowest risk. But when macro-economic conditions change individual behaviour, any score needs to be recalibrated to capture the new behaviour; the basic presumption of past behaviour propagating into the future is invalidated because behaviour is now changing rapidly.

For our example, where the score was anchored at 600 with odds of 70 to 1 and a PDO of 25, let’s assume the odds deteriorate to 60 to 1 with the PDO unchanged. The new interval bad-rate table is below (capped at 99% for the lowest interval) –>

Score Band    New Bad Rate    Original Bad Rate
<450          99.0%           91.4%
450-474       53.3%           45.7%
475-499       26.7%           22.9%
500-524       13.3%           11.4%
525-549       6.7%            5.7%
550-574       3.3%            2.9%
575-599       1.7%            1.4%
600-624       0.8%            0.7%
625-649       0.4%            0.4%
650-674       0.2%            0.2%
675-699       0.1%            0.1%
700-724       0.1%            0.0%
725-749       0.0%            0.0%
750+          0.0%            0.0%

The difference may not look very large, but let’s explore what happens when we combine these new odds with our updated application distribution to recompute the cut-off bad rates.

Score Cut-off    Bad Rate (New Dist. + New Odds)    Bad Rate (Original)
No cut-off       5.4%                               2.6%
450+             4.9%                               2.5%
475+             4.2%                               2.3%
500+             3.3%                               1.9%
525+             2.4%                               1.5%
550+             1.6%                               1.1%
575+             1.0%                               0.7%
600+             0.5%                               0.4%
625+             0.3%                               0.2%
650+             0.2%                               0.1%
675+             0.1%                               0.1%
700+             0.0%                               0.0%
725+             0.0%                               0.0%
750+             0.0%                               0.0%

At the previous cut-off of 525, after both the odds and the population shift, the actual bad rate faced by the bank is 2.4% instead of the expected 1.5% – the bad rate has suddenly spiked by 60%.

To compensate, the score cut-off needs to be revised significantly north of 550, with an approval rate even lower than the 72% seen when only the distribution had shifted.

The two factors together – a population shift along with an odds change – deliver a double whammy to the risk team of any bank. There are practical problems in convincing the sales head that the approval rate needs to be cut from 93% to less than 70% because of the small matter of the score mean shifting by 20 points (on a score scale ranging from 400 to 800) and the odds moving from 70 to 1 to 60 to 1.
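
A compact way to see the double whammy is to parameterise both the applicant mean and the per-band bad rates in the same band-based sketch (the two bad-rate vectors are the original and new-odds columns of the band table above; again the figures differ slightly from the tables because of rounding):

    # Double whammy: the applicant mean shifts to 580 AND the per-band bad rates
    # deteriorate from the original calibration to the new-odds calibration.
    lower   <- c(-Inf, seq(450, 750, by = 25))
    upper   <- c(seq(450, 750, by = 25), Inf)
    bad_old <- c(91.4, 45.7, 22.9, 11.4, 5.7, 2.9, 1.4, 0.7, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0) / 100
    bad_new <- c(99.0, 53.3, 26.7, 13.3, 6.7, 3.3, 1.7, 0.8, 0.4, 0.2, 0.1, 0.1, 0.0, 0.0) / 100

    pool_stats <- function(cutoff, mean, band_bad, sd = 50) {
      share <- pnorm(upper, mean, sd) - pnorm(lower, mean, sd)
      keep  <- lower >= cutoff
      c(approval = sum(share[keep]),
        bad_rate = sum(share[keep] * band_bad[keep]) / sum(share[keep]))
    }

    round(100 * pool_stats(525, 600, bad_old), 1)   # original calibration and population
    round(100 * pool_stats(525, 580, bad_new), 1)   # same cut-off after both shifts
    round(100 * pool_stats(575, 580, bad_new), 1)   # roughly what it takes to get back near 1.5%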

While scores continue to do their job of rank-ordering customers, pure reliance on cut-offs can be suicidal for a bank and needlessly discredit the scorecard. Like any other tool, a scorecard is only as good as the risk manager behind it. If a risk manager does not have the authority or freedom – as in this example – to cut the approval rate from 93% to around 70%, cut-offs will simply not work; in fact, enforcing a score cut-off can be spectacularly counterproductive.

While the illustration above is fairly simplistic, with assumptions unlikely to present themselves so neatly in the real world, the scenario it describes has unfortunately played out at many banks and lenders around the world.

Raghuram Rajan (ex-IMF chief economist and current RBI governor) recounts in his book ‘Fault Lines’ a conference at which he addressed a group of risk managers (a while before 2008 happened) about tail risk and its possible impact. The talk was not well received, and someone later pulled him aside and told him that the risk managers who could understand and push what he was saying inside their banks had long since been fired for being Cassandras. The whole point of tail risk is that while the probability of the event is low, when it does happen it wipes out all the profits accumulated over the so-called good times. The notion that the risk lenders take on by exposing themselves to sub-prime can be fully modelled and priced is inherently faulty; using scores you may generate handsome profits for years and years, but the tail risk is actually much higher than our models estimate.

Time Series using Holt’s Linear Exponential Smoothing (Seasonal Variation)

In this video, we explain how to implement exponential smoothing in Excel itself to generate a forecast.

We begin by explaining the decomposition of a time series into four components:

  • Trend (Long Term Progression of the Series)
  • Seasonality
  • Cyclic
  • Irregular/Noise

We then demonstrate the use of moving averages and single exponential smoothing to extract the trend from the series. By subtracting the trend from the original signal, we can isolate the seasonal variation around it.

Further, we demonstrate Holt’s technique of double exponential smoothing for a series with a linear upward trend and how it can be used for forecasting. Then, using the length of the season, we average the seasonal fluctuation around the trend (thereby trying to eliminate the irregular component) and combine the forecast trend and seasonal fluctuation into an integrated forecast.

All of the above is demonstrated using MS Excel and simple formulae, after which we demonstrate how to do the same in IBM SPSS.
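
For readers who prefer code, here is a rough R sketch of the same idea – Holt’s level/trend recursions plus a simple additive seasonal index – on a made-up monthly series (the smoothing constants are illustrative, not tuned):

    # Rough sketch: Holt's double exponential smoothing plus an additive seasonal
    # index, applied to a synthetic monthly series with trend and seasonality.
    set.seed(1)
    m <- 12                                              # season length (months)
    y <- 100 + 2 * (1:48) + 10 * sin(2 * pi * (1:48) / m) + rnorm(48, sd = 3)

    alpha <- 0.3; beta <- 0.1                            # illustrative smoothing constants
    level <- trend <- numeric(length(y))
    level[1] <- y[1]; trend[1] <- y[2] - y[1]
    for (t in 2:length(y)) {
      level[t] <- alpha * y[t] + (1 - alpha) * (level[t - 1] + trend[t - 1])
      trend[t] <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
    }

    # Seasonal index: average the deviation of the series from the smoothed level,
    # by position within the season
    seasonal <- tapply(y - level, (seq_along(y) - 1) %% m + 1, mean)

    # Integrated forecast for the next 12 months: extrapolated trend + seasonal index
    h <- 1:12
    fcst <- level[length(y)] + h * trend[length(y)] + seasonal[(length(y) + h - 1) %% m + 1]
    round(fcst, 1)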

The worksheet with the implementation can be downloaded from here.

Demo on Time Series using Exponential Smoothing (IBM SPSS and Excel)

Following up on last week’s WebEx session on logistic regression for credit scoring (you can catch it here), this Sunday we will demonstrate the technique of exponential smoothing for time-series forecasting.

More specifically, during the webinar we will take you through the basic decomposition of a time series into its components:

  • Trend
  • Seasonality
  • Cyclic
  • Error

We will concentrate on extracting and forecasting the trend and seasonality of a series. The trend component can be extracted using exponential smoothing (single, double or triple, depending on the slope and pattern), and a seasonal index can be built to forecast the seasonal variation.

The webinar will demonstrate how this can be done using simple formulas in an Excel sheet and will then introduce the time-series functionality in IBM SPSS. A basic introduction to Box-Jenkins (ARIMA) modelling will also be covered.
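
For those who want to experiment before the session, the same decomposition plus a basic Box-Jenkins fit can be tried in a few lines of base R (this uses the built-in AirPassengers series and the classic ‘airline model’ ARIMA order purely as an illustration – the webinar itself works in Excel and SPSS):

    # Classical decomposition of a monthly series into trend, seasonal and
    # irregular components, followed by a basic seasonal ARIMA (Box-Jenkins) fit.
    parts <- decompose(AirPassengers, type = "multiplicative")
    plot(parts)                                          # trend, seasonal and random parts

    fit <- arima(log(AirPassengers), order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12))
    fc  <- predict(fit, n.ahead = 12)
    round(exp(fc$pred))                                  # 12-month forecast, back-transformed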

A trial version of IBM SPSS version 20 can be downloaded from here (you will need to fill in some details). Those interested in attending can drop an email to info@learnanalytics.in or fill in the form at http://www.learnanalytics.in/contact.html and we will mail you the WebEx invite. The webinar is scheduled for Sunday, 15th Jan, 0930 IST (0400 GMT).

The webinar is free to attend.

To receive regular updates, please join our LinkedIn group Learn Analytics.

Installing Rattle and R

The halo around R continues to grow; more and more organizations are beginning to build capabilities in R programming, as it can potentially deliver cost savings. More on the comparison of R and SAS in our earlier blog entry.

In this post we will take you through the installation of R and Rattle on a Windows 7 machine. Here is a YouTube video showing the capabilities of R on a small credit-scoring dataset.

  1. Download R from the website. The link is for the Windows installation; the setup file is the same for both 32-bit and 64-bit systems, so you need not worry.
  2. The setup file is an executable; simply run it and follow the instructions, and it will install the base R software on your system.
  3. An icon should be created on your desktop; on 64-bit systems two icons are created (one for 32-bit, the other for 64-bit). If you have a 64-bit system, double-click the R x64 2.XX icon (where XX is the version number).
  4. The R console window should open up. Type in the following commands one after the other, pressing Enter after each statement: install.packages("RGtk2") and install.packages("rattle"). After the first command, a window will open asking for a CRAN mirror to be selected; you can select any CRAN mirror to download the packages from (to be safe, pick a US or Western Europe mirror to ensure the latest versions).
  5. Now run the following commands: library(rattle) followed by rattle().
  6. This is where most Rattle installation errors pop up. In many cases R will throw an error such as ‘GTK not found’ or an error with GTK+, and it will offer to download GTK for you – but even that option will often not work after the download. Fear not, follow the instructions below to resolve it. If your Rattle window launches, congratulations, it's working.
  7. For those with GTK problems, follow the bullet-point steps below (a consolidated set of the R commands appears after the list).
  • For 32-bit systems open this link; for 64-bit systems open this link.
  • On the page, scroll down to the GTK+ packages and select GTK+ version 2.24.8 (32-bit runtime) or GTK+ version 2.22.1 (64-bit binaries).
  • Copy the download to the C drive root and extract the ZIP file as is. For example, I created the folder C:\gtk+_2.22.1-1_win64.
  • Now right-click on My Computer and click Properties (alternatively, go via Control Panel > System & Security > System). A new window will open; on the left-hand side click “Advanced system settings”.
  • A new System Properties window will open up.

  • Click on Environment Variables near the bottom; another window will pop up. Within the System variables section, scroll down to Path and click Edit.
  • An “Edit System Variable” window will open with the variable name “Path”; within the variable value you will see a number of folder paths separated by semicolons.
  • Go to the beginning of the variable value and add the path to the bin folder inside the GTK folder we extracted, e.g. C:\gtk+_2.22.1-1_win64\bin, followed by a semicolon. (Note: make sure the bin folder actually exists at the path you add.)
  • Close all the windows and restart R.
  • Type library(rattle), press Enter, and then run rattle().
  • The Rattle window should now open up – you are ready to shake, rattle and roll your data. Install all the packages Rattle prompts you for; this happens automatically after you press OK. Check out our Rattle demonstration post for a flavor of what Rattle can do.
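
For reference, here are the R commands from the steps above gathered in one place. The GTK path is the example folder used in this post – adjust it to wherever you extracted GTK+ – and setting PATH from within R via Sys.setenv is a session-only alternative to editing the Windows environment variable:

    # Install the packages Rattle depends on (a CRAN mirror prompt will appear)
    install.packages("RGtk2")
    install.packages("rattle")

    # Optional: point the current R session at the extracted GTK+ bin folder.
    # This mirrors the Windows PATH edit described above, but lasts only for
    # this session; adjust the folder to match where you extracted GTK+.
    Sys.setenv(PATH = paste("C:/gtk+_2.22.1-1_win64/bin", Sys.getenv("PATH"), sep = ";"))

    # Launch the Rattle GUI
    library(rattle)
    rattle()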

Do let us know if this post was helpful in solving your Rattle installation issues, especially the pesky GTK/RGtk2 error. Feel free to comment if you still face installation issues – we will try to solve them!

LearnAnalytics Team.

How to enter the Analytics Industry?

We have been in the field of analytics training for over four years now (across our current and previous organizations) and have personally trained over 1,000 students in SAS programming as well as advanced analytics, including both retail and corporate clients.

One of the most repeated queries I field from my students is “How do I get in?”, “How do I convince an analytics company to hire me?” or “I have 10 years of experience in so-and-so industry, how do I switch to analytics?”. If only I had a rupee for every time I was asked this question, I could have retired by now! (Or maybe at Rs 10 per question.)

Well, there is no single answer or approach to entering the industry. Off-campus freshers face a tortuous task in breaking in: to get hired you need experience in analytics, and to get experience in analytics you need to be hired somewhere already. It's an age-old challenge.

Back in the day, all technical trades were controlled by guilds (carpentry, masonry or blacksmithing, for example) which acted both as facilitators to the chosen and as entry barriers to upstarts. To enter the field, a young person would have to grovel before an established artisan for an apprenticeship in return for lodging and food. This was generally unpaid labor, but in the bargain the apprentice gained experience and the artisan free labor. After a few years the apprentice would be granted membership of the guild and be free to set up on their own.

Jump to the 21st century and transplant this to analytics: how do YOU break in? The challenge, a thousand years after the guilds, remains the same – to be hired you need experience under your belt, and to gain experience you need to be hired.

A few clarifications for people trying to break in:

  • First off, there is no formal qualification or degree required to be an analytics professional. (You don't need fancy maths/stats/engineering degrees – I have seen arts graduates become subject-matter experts in collections analytics.)
  • Secondly, there is no age limit – 40-year-olds have made the jump and done extremely well.
  • It's a job seeker's market, provided you have reached the magic figure of 12 months of experience.

Experience in analytics is king today; people who have it can practically dictate hiring terms.

But how do you get that initial experience? Therein lies the heartbreak – though there is a way, just as there was thousands of years ago: apprenticeships (we call them internships now). You have to convince a company, any analytics company, to hire you at a very low salary or even no salary in the beginning. Face it, you need them more than they need you at this stage – anything to get that valuable experience line onto your CV. This is typically a call that freshers just out of college can take easily, but those coming with previous industry experience will find the jump harder. Whether to make it is a decision only you can take.

I have seen 30-year-olds leave stable jobs to start in analytics on salaries of Rs 12,000 a month; a year later they were already back at their previous levels. A 32-year-old housewife who took up a SAS programming course was offered a Rs 10,000/month contract for 3 months at a small analytics company; she is now a middle-level manager in the analytics arm of a major MNC. The pattern is evident: once you have acquired a few skills – whether through training courses or self-study – do whatever it takes to get working.

Analytics companies do not care about your educational qualifications or formal background; if you have prior analytics experience, you are a rare commodity and will be snapped up.

That said, what skills do you need to even get a foot in the door? I have one word for that – “SAS”. SAS programming jobs are probably one of the easiest routes into the industry today. SAS certifications on your CV (the exam costs around USD 200 – quite cheap for the benefits it provides) can act as a substitute for SAS programming experience: companies will treat you as a known commodity if you have cleared the certification exams, which improves your chances of being shortlisted. (A disclaimer here – since our organization specializes in SAS training for certification, this opinion piece may be considered biased towards convincing the reader to enroll for SAS training; that is not the objective, I am merely stating an observation.)

Secondly, there are SAS jobs and then there are SAS analytics jobs. One mistake people make is taking an initially higher-paying pure SAS programming job over a SAS analytics profile. Candidates need to be very aware of the nature of the work they are getting into: a company offering work in predictive modeling or data mining using SAS should always be preferred over a pure SAS programming role. A mistake here could mean a career of reporting and ad hoc requests versus the decidedly more glamorous side of predictive modeling. Beg, borrow, steal, kill or even pay, but get predictive-modeling experience under your belt. It is a small difference at the beginning, but over a 30-year career it can mean totally divergent paths.

A note here: startups are typically more likely to hire based on attitude and aptitude rather than a CV. They provide the best opportunities for people who really want to break in.

In sum:

  • Study a lot. It takes time to master any new technical skill; reaching an employable level in SAS and basic analytics typically requires up to 500 hours of training, practice and self-study. (Target 4 hours a day over a period of 3-4 months.)
  • Be prepared to spend time in the trenches. You have to be mentally ready to take a salary cut – maybe a huge one – to get that elusive initial experience. (Target internships and startups here, and aim for 8-12 months of experience under your belt before you start looking around.)
  • Make intelligent choices in your career – even a one-week project in predictive modeling using, say, regression could make all the difference. (Beware of pure SAS programming jobs; they may only be on the reporting side. Keep trying to gain experience on data-modeling projects.)
  • Every big MNC has an analytics arm. If you are already working in such a company, pull every string to get onto an allied project you can leverage for experience, or an internal transfer. (I know a guy who was in the BPO arm of a major MNC; he bugged his reporting chain for 1.5 years before they finally relented, and today he is travelling all over the world as an SCM analytics consultant!)
  • Those who play it like they have nothing to lose are the ones who win big – bring your attitude with you.

Do let us know if you found this post helpful. For any queries regarding analytics careers or analytics training, drop a mail to info@learnanalytics.in or check out our website www.learnanalytics.in.
We are also interested in how you made the jump into the analytics industry – drop a note in the comments section for the other readers.

R-Rattle Training Video

Today we are going to introduce a very powerful data mining tool called Rattle. An interesting feature of Rattle is that it is a GUI which sits on top of R, which means it gives users a point-and-click interface for building data mining projects, predictive models, etc. without writing a single line of R code.

In the featured video we build various predictive models on a credit-scoring dataset and compare their performance using ROC curves. The models built are –>

  • Decision Trees
  • Random Forests
  • Adaptive Boosting
  • Support Vector Machines
  • Logistic Regression
  • Neural Networks

All of this is done without writing any R code (except to launch Rattle). The total video length is about 17 minutes, and it takes you through data import in Rattle, variable exploration, model building and model evaluation using ROC curves.
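
For a sense of what Rattle generates behind the scenes, here is a minimal hand-written R sketch of the same workflow – fit two of the model types listed above and compare them with ROC curves. It uses the small built-in kyphosis dataset purely as a stand-in for the credit-scoring data in the video, and assumes the rpart, randomForest and ROCR packages are installed:

    # Fit a decision tree and a random forest, then compare them with ROC curves.
    library(rpart)            # decision trees (also provides the kyphosis data)
    library(randomForest)     # random forests
    library(ROCR)             # ROC curves

    set.seed(42)
    idx   <- sample(nrow(kyphosis), round(0.7 * nrow(kyphosis)))
    train <- kyphosis[idx, ]
    test  <- kyphosis[-idx, ]

    tree <- rpart(Kyphosis ~ ., data = train, method = "class")
    rf   <- randomForest(Kyphosis ~ ., data = train)

    p_tree <- predict(tree, test, type = "prob")[, "present"]
    p_rf   <- predict(rf,   test, type = "prob")[, "present"]

    plot(performance(prediction(p_tree, test$Kyphosis), "tpr", "fpr"), col = "blue")
    plot(performance(prediction(p_rf,   test$Kyphosis), "tpr", "fpr"), col = "red", add = TRUE)
    legend("bottomright", legend = c("decision tree", "random forest"),
           col = c("blue", "red"), lty = 1)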

This video is aimed at people with an advanced analytics background, as we have not explained much of the methodology behind the techniques – merely how to do it in Rattle. If you understand the methodology and are not working in the analytics industry, you should immediately jump ship; greener pastures await. (Seriously, if you understand even 40% of this, you cannot be unemployed!)

For those who want to understand and learn the material shown in the video, check out our website www.learnanalytics.in – we specialize in analytics training for students worldwide, covering SAS, R and advanced analytics.

For doubts, queries or batch timings, drop a mail to info@learnanalytics.in.

  1. Click here to download R
  2. Click here to download Rattle
  3. Click here to download the dataset discussed in the video

To install Rattle, simply follow the instructions on the website linked above. If you have problems installing, drop us a mail and we will be glad to help you out. We will follow up with a detailed post on R and Rattle installation and troubleshooting.

Drop a comment to give us feedback!

Learn Analytics Team