Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example below on Information Value calculation useful.
Information Value (IV) measures how well an independent variable separates the categories of a dependent variable (DV). In other words, it measures how predictive a variable is in relation to a DV. As such, it can be used as a variable reduction technique: in a dataset with hundreds of variables, Information Value can help identify a subset of top predictors, which speeds up the data mining process and reduces the number of predictors in a model. In addition, it can help identify the optimal binning structure, or fine classing, of the independent variables.
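In a scorecard development process, the Information Value is calculated as follows:

$$\mathrm{IV} = \sum_{i=1}^{n} \left(\text{Distr Bads}_i - \text{Distr Goods}_i\right) \times \ln\left(\frac{\text{Distr Bads}_i}{\text{Distr Goods}_i}\right)$$

where $n$ is the number of bins, $\text{Distr Bads}_i$ is the share of all bads that fall in bin $i$, and $\text{Distr Goods}_i$ is the share of all goods that fall in bin $i$. Bads are defaulted consumers and Goods are non-defaulted consumers (the definition of default varies from organization to organization, and is usually 60+ or 90+ days past the payment due date). Note that no bin can have a count of 0 for either category, because the ratio inside the logarithm would then be zero or undefined.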
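To make the calculation concrete, here is a minimal Python sketch of the Information Value computation from per-bin counts. The function name `information_value` and its arguments are my own choices for illustration, not part of any particular tool, and the check on zero counts simply enforces the note above.

```python
import math


def information_value(bads, goods):
    """Information Value from per-bin counts of bads and goods.

    bads[i] and goods[i] are the counts in bin i (hypothetical argument
    names, chosen for this sketch).
    """
    if len(bads) != len(goods):
        raise ValueError("bads and goods must have one count per bin")
    if any(b <= 0 for b in bads) or any(g <= 0 for g in goods):
        # The log ratio is undefined when a bin has 0 bads or 0 goods.
        raise ValueError("every bin needs at least one bad and one good")

    total_bads = sum(bads)
    total_goods = sum(goods)

    iv = 0.0
    for b, g in zip(bads, goods):
        distr_bad = b / total_bads      # share of all bads in this bin
        distr_good = g / total_goods    # share of all goods in this bin
        iv += (distr_bad - distr_good) * math.log(distr_bad / distr_good)
    return iv
```

A variable whose bins all have the same mix of bads and goods gets an Information Value of 0, and the value grows as the two distributions move apart.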
Figure 1: Classing example with the help of a Decision Tree in Angoss KnowledgeSTUDIO.
Figure 2: Split Report of the Decision Tree in Figure 1 with the Information Value per variable displayed
Figure 1 shows an example of a split on age in a Decision Tree in Angoss KnowledgeSTUDIO. The numbers of bads and goods in each category are displayed on each node. Figure 2 shows the Split Report of the root node, where the Information Value for age is 0.72268. Table 1 shows a breakdown of this calculation.
Table 1: Node report with a breakdown of the distributions and Information Value
|Node|Size of Group|Number of Status = bad|Number of Status = good|Distr bads|Distr goods|Information Value|
|---|---|---|---|---|---|---|
|([age]>=41 AND [age]<=90)|4142|1426|2716|0.605263158|0.355357844|0.133084087|
|([age]>=37 AND [age]<41)|1003|331|672|0.14049236|0.08792359|0.024638192|
|([age]>=33 AND [age]<37)|1096|291|805|0.123514431|0.105325134|0.002897663|
|([age]>=30 AND [age]<33)|773|164|609|0.069609508|0.079680754|0.001360896|
|([age]>=17 AND [age]<30)|2985|144|2841|0.061120543|0.371712678|0.560703642|
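As a check, the breakdown in Table 1 can be reproduced with a few lines of Python. This is only a sketch: the counts are copied from the table, while the bin labels and variable names (`bins`, `contributions`, `total_iv`) are my own and have nothing to do with how KnowledgeSTUDIO computes the report internally.

```python
import math

# (age bin, number of bads, number of goods), copied from Table 1
bins = [
    ("[41, 90]", 1426, 2716),
    ("[37, 41)", 331, 672),
    ("[33, 37)", 291, 805),
    ("[30, 33)", 164, 609),
    ("[17, 30)", 144, 2841),
]

total_bads = sum(bad for _, bad, _ in bins)
total_goods = sum(good for _, _, good in bins)

# Contribution of each bin to the Information Value
contributions = {}
for label, bad, good in bins:
    distr_bad = bad / total_bads
    distr_good = good / total_goods
    contributions[label] = (distr_bad - distr_good) * math.log(distr_bad / distr_good)

total_iv = sum(contributions.values())

for label, value in contributions.items():
    print(f"{label}: {value:.9f}")
print(f"Information Value for age: {total_iv:.5f}")
```

The per-bin values should match the last column of Table 1, and their sum should match the 0.72268 reported for age in Figure 2. Most of that total comes from the youngest bin, where the shares of bads and goods differ the most.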
For further reading, I strongly recommend the references below.
- Siddiqi, N., 2012. Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.
- Refaat, M., 2011. Credit Risk Scorecards: Development and Implementation Using SAS. Lulu.com.