Information Value – A Numerical Example

August 10, 2017.

Posted in Advanced Analytics

Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example below on Information Value calculation useful.

Information Value (IV) measures how well an independent variable separates the categories of a dependent variable (DV). In other words, it measures how predictive a variable is in relation to the DV, which makes it useful for variable reduction: in a dataset with hundreds of variables, IV can help identify a subset of top predictors, speeding up the data mining process and reducing the number of predictors in a model. It can also help identify the optimal binning structure, or fine classing, of the independent variables.

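In a scorecard development process, the Information Value (IV) formula is as follows:

$$IV = \sum_{i=1}^{n} \left(\text{Distr Bads}_i - \text{Distr Goods}_i\right) \ln\left(\frac{\text{Distr Bads}_i}{\text{Distr Goods}_i}\right)$$

Here n is the number of bins, Distr Bads_i is the share of all Bads that fall in bin i, and Distr Goods_i is the share of all Goods that fall in bin i. Bads represent defaulted consumers and Goods non-defaulted consumers (the definition of default varies from organization to organization, and is usually 60+ or 90+ days past the payment due date). Note that a bin cannot have a count of 0 for either category, because the logarithm would then be undefined.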
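As a rough illustration, the calculation can be sketched in Python. This assumes the usual form of the formula, namely the sum over bins of (Distr Bads - Distr Goods) × ln(Distr Bads / Distr Goods). The function name `information_value` and the two-bin counts are made up for this example and do not come from any particular tool.

```python
import math


def information_value(bads, goods):
    """Return the Information Value for per-bin counts of bads and goods.

    `bads[i]` and `goods[i]` are the counts observed in bin i.
    """
    if len(bads) != len(goods):
        raise ValueError("bads and goods must have one count per bin")
    if any(b <= 0 for b in bads) or any(g <= 0 for g in goods):
        # A zero count makes the ratio or its logarithm undefined,
        # which is why no bin may be empty for either category.
        raise ValueError("every bin needs at least one bad and one good")

    total_bads, total_goods = sum(bads), sum(goods)
    iv = 0.0
    for b, g in zip(bads, goods):
        distr_bad = b / total_bads      # share of all bads in this bin
        distr_good = g / total_goods    # share of all goods in this bin
        iv += (distr_bad - distr_good) * math.log(distr_bad / distr_good)
    return iv


# Hypothetical two-bin variable (illustrative counts only).
print(information_value([30, 70], [60, 40]))
```

Every term in the sum is non-negative, because the difference and the logarithm always share the same sign, so a larger total indicates stronger separation between bads and goods.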

Figure 1: Classing example with the help of a Decision Tree in Angoss KnowledgeSTUDIO.

Figure 2: Split Report of the Decision Tree in Figure 1, with the Information Value per variable displayed.

Figure 1 shows an example of a split on age in a Decision Tree in Angoss KnowledgeSTUDIO. Each node displays the number of bads and goods in that category. Figure 2 displays the Split Report of the root node, where the Information Value for age is 0.72268. Table 1 shows a breakdown of this calculation.

Table 1: Node report with a breakdown of the distributions and Information Value

| Group | Size of Group | Number of Status = bad | Number of Status = good | Distr bads | Distr goods | Information Value |
|---|---|---|---|---|---|---|
| ([age]>=41 AND [age]<=90) | 4142 | 1426 | 2716 | 0.605263158 | 0.355357844 | 0.133084087 |
| ([age]>=37 AND [age]<41) | 1003 | 331 | 672 | 0.14049236 | 0.08792359 | 0.024638192 |
| ([age]>=33 AND [age]<37) | 1096 | 291 | 805 | 0.123514431 | 0.105325134 | 0.002897663 |
| ([age]>=30 AND [age]<33) | 773 | 164 | 609 | 0.069609508 | 0.079680754 | 0.001360896 |
| ([age]>=17 AND [age]<30) | 2985 | 144 | 2841 | 0.061120543 | 0.371712678 | 0.560703642 |
| Total | | 2356 | 7643 | | | 0.72268448 |
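To make the breakdown concrete, here is the arithmetic for the first bin of Table 1 (ages 41 to 90), worked by hand. It assumes each bin contributes (Distr bads - Distr goods) × ln(Distr bads / Distr goods), which reproduces the figures in the table; intermediate values are rounded to six decimals.

```latex
\begin{aligned}
\text{Distr bads} &= \frac{1426}{2356} \approx 0.605263 \\
\text{Distr goods} &= \frac{2716}{7643} \approx 0.355358 \\
\text{IV}_{\text{bin}} &= (0.605263 - 0.355358) \times \ln\!\left(\frac{0.605263}{0.355358}\right) \\
&\approx 0.249905 \times 0.532538 \approx 0.133084
\end{aligned}
```

Summing the five bin contributions obtained this way gives the total of 0.72268448 reported in the last row of the table.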

For further reading, I strongly recommend the references below.

References:

  1. Siddiqi, N., 2012. Credit risk scorecards: developing and implementing intelligent credit scoring (Vol. 3). John Wiley & Sons.
  2. Refaat, M., 2011. Credit Risk Scorecards: Development and Implementation Using SAS. Lulu.com.
