Category: Advanced Analytics

Are Decision Trees Secretly Parametric Models? Part 1

Published December 11, 2017.

In my living room I have a TV and a couch. Actually, for this story, it’s more important that I tell you I have a TV and a painting. The painting sits above the couch, but that’s neither here nor there. If I want to tell you about my TV, I could simply tell you […]

Sampling Methods

Published November 23, 2017.

Why sampling? Sampling is used to extract a limited number of observations from a large population for modeling or analysis. It is primarily done for two reasons. Traditionally, large datasets did not fit into computer memory, and when they did, analysis programs ran slowly. Even when the entire dataset fits […]

Notes on Sampling and Sample Validation

Published October 30, 2017.

The rationale for sampling: Sampling is used to extract a smaller dataset from a large population of data for analysis and modeling. Working on a small dataset, as opposed to the whole population, allows us to use computers with fewer resources, such as RAM and disk space, and to perform the […]

Information Value – A Numerical Example

Published August 10, 2017.

Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example of Information Value calculation below useful. Information Value is a measure that can be used to understand how well an Independent Variable (IV) separates the categories of […]
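For readers who want to try the calculation themselves, here is a minimal sketch. It is not the post's own worked example. It assumes the variable has already been binned and that each bin carries counts of the two target categories ("goods" and "bads"). It uses one common convention, WoE_i = ln(%good_i / %bad_i) and IV = sum over bins of (%good_i - %bad_i) * WoE_i, and it applies no smoothing, so every bin must have non-zero counts in both categories. The function name `information_value` and the counts in the usage line are illustrative only.

```python
import math


def information_value(goods, bads):
    """Compute Information Value from per-bin counts (illustrative sketch).

    goods[i], bads[i]: counts of the two target categories in bin i.
    Convention assumed here: WoE_i = ln(%good_i / %bad_i) and
    IV = sum_i (%good_i - %bad_i) * WoE_i.
    Assumes every bin has non-zero goods and bads (no smoothing applied).
    """
    if len(goods) != len(bads):
        raise ValueError("goods and bads must have the same number of bins")
    total_good = sum(goods)
    total_bad = sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        # Share of all goods and of all bads that fall into this bin
        pct_good = g / total_good
        pct_bad = b / total_bad
        # Weight of Evidence for the bin
        woe = math.log(pct_good / pct_bad)
        # Each bin's contribution is non-negative, because the difference
        # and the log ratio always have the same sign
        iv += (pct_good - pct_bad) * woe
    return iv


# Hypothetical counts for a variable split into three bins
print(information_value([100, 200, 300], [50, 30, 20]))
```

If goods and bads are spread identically across the bins, every WoE is zero and the IV is zero. The further the two distributions diverge, the larger the IV becomes, which is why it serves as a measure of how well a variable separates the two categories.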