Data Preparation 101 – The Objective of Data Preparation

Published May 2, 2018.

Data preparation is a fundamental aspect of the modeling process. In fact, it is arguably the most important part of the process, since it can take up to 80% of a project's total time. The objective of data preparation is to build what is known as the modeling view, or mining view. The modeling view […]

How to Include or Exclude Variables in a Strategy or Decision Tree

Published March 6, 2018.


Including or excluding variables in a strategy or decision tree depends on whether the tree is grown automatically or interactively. If you’re growing a decision tree automatically, you can’t force a variable into the tree. You can set which variables are eligible to be used in the tree, but the […]

Are Decision Trees Secretly Parametric Models? Part 1

Published December 11, 2017.

In my living room I have a TV and a couch. Actually, for this story, it’s more important that I tell you I have a TV and a painting. The painting sits above the couch, but that’s neither here nor there. If I want to tell you about my TV, I could simply tell you […]

Sampling Methods

Published November 23, 2017.

Why Sampling? Sampling is used to extract a limited number of observations from a large population for the purpose of modeling or analysis. Sampling is primarily done for two reasons. Traditionally, large datasets did not fit into the memory of computers, and when they did, analysis programs ran slowly. Even when the entire data fits […]
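As a rough illustration of extracting a limited number of observations from a large population, here is a minimal Python sketch of simple random sampling without replacement. The function name `simple_random_sample`, the fixed seed, and the toy population of integers are assumptions made for this example; they do not come from the original post, which may use a different method or tool.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Draw n distinct observations from population, uniformly at random.

    The seed is fixed only so that the illustration is reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical "large population": 100,000 record identifiers.
population = list(range(100_000))

# Keep a sample of 1,000 observations for modeling or analysis.
sample = simple_random_sample(population, 1_000)
print(len(sample))
```

Because `random.Random.sample` draws without replacement, no observation appears twice in the sample.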

Notes on Sampling and Sample Validation

Published October 30, 2017.

The Rationale for Sampling: Sampling is used to extract a smaller dataset from a large population of data for the purpose of analysis and modeling. Working on a small dataset, as opposed to the whole population, allows us to use computers with fewer resources, such as RAM and disk space, as well as performing the […]

Information Value – A Numerical Example

Published August 10, 2017.

Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example below on Information Value calculation useful. Information Value measures how well an Independent Variable (IV) is able to separate the categories of […]
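For readers who want a concrete calculation, the sketch below uses the definition of Information Value commonly used in scorecard development: for each bin of the variable, take the share of "good" cases minus the share of "bad" cases, multiply by the natural log of their ratio, and sum over bins. The binary good/bad outcome, the function name `information_value`, and the bin counts are assumptions for illustration, not figures from the original post. The sketch also assumes that every bin contains at least one good and one bad case, since the logarithm is undefined otherwise.

```python
import math


def information_value(goods, bads):
    """Information Value from per-bin counts of good and bad cases.

    goods[i] and bads[i] are the counts in bin i. Every count is
    assumed to be positive; empty bins would need special handling.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        pct_good = g / total_good  # share of all goods falling in this bin
        pct_bad = b / total_bad    # share of all bads falling in this bin
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


# Hypothetical variable with two bins.
goods = [80, 20]
bads = [20, 80]
print(round(information_value(goods, bads), 4))
```

A variable whose bins hold identical shares of goods and bads yields a value of zero, meaning it does not separate the two groups at all.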