Category: Predictive Analytics

Variable Types: From the Modeller's Point of View

Published September 15, 2017.

Storage vs. Analysis: Computer scientists and software developers who designed databases and data management systems did so with the objective of minimizing disk space and making access to the data easy and fast. For example, Oracle database currently allows 26 field types to select from when defining a table. On the other hand, from the […]

What does a good decision tree model look like?

Published July 12, 2017.

As predictive models, decision trees are mainly used for two purposes: classification and regression. In classification tasks, the purpose is to label each observation with one of a limited number of categories. For example, we may want to classify credit card applications into two classes: high risk and low risk. In regression […]

What makes a good model?

Published June 14, 2017.

Four main characteristics can be used to judge how good a model is. These are: 1. Accuracy: Accuracy measures how well the model predicts the outcome. When the predictions are close to the actual values (on some validation dataset), the model is deemed to be accurate. Some measures […]

Model Accuracy: Basic Concepts

Published April 26, 2017.

One characteristic of a good model is accuracy. In this article, we discuss what accuracy means and how we define an accurate prediction. Let’s first review what we mean when we say we want our models to be accurate. Accurate models are defined as […]

Reject Inference for Application Scorecards

Published March 29, 2017.

Financial institutions rely on credit scoring models to assess the risks associated with granting credit. In particular, application scorecards are commonly used as decision support mechanisms for customer acquisition and are developed based on approved applicants. Declined applicants are not included in the modeling exercise, which makes sense because their performance is not known. However, […]

Data Prep, no shortcuts to good Modelling

Published March 22, 2017.

Data preparation is the backbone of any analysis, and many varied data preparation procedures are available to access and shape data into an appropriate representation for modelling or reporting. Data preparation is, as those who are involved will attest, a time-consuming task. An array of figures has been quoted to reflect the proportion of […]

Bankruptcy Scores

Published March 1, 2017.

Most of us have heard about risk scores, whether we are seeking to rent a property or applying for a loan. A risk score measures the likelihood of a customer defaulting within a certain time frame. It acts as a tool to help in understanding someone’s probability of missing payments and eventually ending up in […]

Variable Reduction Best Practices

Published February 21, 2017.

Nowadays, in the data mining world, having too much data has become a more common problem than not having enough. Building a predictive model on all available variables can be time-consuming: the model takes a long time to compute and becomes less robust and harder to interpret. So, what can we […]

Statistical Power

Published February 14, 2017.

Introduction: What is statistical power? Sometimes called the sensitivity of a hypothesis test, statistical power describes the ability of a statistical test to detect the effect it is trying to find or measure when that effect actually exists. If you’re wondering how a test designed to measure a statistical effect might be unable to measure that […]