Posts by: Dr. Mamdouh Refaat

Data Preparation 101 – The Objective of Data Preparation

Published May 2, 2018.

Data preparation is a fundamental aspect of the modeling process. In fact, it is the most important part of the process since it occupies up to 80% of the total time of the project. The objective of data preparation is to prepare what is known as the modeling view or mining view. The modeling view […]

Sampling Methods

Published November 23, 2017.

Why Sampling? Sampling is used to extract a limited number of observations from a large population for the purpose of modeling or analysis. Sampling is primarily done for two reasons: Traditionally, large datasets did not fit into the memory of computers. And when they did, analysis programs ran slowly. Even when the entire data fits […]

Notes on Sampling and Sample Validation

Published October 30, 2017.

The Rationale for Sampling: Sampling is used to extract a smaller dataset from a large population of data for the purpose of analysis and modeling. Working on a small dataset, as opposed to the whole population, allows us to use computers with fewer resources, such as RAM and disk space, as well as performing the […]

Variable Types: From the Modeller Point of View

Published September 15, 2017.

Storage vs. Analysis: Computer scientists and software developers who designed databases and data management systems did so with the objective of minimizing disk space and making access to the data easy and fast. For example, Oracle database currently allows 26 field types to select from when defining a table. On the other hand, from the […]

What does a good decision tree model look like?

Published July 12, 2017.

Decision trees are mainly used as predictive models for two purposes: classification and regression. In classification tasks, the purpose is to label each observation with one of a limited number of categories. For example, we may want to classify credit card applications into two classes: high risk and low risk. In regression […]

What makes a good model?

Published June 14, 2017.

There are four main characteristics that can be used to determine how good a model is. These are: 1. Accuracy: Accuracy measures how well the model predicts the outcome. When the predictions are close to the actual values (on some validation dataset), the model is deemed to be accurate. Some measures […]

Model Accuracy: Basic Concepts

Published April 26, 2017.

One of the characteristics that a good model should have is accuracy. In this article, we will discuss what is meant by accuracy and how we define an accurate prediction. Let’s first review what we mean when we want our models to be accurate. Accurate models are defined as […]