Angoss Blog

Angoss Analytics Delivers Intuitive Big Data Mining Solutions

Are Decision Trees Secretly Parametric Models? Part 2

Published January 15, 2018.

In the first part of this blog entry, we explored the idea of parametric models, and why this classification might be relevant when thinking about how to make use of Big Data. The astute reader probably noticed, however, that the title of this blog post references the parametricity of decision trees and I didn’t even […]

Are Decision Trees Secretly Parametric Models? Part 1

Published December 11, 2017.

In my living room I have a TV and a couch. Actually, for this story, it’s more important that I tell you I have a TV and a painting. The painting sits above the couch, but that’s neither here nor there. If I want to tell you about my TV, I could simply tell you […]

Sampling Methods

Published November 23, 2017.

Why Sampling? Sampling extracts a limited number of observations from a large population for modeling or analysis. It is done primarily for two reasons: traditionally, large datasets did not fit into computer memory, and when they did, analysis programs ran slowly. Even when the entire data fits […]
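To make the idea concrete, here is a minimal sketch of simple random sampling without replacement using only the Python standard library. The function name `simple_random_sample`, the population of integers, and the sample size are assumptions chosen for illustration; they are not taken from the post itself.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct observations from population (hypothetical helper).

    Sampling is without replacement, so no observation appears twice.
    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Illustrative population: 100,000 row identifiers (made up for this sketch).
population = list(range(100_000))

# Extract a limited number of observations for modeling or analysis.
sample = simple_random_sample(population, 1_000, seed=42)
print(len(sample), "observations drawn from", len(population))
```

Fixing the seed is a design choice that keeps the sample repeatable, which helps when a model has to be rebuilt on the same data later.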

Notes on Sampling and Sample Validation

Published October 30, 2017.

The Rationale for Sampling: Sampling extracts a smaller dataset from a large population of data for analysis and modeling. Working on a small dataset, as opposed to the whole population, allows us to use computers with fewer resources, such as RAM and disk space, as well as performing the […]

Variable Types: From the Modeller Point of View

Published September 15, 2017.

Storage vs. Analysis: Computer scientists and software developers who designed databases and data management systems did so with the objective of minimizing disk space and making access to the data easy and fast. For example, Oracle database currently offers 26 field types to choose from when defining a table. On the other hand, from the […]

Information Value – A Numerical Example

Published August 10, 2017.

Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example below on Information Value calculation useful. Information Value is a measure that can be leveraged in order to understand how well an Independent Variable (IV) is able to separate the categories of […]
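The post's own numerical example is not reproduced in this excerpt. As a rough sketch, the snippet below uses the commonly used weight-of-evidence definition, in which each bin contributes (share of goods minus share of bads) times the natural log of their ratio. The function name `information_value` and the bin counts are hypothetical, and the sketch assumes every bin contains at least one good and one bad observation.

```python
import math


def information_value(goods, bads):
    """Information Value of a binned variable (hypothetical helper).

    goods[i] and bads[i] are the counts of the two target categories in
    bin i. Assumes no bin has a zero count, since the log would be undefined.
    """
    total_good = sum(goods)
    total_bad = sum(bads)
    iv = 0.0
    for g, b in zip(goods, bads):
        pct_good = g / total_good          # share of all goods in this bin
        pct_bad = b / total_bad            # share of all bads in this bin
        woe = math.log(pct_good / pct_bad)  # weight of evidence for the bin
        iv += (pct_good - pct_bad) * woe
    return iv


# Made-up counts for three bins of one variable, for illustration only.
goods = [400, 300, 300]
bads = [20, 30, 50]
print(round(information_value(goods, bads), 4))
```

Each bin's contribution is non-negative, so the total is zero only when the two categories are distributed identically across the bins, and it grows as the variable separates them more strongly.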

What does a good decision tree model look like?

Published July 12, 2017.

Decision trees are mainly used as predictive models for two purposes: classification and regression. In classification tasks the purpose is to label the observations with one of a limited number of categories. For example, we want to classify credit card applications into two classes: high risk and low risk. In regression […]

What makes a good model?

Published June 14, 2017.

There are four main characteristics that can be used to determine how good a model is. These are: 1. Accuracy. Accuracy measures how well the model predicts the outcome. When the predictions are close to the actual values (on some validation dataset), the model is deemed to be accurate. Some measures […]

Model Accuracy: Basic Concepts

Published April 26, 2017.

One characteristic of a good model is accuracy. In this article, we will discuss what accuracy means and how we define an accurate prediction. Let’s first review what we mean when we want our models to be accurate. Accurate models are defined as […]
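As a small illustration of predictions being close to the actual values on a validation dataset, the sketch below computes two widely used measures: the fraction of correct labels for a classifier and the mean absolute error for a numeric prediction. The choice of these two measures, the function names, and the sample values are assumptions for illustration; the excerpt does not say which measures the article itself uses.

```python
def classification_accuracy(actual, predicted):
    """Fraction of observations whose predicted label matches the actual one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute gap between predicted and actual numeric values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up validation labels and predictions, for illustration only.
actual_labels = ["high", "low", "low", "high"]
predicted_labels = ["high", "low", "high", "high"]
print(classification_accuracy(actual_labels, predicted_labels))

# Made-up numeric outcomes and predictions.
print(mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))
```

A higher classification accuracy and a lower mean absolute error both indicate predictions that sit closer to the actual values.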