Angoss Blog

Angoss Analytics Delivers Intuitive Big Data Mining Solutions

Data Preparation 101 – The Objective of Data Preparation

Published May 2, 2018.

Data preparation is a fundamental aspect of the modeling process. In fact, it is arguably the most important part of the process, and it can occupy up to 80% of a project’s total time. The objective of data preparation is to produce what is known as the modeling view or mining view. The modeling view […]

How do I change an Ordinal Variable to a Continuous Variable?

Published March 13, 2018.

Not all ordinal variables can be treated as continuous. If the variable you’re considering is the result of a survey where people rated their satisfaction on a scale of 1 to 5, it doesn’t make sense to change it to a continuous variable. It is an ordinal variable, and should always be […]

How to Include or Exclude Variables in a Strategy or Decision Tree

Published March 6, 2018.

Including or excluding variables in a strategy or decision tree depends on whether you’re talking about an automatically grown tree, or an interactively grown tree. If you’re growing a decision tree automatically, you can’t force a variable into the tree. You can set which variables are eligible to be used in the tree, but the […]

Angoss gets acquired by Datawatch to offer an all-in-one solution for data scientists

Published February 2, 2018.

The entire Angoss team is excited and proud to share that Datawatch Corporation has acquired us! We’re excited because of what this means for Angoss, for Datawatch, and for the entire data science community: A complete end-to-end solution. Datawatch offers self-service data preparation tools, one of which is Datawatch Monarch, their award-winning solution. Their tools […]

Are Decision Trees Secretly Parametric Models? Part 2

Published January 15, 2018.

In the first part of this blog entry, we explored the idea of parametric models, and why this classification might be relevant when thinking about how to make use of Big Data. The astute reader probably noticed, however, that the title of this blog post references the parametricity of decision trees and I didn’t even […]

Are Decision Trees Secretly Parametric Models? Part 1

Published December 11, 2017.

In my living room I have a TV and a couch. Actually, for this story, it’s more important that I tell you I have a TV and a painting. The painting sits above the couch, but that’s neither here nor there. If I want to tell you about my TV, I could simply tell you […]

Sampling Methods

Published November 23, 2017.

Why Sampling? Sampling is used to extract a limited number of observations from a large population for the purpose of modeling or analysis. Sampling is primarily done for two reasons: Traditionally, large datasets did not fit into the memory of computers, and when they did, analysis programs ran slowly. Even when the entire dataset fits […]
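As a rough illustration of the idea, here is a minimal Python sketch of simple random sampling without replacement. The function name `simple_random_sample`, the seed value, and the toy population are assumptions made for this example; they are not taken from the post or from any Angoss product.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct observations from population, each equally likely.

    Hypothetical helper for illustration only. A fixed seed makes the
    draw reproducible, which helps when a sample has to be audited later.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)


# Toy "large population": 100,000 record IDs.
population = list(range(100_000))

# Extract a limited number of observations for modeling or analysis.
sample = simple_random_sample(population, 1_000, seed=42)
print(len(sample), "observations drawn from", len(population))
```

Calling the function again with the same seed returns the same sample, while a different seed (or none) gives a different draw.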

Notes on Sampling and Sample Validation

Published October 30, 2017.

The Rationale for Sampling: Sampling is used to extract a smaller dataset from a large population of data for the purpose of analysis and modeling. Working on a small dataset, as opposed to the whole population, allows us to use computers with fewer resources, such as RAM and disk space, as well as performing the […]

Variable Types: From the Modeller Point of View

Published September 15, 2017.

Storage vs. Analysis: Computer scientists and software developers who designed databases and data management systems did so with the objective of minimizing disk space and making access to the data easy and fast. For example, Oracle database currently allows 26 field types to select from when defining a table. On the other hand, from the […]

Information Value – A Numerical Example

Published August 10, 2017.

Information Value is a widely used statistic in scorecard development, and in data mining in general. I hope you find the numerical example below on Information Value calculation useful. Information Value is a measure that can be leveraged in order to understand how well an Independent Variable (IV) is able to separate the categories of […]
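For readers who want to try the calculation themselves, the sketch below computes Information Value for a categorical predictor against a binary target. It assumes one common convention: weight of evidence is ln(% of non-events / % of events) per category, and IV is the sum of (% of non-events − % of events) × WOE. The function name, the toy data, and the choice to skip categories with a zero count are assumptions for illustration, not the post's own worked example.

```python
import math
from collections import Counter


def information_value(categories, targets):
    """IV of a categorical predictor vs. a binary target (1 = event, 0 = non-event).

    Hypothetical helper for illustration only.
    """
    events, non_events = Counter(), Counter()
    for cat, y in zip(categories, targets):
        if y == 1:
            events[cat] += 1
        else:
            non_events[cat] += 1

    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    iv = 0.0
    for cat in set(events) | set(non_events):
        # Assumption: skip categories where WOE is undefined (a zero count).
        if events[cat] == 0 or non_events[cat] == 0:
            continue
        pct_event = events[cat] / total_events
        pct_non_event = non_events[cat] / total_non_events
        woe = math.log(pct_non_event / pct_event)
        iv += (pct_non_event - pct_event) * woe
    return iv


# Toy data: category "a" has 1 event and 3 non-events, "b" has 3 and 1.
cats = ["a"] * 4 + ["b"] * 4
ys = [1, 0, 0, 0, 1, 1, 1, 0]
print(round(information_value(cats, ys), 4))
```

With this toy data each category contributes 0.5 × ln 3, so the total is ln 3, roughly 1.10. A predictor whose categories all have the same event rate gives an IV of 0.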