Data analysis software with Cluster Analysis and Principal Component Analysis

KnowledgeSTUDIO

KnowledgeSTUDIO supports two types of unsupervised learning techniques: Cluster Analysis and Principal Component Analysis.

Principal Component Analysis is used in exploratory data analysis and as a means to reduce the dimensionality of the data prior to building predictive models. It reduces a large set of continuous variables to a smaller set of uncorrelated components that retain most of the variance in the original data.

Cluster Analysis is used in many areas, such as market research, to discover potentially meaningful groups or clusters within the data set that exhibit internal homogeneity and external (between-cluster) heterogeneity with respect to a predetermined measure.

Principal Component Analysis Highlights
VARIMAX rotation to improve the quality of the components
User-defined cut-off of percent of variance to retain in the reduced set of variables
Full details of eigenvalues, eigenvectors, loadings and scoring equations
Wizard-driven creation of columns representing the new components
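
The percent-of-variance cut-off listed above can be illustrated with a short sketch. It is not KnowledgeSTUDIO's implementation: the function name components_to_retain, the eigenvalues and the 85% cut-off are all hypothetical, and the sketch assumes the usual rule of keeping the smallest number of leading components whose cumulative share of the total variance reaches the cut-off.

```python
def components_to_retain(eigenvalues, cutoff):
    """Return the smallest number of leading components whose cumulative
    share of total variance reaches `cutoff` (a fraction in (0, 1])."""
    if not 0 < cutoff <= 1:
        raise ValueError("cutoff must be a fraction in (0, 1]")
    # Components are ranked by eigenvalue, largest first.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        # Small tolerance guards against floating-point round-off.
        if cumulative / total >= cutoff - 1e-12:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a 4-variable correlation matrix (they sum to 4).
eigenvalues = [2.5, 1.0, 0.3, 0.2]
kept = components_to_retain(eigenvalues, 0.85)
print(f"Components retained at an 85% cut-off: {kept}")
```

With these made-up eigenvalues the first component explains 62.5% of the variance and the first two explain 87.5%, so an 85% cut-off keeps two components.
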
Cluster Analysis Highlights
Supported clustering algorithms: K-means and Expectation Maximization
Automatic data normalization including transformation of categorical variables into a set of indicator variables
Automatic creation of a range of cluster models with automatic comparison and selection of the best model based on a specific cluster quality criterion
Model comparison criteria: Cubic Clustering Criterion (CCC), R² and pseudo-F statistic
Model results that include the number of observations per cluster, cluster goodness measures and the distance matrix
Cluster Analysis Segment Viewer charts for use in exploring the distribution of variables in the clusters, thus allowing a business description of the cluster model
Cluster descriptions that include a measure of the relevance of each variable to the different clusters
Direct scoring in KnowledgeSTUDIO and automatic generation of SAS, PMML and XML code for deployment in other analytics environments
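
The idea of building a range of cluster models and selecting the best one by a quality criterion can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the product's algorithm: the toy data, the farthest-point initialization and all function names are hypothetical, K-means is the only algorithm shown, and the comparison uses R² and the pseudo-F statistic (the CCC is omitted because it needs a more involved reference distribution).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point
    farthest from the centers chosen so far (an assumed heuristic)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; an empty cluster keeps its old center."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean(members)
    return labels, centers


def quality(points, labels, centers):
    """Return (R², pseudo-F). Assumes 1 < k < n and a non-zero
    within-cluster sum of squares."""
    n, k = len(points), len(centers)
    grand = mean(points)
    tss = sum(sq_dist(p, grand) for p in points)  # total sum of squares
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    r2 = 1 - wss / tss
    pseudo_f = ((tss - wss) / (k - 1)) / (wss / (n - k))
    return r2, pseudo_f


def best_model(points, candidate_ks):
    """Fit one model per candidate k and pick the highest pseudo-F."""
    results = {}
    for k in candidate_ks:
        labels, centers = kmeans(points, k)
        results[k] = quality(points, labels, centers)
    best_k = max(results, key=lambda k: results[k][1])
    return best_k, results


# Hypothetical toy data: three tight, well-separated groups of four points.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (20, 0), (20, 1), (21, 0), (21, 1),
]
best_k, results = best_model(points, range(2, 6))
for k, (r2, f) in sorted(results.items()):
    print(f"k={k}  R2={r2:.4f}  pseudo-F={f:.1f}")
print("Selected number of clusters:", best_k)
```

On this toy data the pseudo-F statistic peaks at three clusters, which matches how the points were constructed. R² alone would not make that choice, since it never decreases as more clusters are added, which is one reason to compare models on a criterion that penalizes the number of clusters.
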