KnowledgeSTUDIO supports two types of unsupervised learning techniques: Cluster Analysis and Principal Component Analysis.
Principal Component Analysis is used in exploratory data analysis and as a means to reduce the dimensionality of the data prior to building predictive models. It reduces a large set of continuous variables to a smaller set of components that retain most of the variance in the original data.
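
As a rough illustration of how a percent-of-variance cut-off can determine the size of the reduced set, the sketch below counts how many leading components are needed to reach a chosen threshold. It is not KnowledgeSTUDIO code: the function name `components_to_retain` and the eigenvalues are hypothetical, and the eigenvalues are assumed to have been computed already.

```python
def components_to_retain(eigenvalues, variance_cutoff_pct):
    """Return how many leading components reach the percent-of-variance cut-off.

    Hypothetical helper for illustration only; eigenvalues are assumed to be
    the non-negative eigenvalues of a covariance or correlation matrix.
    """
    if not 0 < variance_cutoff_pct <= 100:
        raise ValueError("variance_cutoff_pct must be in (0, 100]")

    # Components are ranked by the variance they explain, largest first.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        # Small tolerance guards against floating-point round-off at the boundary.
        if 100.0 * cumulative / total >= variance_cutoff_pct - 1e-9:
            return count
    return len(ordered)


# Made-up eigenvalues for four original variables.
example_eigenvalues = [2.5, 1.0, 0.3, 0.2]
kept = components_to_retain(example_eigenvalues, 85.0)
print(f"Components retained at an 85% cut-off: {kept}")
```

With these made-up values the first two components explain 87.5% of the total variance, so an 85% cut-off keeps two of the four.
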
Cluster Analysis is used in many areas, such as market research, to discover potentially meaningful groups (clusters) within a data set. Observations within a cluster are similar to one another (internal homogeneity) and different from those in other clusters (between-cluster heterogeneity) with respect to a predetermined measure.
| Principal Component Analysis Highlights |
| VARIMAX rotation to improve the quality of the components |
| User-defined cut-off of percent of variance to retain in the reduced set of variables |
| Full details of eigenvalues, eigenvectors, loadings and scoring equations |
| Wizard-driven creation of columns representing the new components |
| Cluster Analysis Highlights |
| Supported clustering algorithms: K-means and Expectation Maximization |
| Automatic data normalization including transformation of categorical variables into a set of indicator variables |
| Automatic creation of a range of cluster models with automatic comparison and selection of the best model based on a specific cluster quality criterion |
| Model comparison criteria: Cubic Clustering Criterion (CCC), R² and the pseudo F statistic |
| Model results that include the number of observations per cluster, cluster goodness measures and the distance matrix |
| Cluster Analysis Segment Viewer charts for use in exploring the distribution of variables in the clusters, thus allowing a business description of the cluster model |
| Cluster descriptions that include a measure of the relevance of each variable to each cluster |
| Direct scoring in KnowledgeSTUDIO and automatic generation of SAS, PMML and XML code for deployment in other analytics environments |
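
The idea of building a range of cluster models and selecting the best one by a quality criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the product's implementation: it uses a plain K-means with a deterministic, evenly spaced initialization chosen only for reproducibility, the pseudo F statistic as the sole criterion, and a made-up two-dimensional data set. The names `mean_point`, `kmeans` and `pseudo_f` are hypothetical.

```python
import math


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, max_iterations=100):
    """Plain K-means; initial centroids are evenly spaced rows of the data
    (an assumption made here for reproducibility)."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: recompute centroids; an empty cluster keeps its old one.
        updated = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            updated.append(mean_point(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def pseudo_f(points, labels, centroids):
    """Pseudo F = (between-cluster SS / (k - 1)) / (within-cluster SS / (n - k)).
    Assumes the within-cluster sum of squares is greater than zero."""
    n, k = len(points), len(centroids)
    overall = mean_point(points)
    within = sum(math.dist(p, centroids[label]) ** 2
                 for p, label in zip(points, labels))
    between = sum(labels.count(c) * math.dist(centroids[c], overall) ** 2
                  for c in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Made-up data: three compact groups of four observations each.
data = [(1, 1), (1, 2), (2, 1), (2, 2),
        (8, 8), (8, 9), (9, 8), (9, 9),
        (1, 8), (1, 9), (2, 8), (2, 9)]

# Fit a range of models and keep the one with the highest pseudo F.
scores = {}
for k in range(2, 5):
    labels, centroids = kmeans(data, k)
    scores[k] = pseudo_f(data, labels, centroids)

best_k = max(scores, key=scores.get)
print(f"Pseudo F by number of clusters: {scores}")
print(f"Selected number of clusters: {best_k}")
```

On this toy data the three-cluster model has the highest pseudo F, matching how the points were laid out. In practice the variables would usually be normalized first, and several criteria may be compared rather than one.
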

