Cluster
The Cluster skill divides data into groups (clusters) of similar records. Unlike Train, the Cluster skill doesn't require a labeled dataset.
Format
Cluster has a single format with several variations:
Cluster data (excluding | including) <columns> (setting number of clusters to <number of clusters>) (using <model>)
Parameters
Cluster uses the following parameters:
columns (optional). The columns to exclude from or include in the clustering. If you don't specify any columns, all columns in the dataset are used.
number of clusters (optional). The number of clusters to create.
model (optional). The model to use: the Hierarchical Clustering Model or the KMeans Clustering Model. If you don't specify a model, both are used.
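Because every parameter is optional, the phrase can take several shapes. The sketch below shows how the parts combine. The helper build_cluster_command is hypothetical and not part of the product; it only assembles the text you would type into the chat, following the format given above.

```python
from typing import Optional, Sequence


def build_cluster_command(
    columns: Optional[Sequence[str]] = None,
    include: bool = True,
    n_clusters: Optional[int] = None,
    model: Optional[str] = None,
) -> str:
    """Assemble a Cluster phrase from its optional parts (hypothetical helper)."""
    parts = ["Cluster data"]
    if columns:
        # "including" keeps only the listed columns; "excluding" drops them.
        keyword = "including" if include else "excluding"
        parts.append(f"{keyword} {', '.join(columns)}")
    if n_clusters is not None:
        if n_clusters < 1:
            raise ValueError("number of clusters must be positive")
        parts.append(f"setting number of clusters to {n_clusters}")
    if model is not None:
        parts.append(f"using {model}")
    return " ".join(parts)


# With no arguments, only the bare phrase remains.
print(build_cluster_command())
# All three optional parts supplied.
print(build_cluster_command(["Age", "Fare", "Gender"], n_clusters=3,
                            model="Hierarchical Clustering Model"))
```

Leaving an argument out of the helper mirrors leaving the matching part out of the phrase, so the skill falls back to its default for that parameter.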
Output
If data is successfully clustered, a preview message appears in the chat history. This message contains a link to preview the ClusterResults dataset. An output also appears in the visualization panel displaying the Cluster Centroids, Models, Scores, and Pipeline Report.
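To make the term Cluster Centroids concrete, here is a minimal sketch of the general k-means idea in plain Python. It is an illustration only and is not the skill's implementation, which this page does not describe. The function kmeans, its first-k-points seeding, and the sample rows are all assumptions made up for the example.

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(fmean(axis) for axis in zip(*members))
    return centroids, labels


# Made-up (age, fare) rows, not taken from any real dataset.
rows = [(22.0, 7.0), (24.0, 8.0), (58.0, 80.0), (61.0, 85.0)]
centroids, labels = kmeans(rows, k=2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(23.0, 7.5), (59.5, 82.5)]
```

No labels are supplied at any point: the groups emerge from the distances between rows alone, and each centroid is simply the average of the rows assigned to its cluster.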
Examples
To cluster the "Titanic Dataset" in its entirety, enter Cluster data.
To cluster the Age, Fare, and Gender columns into three clusters with the Hierarchical Clustering Model, enter Cluster data including Age, Fare, Gender setting number of clusters to 3 using Hierarchical Clustering Model.