Cluster 2A – Chu-Yi Chang

Clustering is an unsupervised machine learning technique that groups similar objects into the same cluster. Cluster 2A combines two widely used clustering algorithms, K-means and DBSCAN, to help you discover patterns in your data. For example, it can help you identify distinct customer groups based on their purchasing behavior.

Cluster 2A contains four tools:

Growth Data:

Add the growth rate data you want to analyze.

For example, you can run a cluster analysis on the monthly purchases of VIP customers. Cluster 2A automatically calculates each customer’s purchase growth rate and purchase volatility, and makes clustering recommendations.
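The exact formulas Cluster 2A uses are not stated here. As a rough sketch, assume that the growth rate is the average month-over-month relative change and that volatility is the standard deviation of those changes. The function name `growth_features` and the sample figures are hypothetical.

```python
from statistics import mean, pstdev

def growth_features(monthly_purchases):
    """Return (average growth rate, volatility) for one customer's monthly totals."""
    # Month-over-month relative change. Months that follow a zero total are
    # skipped to avoid division by zero (an assumption made for this sketch).
    changes = [
        (curr - prev) / prev
        for prev, curr in zip(monthly_purchases, monthly_purchases[1:])
        if prev != 0
    ]
    if not changes:
        return 0.0, 0.0
    return mean(changes), pstdev(changes)

# Hypothetical customers: one growing steadily, one buying erratically.
steady = growth_features([100, 110, 121, 133.1])
erratic = growth_features([100, 200, 50, 150])
```

Under these assumptions, the steady customer has a growth rate near 10% with almost no volatility, while the erratic customer has a much higher volatility. These two numbers per customer could then serve as the inputs to clustering.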

Feature Data:

Add the feature value data you want to analyze.

For example, you can do cluster analysis based on the main purchase features of customers: days since last purchase (Recency), purchase frequency (Frequency), and purchase amount (Monetary).
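The three features can be derived from a raw transaction list along these lines. This is a minimal sketch: the function name `rfm`, the tuple layout, and the sample transactions are hypothetical, and it is not necessarily how Cluster 2A prepares its input.

```python
from datetime import date

def rfm(transactions, today):
    """Compute RFM features from (customer_id, purchase_date, amount) tuples."""
    features = {}
    for customer, day, amount in transactions:
        # Track the latest purchase date, the purchase count, and the total spent.
        last, freq, total = features.get(customer, (day, 0, 0.0))
        features[customer] = (max(last, day), freq + 1, total + amount)
    return {
        c: {"recency": (today - last).days, "frequency": f, "monetary": m}
        for c, (last, f, m) in features.items()
    }

# Hypothetical transactions for two customers.
transactions = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2024, 2, 10), 25.0),
]
table = rfm(transactions, today=date(2024, 3, 31))
```

Because the three features use different units (days, counts, currency), it is usually worth rescaling them to a comparable range before running a distance-based clustering algorithm.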

K-means Model:

The K-means algorithm requires you to specify the number of clusters in advance. Its goal is to find a small set of representative points, called centroids, in a large amount of high-dimensional data, and to assign each data point to its nearest centroid. It scales well to large numbers of samples and is widely used in many different fields. Cluster 2A uses K-means++ to select the initial cluster centers, which improves convergence speed.
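The procedure can be illustrated with a small pure-Python sketch of K-means with K-means++ seeding. It is meant only to show the idea and is not Cluster 2A's implementation. The function names, the `iterations` and `seed` parameters, and the sample points are assumptions made for this example.

```python
import math
import random

def kmeans_pp_init(points, k, rng):
    """Pick k initial centroids with K-means++ seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Squared distance from each point to its closest chosen centroid.
        d2 = [min(math.dist(p, c) ** 2 for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to a uniform pick.
            centroids.append(rng.choice(points))
        else:
            # Points far from the existing centroids are more likely to be chosen.
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

def kmeans(points, k, iterations=100, seed=0):
    """Return (labels, centroids) after running Lloyd's iterations."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

With this data, the first three points end up in one cluster and the last three in the other. Which cluster receives label 0 depends on the random seeding.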


DBSCAN Model:

Unlike K-means, DBSCAN does not require you to specify the number of clusters in advance. It is a density-based algorithm: it groups points that lie in sufficiently dense regions of the feature space into the same cluster, and it marks points that belong to no cluster as outliers. This makes it well suited to outlier detection.
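The density-based idea can be sketched in a few lines of pure Python. This is an illustrative version, not Cluster 2A's implementation. The parameter names `eps` (neighborhood radius) and `min_samples` (minimum number of neighbors, counting the point itself, for a region to count as dense) follow common convention and are assumptions here, as are the sample points.

```python
import math

NOISE = -1  # label used for outliers

def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point, so keep expanding
        cluster += 1
    return labels

# Hypothetical data: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
# labels == [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters (two) is never passed in. It emerges from the density of the data, and the isolated point at (50, 50) is reported as an outlier.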