Validating cluster structures in data mining tasks

You can use the Mining Accuracy Chart tab of Data Mining Designer in SQL Server Data Tools (SSDT) to compare the predictive accuracy of the mining models in your mining structure.

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

I have had the same question you have, and have some (as yet not completely read) references that are relevant: A Survey on Internal Validity Measure for Cluster Validation L. I say this because how well a particular unsupervised method performs will largely depend on why one is doing unsupervised learning in the first place, i.e., does the method perform well in the context of your end goal?


A good resource (with references) for clustering is sklearn's documentation page, Clustering Performance Evaluation.

This covers several methods, but all but one, the Silhouette Coefficient, assume that ground truth labels are available.
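Because the Silhouette Coefficient needs no ground truth labels, it can be computed directly from the data and the cluster assignments. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic blob data, the choice of KMeans, and the helper name `silhouette_by_k` are illustrative assumptions, not part of any referenced method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (assumption for illustration): three tight, well-separated blobs.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)


def silhouette_by_k(X, k_values, seed=0):
    """Hypothetical helper: map each k to the mean silhouette of a KMeans fit."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        # silhouette_score uses only X and the predicted labels, no ground truth.
        scores[k] = silhouette_score(X, labels)
    return scores


scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k by silhouette:", best_k)
```

The score lies between -1 and 1, with higher values indicating tighter, better separated clusters. Whether a high score means the clustering is useful still depends on the end goal.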

Cross-validation is a standard tool in analytics and is an important feature for helping you develop and fine-tune data mining models.

You use cross-validation after you have created a mining structure and related mining models to ascertain the validity of the model.

You can customize the way that cross-validation works to control the number of cross-sections, the models that are tested, and the accuracy bar for predictions.

If you use the cross-validation stored procedures, you can also specify the data set that is used for validating the models. When you specify the number of partitions, you determine how many temporary models will be created. For each partition, a cross-section of the data is flagged for use as the test set, and a new model is created by training on the remaining data not in the partition.

Obviously this isn't completely true; people work on these problems and publish results which include some sort of evaluation. I'll outline a few of the approaches I'm familiar with below.
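The partitioning scheme described above for cross-validation (one temporary model per partition, each tested on the cross-section that was held out) can be sketched outside SQL Server. This is a rough analogy in Python with scikit-learn, not the SSDT or stored-procedure implementation; the iris data set, the decision tree model, and the function name `cross_validate` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# Example data (assumption): the iris data set bundled with scikit-learn.
X, y = load_iris(return_X_y=True)


def cross_validate(X, y, n_partitions=5, seed=0):
    """Hypothetical helper: train one temporary model per partition and
    return the accuracy of each model on its held-out partition."""
    folds = KFold(n_splits=n_partitions, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(X):
        # A fresh model is trained on everything outside the current partition.
        model = DecisionTreeClassifier(random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # The partition itself serves as the test set.
        accuracies.append(model.score(X[test_idx], y[test_idx]))
    return np.array(accuracies)


fold_accuracy = cross_validate(X, y, n_partitions=5)
print("per-fold accuracy:", np.round(fold_accuracy, 3))
print("mean:", round(float(fold_accuracy.mean()), 3))
print("std:", round(float(fold_accuracy.std()), 3))
```

The number of partitions equals the number of temporary models trained. A small spread between the per-fold accuracies suggests that the model is not overly sensitive to which rows it was trained on.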

