Condensing multi-label data based on Clustering

A common approach to speeding up instance-based classifiers while preserving accuracy is to use a subset of the original training set. This process, known as condensing, reduces the training dataset
to a smaller, representative set. This is accomplished by employing a data reduction technique that selects the most relevant training instances. While many of these techniques have been successfully
applied to single-label classification, most are not appropriate for multi-label data, where instances can be associated with multiple classes. This paper proposes a simple data reduction technique for
multi-label datasets based on 𝐾-means++ clustering. The proposed method selects instances near cluster centroids to form the condensing set, relying on the 𝐾-means++ initialization strategy to obtain
widely spread initial centroids. Experiments conducted on nine datasets, combined with a statistical test, show that our approach achieves significant data reduction while maintaining high classification accuracy.
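
The general idea of centroid-based condensing can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how many instances are kept per cluster, which distance is used, or whether labels take part in the clustering. The sketch assumes clustering on features only, squared Euclidean distance, and one retained instance per centroid; the names `kmeanspp_init` and `condense` are hypothetical.

```python
import numpy as np


def kmeanspp_init(X, k, rng):
    """K-means++ seeding: each new centroid is drawn with probability
    proportional to its squared distance from the nearest chosen centroid."""
    n = X.shape[0]
    first = rng.integers(n)
    centroids = [X[first]]
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            idx = rng.integers(n)
        else:
            idx = rng.choice(n, p=d2 / total)
        centroids.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centroids, dtype=float)


def condense(X, Y, k, n_iter=50, seed=0):
    """Return a condensed (X, Y) pair and the indices of the kept instances.

    Assumption: the instance nearest to each final centroid is kept, together
    with its full multi-label row from Y.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y)
    C = kmeanspp_init(X, k, rng)
    for _ in range(n_iter):
        # Squared distance from every instance to every centroid.
        dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_C = C.copy()
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                new_C[j] = members.mean(axis=0)
        if np.allclose(new_C, C):
            break
        C = new_C
    dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Nearest instance per centroid; duplicates collapse to a single index.
    keep = np.unique(dists.argmin(axis=0))
    return X[keep], Y[keep], keep


# Toy usage: 60 instances, 2 features, 3 binary labels per instance.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(60, 2))
Y_toy = rng.integers(0, 2, size=(60, 3))
X_small, Y_small, kept = condense(X_toy, Y_toy, k=5)
```

In this sketch the number of clusters `k` directly bounds the size of the condensing set, so it acts as the knob that trades reduction rate against accuracy.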

Link: https://doi.org/10.1145/3716554.3716591