Condensed Nearest Neighbour Rules for Multi-Label Datasets

Published in IDEAS ’23: Proceedings of the International Database Engineered Applications Symposium Conference

Reducing the size of the training set by replacing it with a condensing set, while preserving classification accuracy as far as possible, is a common way to speed up instance-based classifiers. Data reduction techniques, also known as prototype selection or prototype generation algorithms, serve this purpose. The literature offers many such algorithms that are effective for single-label classification problems, but most of them cannot be applied to multi-label data, where an instance may belong to several classes at once. The well-known Binary Relevance transformation method cannot be combined with a data reduction algorithm either, because it produces numerous binary condensing sets. Condensed Nearest Neighbor is a well-known, parameter-free prototype selection algorithm for single-label data. This study proposes three variations of that algorithm for multi-label training datasets. An experimental study over nine distinct datasets shows that the three proposed approaches achieve good reduction rates without harming classification performance.
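For readers unfamiliar with the baseline, the following is a minimal sketch of the classic single-label Condensed Nearest Neighbor rule that the abstract refers to. It is not one of the three multi-label variations proposed in the paper, which the abstract does not describe. The function name `condensed_nearest_neighbor`, the choice of Euclidean distance, and the seeding of the condensing set with the first instance are assumptions made for illustration.

```python
import math

def condensed_nearest_neighbor(X, y):
    """Sketch of single-label CNN; returns indices of the condensing set.

    Assumptions (not from the paper): Euclidean distance, and the
    condensing set is seeded with the first training instance.
    """
    if not X:
        return []
    selected = [0]                      # condensing set (indices into X)
    remaining = list(range(1, len(X)))  # instances not yet absorbed
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        still_remaining = []
        for i in remaining:
            # 1-NN prediction using only the current condensing set
            nearest = min(selected, key=lambda j: math.dist(X[i], X[j]))
            if y[nearest] != y[i]:
                selected.append(i)      # misclassified: move into the condensing set
                changed = True
            else:
                still_remaining.append(i)
        remaining = still_remaining
    return selected

# Toy data: two well-separated classes, three instances each.
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (4.9, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
print(condensed_nearest_neighbor(X, y))  # [0, 3]
```

On this toy data one prototype per class is enough for a 1-NN classifier to label every training instance correctly, which is the kind of reduction the condensing set is meant to deliver. The result of this sketch depends on the order in which instances are visited, and no tuning parameter is involved.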