Accuracy and Diversity in Ensembles of Text Categorisers
DOI:
https://doi.org/10.19153/cleiej.8.2.1Abstract
Error-Correcting Out Codes (ECOC) ensembles of binary classifiers are used in Text Cate- gorisation to improve the accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its corresponding decomposition ma- trix, which at the same time depends on the separation between the categories and the diversity of the dichotomies representing the binary classifiers. Important open questions include finding a good definition for diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator that we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and we compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments where these measures and functions are tested using two distinct document corpora with several configurations in each. The analysis of the results shows a weak relationship between the ensemble accuracy and its diversity.
Downloads
Published
Issue
Section
License
CLEIej is supported by its home institution, CLEI, and by the contribution of the Latin American and international researchers community, and it does not apply any author charges whatsoever for submitting and publishing. Since its creation in 1998, all contents are made publicly accesibly. The current license being applied is a (CC)-BY license (effective October 2015; between 2011 and 2015 a (CC)-BY-NC license was used).