Accuracy and Diversity in Ensembles of Text Categorisers

Authors

  • Juan Jose García Adeva, School of Electrical and Information Engineering, University of Sydney
  • Ulises Cervino Beresi, Instituto de Física Rosario, 2000 Rosario
  • Rafael A. Calvo, School of Electrical and Information Engineering, University of Sydney

DOI:

https://doi.org/10.19153/cleiej.8.2.1

Abstract

Error-Correcting Output Codes (ECOC) ensembles of binary classifiers are used in Text Categorisation to improve accuracy while benefiting from learning algorithms that only support two classes. An accurate ensemble relies on the quality of its decomposition matrix, which in turn depends on the separation between the categories and the diversity of the dichotomies represented by the binary classifiers. Important open questions include finding a good definition of diversity between two dichotomies and a way of combining all the pairwise diversity values into a single indicator, which we call the decomposition quality. In this work we introduce a new measure to estimate the diversity between two learners and compare it to the well-known Hamming distance. We also examine three functions to evaluate the decomposition quality. We present a set of experiments in which these measures and functions are tested on two distinct document corpora, with several configurations for each. The analysis of the results shows a weak relationship between ensemble accuracy and ensemble diversity.
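
The abstract relies on two kinds of distance within a decomposition matrix: distances between rows, which reflect how well the categories are separated, and distances between columns, which reflect how diverse the dichotomies are. The sketch below computes both using the Hamming distance, the baseline measure the abstract names. It is not the paper's new diversity measure, which the abstract does not describe. The matrix `M`, the helper names, and the two aggregation functions `quality_min` and `quality_mean` are hypothetical stand-ins, not the authors' three decomposition-quality functions. Treating a column and its complement as the same dichotomy is also an assumption. It follows a common ECOC convention, since swapping a learner's two labels yields an equivalent split.

```python
from itertools import combinations

# Hypothetical decomposition matrix: 4 categories (rows) x 5 dichotomies (columns).
# Each row is a category's codeword; each column is one binary learner's split.
M = [
    [0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 1],
]


def hamming(u, v):
    """Count the positions at which two equal-length 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def dichotomy_distance(u, v):
    """Hamming distance between two columns, treating a column and its
    complement as the same split (an assumed convention, not the paper's)."""
    complement_v = [1 - b for b in v]
    return min(hamming(u, v), hamming(u, complement_v))


def pairwise(vectors, dist):
    """Apply dist to every unordered pair of vectors."""
    return [dist(a, b) for a, b in combinations(vectors, 2)]


columns = [list(col) for col in zip(*M)]

row_distances = pairwise(M, hamming)                      # separation between categories
column_distances = pairwise(columns, dichotomy_distance)  # diversity between dichotomies

# Two hypothetical ways of folding the pairwise values into one score.
quality_min = min(column_distances)
quality_mean = sum(column_distances) / len(column_distances)

print(row_distances)              # [3, 3, 4, 2, 3, 3]
print(column_distances)           # [2, 2, 1, 1, 2, 1, 1, 1, 1, 2]
print(quality_min, quality_mean)  # 1 1.4
```

In this toy matrix, every pairwise column value of 1 involves one of the last two columns, each of which isolates a single category. This illustrates how the choice of aggregation function (minimum versus mean) can weigh such columns differently.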


Published

2005-12-01