TF*IDF is a common approach to text mining and information retrieval. We previously described a method that uses 112 variations of the TF*IDF equation to classify 588 CNN news articles belonging to 12 different classes. We found that no single TF*IDF variant could accurately classify all the documents; in fact, the highest accuracy attained by any single variant was 45%. In this article, we take the work further and show how different measurements derived from the TF*IDF classification results can reveal that some classes may be logically inconsistent as classes. These methods may also be used to create more cohesive classes.
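To make the idea of many TF*IDF variants acting as weak classifiers concrete, the following is a minimal sketch, not the authors' implementation. The toy data, the three TF forms, the two IDF forms, and every function name are assumptions chosen for illustration; the 112 variants used in the paper are not reproduced here. Each (TF, IDF) pair builds per-class term weights, and a document is assigned to the class whose weights sum highest over its terms.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy data (label, text); not taken from the paper's CNN corpus.
TRAIN = [
    ("sports", "team wins match goal team"),
    ("sports", "coach team season match"),
    ("politics", "senate vote bill election"),
    ("politics", "election campaign vote senate"),
]
TEST = [("sports", "team match goal"), ("politics", "vote election")]

# Illustrative TF and IDF forms (assumed, not the paper's exact set).
TF_VARIANTS = {
    "raw": lambda count, total: count,
    "norm": lambda count, total: count / total,
    "log": lambda count, total: 1 + math.log(count),
}
IDF_VARIANTS = {
    "log": lambda n, df: math.log(1 + n / df),
    "ratio": lambda n, df: n / df,
}


def build_class_counts(train):
    """Pool term counts per class, treating each class as one document."""
    counts = {}
    for label, text in train:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def class_weights(class_counts, tf, idf):
    """Compute a TF*IDF weight for every term in every class."""
    n = len(class_counts)
    # df[term] = number of classes in which the term occurs.
    df = Counter(term for c in class_counts.values() for term in c)
    weights = {}
    for label, c in class_counts.items():
        total = sum(c.values())
        weights[label] = {t: tf(k, total) * idf(n, df[t]) for t, k in c.items()}
    return weights


def classify(text, weights):
    """Assign the class whose term weights sum highest over the text."""
    tokens = text.split()
    scores = {
        label: sum(w.get(t, 0.0) for t in tokens) for label, w in weights.items()
    }
    return max(scores, key=scores.get)


def evaluate(train, test):
    """Return accuracy per (TF name, IDF name) variant."""
    class_counts = build_class_counts(train)
    results = {}
    for (tf_name, tf), (idf_name, idf) in product(
        TF_VARIANTS.items(), IDF_VARIANTS.items()
    ):
        weights = class_weights(class_counts, tf, idf)
        correct = sum(classify(text, weights) == label for label, text in test)
        results[(tf_name, idf_name)] = correct / len(test)
    return results


if __name__ == "__main__":
    for variant, accuracy in evaluate(TRAIN, TEST).items():
        print(variant, accuracy)
```

On a real corpus, the per-variant accuracies and the pattern of disagreements between variants would be the raw material for the class-consistency measurements the abstract refers to; this sketch stops at producing per-variant accuracy.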
Steven J. Simske, A. Marie Vans, "Using a Large Set of Weak Classifiers for Text Analytics," in Proc. IS&T Archiving 2017, 2017, pp. 146-151, https://doi.org/10.2352/issn.2168-3204.2017.1.0.146