Volume: 14 | Article ID: art00029
Using a Large Set of Weak Classifiers for Text Analytics
DOI: 10.2352/issn.2168-3204.2017.1.0.146 | Published Online: May 2017

TF*IDF is a common approach to text mining and information retrieval. We previously described a method that uses 112 variations of the TF*IDF equation to classify 588 CNN news articles belonging to 12 different classes. We found that no single TF*IDF variant could accurately classify all the documents; the highest accuracy attained by any single variant was 45%. In this article, we take the work further and show how different measurements derived from the TF*IDF classification results can indicate that some classes may be logically inconsistent as classes. These methods may also be used to create more cohesive classes.
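
To make the idea of many weak TF*IDF classifiers concrete, the following is a minimal sketch in Python, not the authors' implementation. The abstract does not list the 112 equations, so the three term-frequency and two inverse-document-frequency variants below are hypothetical stand-ins, and the names `train`, `classify`, and `vote` are illustrative. Combining the variants by majority vote is likewise an assumption; the abstract says only that measurements are derived from the classification results.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical variants: the 112 equations from the article are not reproduced here.
TF_VARIANTS = {
    "raw": lambda count, total: float(count),
    "normalized": lambda count, total: count / total if total else 0.0,
    "log": lambda count, total: math.log(1 + count),
}
IDF_VARIANTS = {
    "log": lambda n_classes, n_with: math.log(n_classes / n_with),
    "smooth": lambda n_classes, n_with: math.log(1 + n_classes / n_with),
}


def train(labeled_docs):
    """Build per-class term counts from (tokens, label) pairs."""
    class_counts = {}
    for tokens, label in labeled_docs:
        class_counts.setdefault(label, Counter()).update(tokens)
    return class_counts


def classify(tokens, class_counts, tf, idf):
    """Assign a document to the class with the highest summed TF*IDF weight."""
    n_classes = len(class_counts)
    scores = {}
    for label, counts in class_counts.items():
        total = sum(counts.values())
        score = 0.0
        for term in tokens:
            # Number of classes whose training text contains this term.
            n_with = sum(1 for c in class_counts.values() if term in c)
            if n_with == 0:
                continue  # term never seen in training; it carries no evidence
            score += tf(counts[term], total) * idf(n_classes, n_with)
        scores[label] = score
    return max(scores, key=scores.get)


def vote(tokens, class_counts):
    """Run every TF x IDF combination and tally the predicted classes.

    Majority voting is an assumption of this sketch, not the article's method.
    """
    votes = Counter(
        classify(tokens, class_counts, tf, idf)
        for tf, idf in product(TF_VARIANTS.values(), IDF_VARIANTS.values())
    )
    return votes.most_common(1)[0][0], votes
```

The tally returned by `vote` is the kind of raw material from which agreement measurements could be built: a class whose documents split their votes widely across variants would be a candidate for closer inspection.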


Steven J. Simske, A. Marie Vans, "Using a Large Set of Weak Classifiers for Text Analytics," in Proc. IS&T Archiving 2017, 2017, pp. 146-151.

Copyright © Society for Imaging Science and Technology 2017
Archiving Conference
Society for Imaging Science and Technology