Carnegie Mellon University

Large-Scale Hierarchical Classification

Information Retrieval, Text Mining and Analytics

By Yiming Yang

Using classification to provide organizational views of large data collections has become increasingly important in the Big Data era. For instance, Wikipedia articles are indexed using more than 600,000 categories in a dependency graph. Jointly optimizing all the classifiers in such a large graph or hierarchy presents significant challenges for structured learning. Our research develops new statistical learning frameworks and scalable algorithms that solve joint optimization problems with more than one trillion model parameters, producing state-of-the-art effectiveness in international benchmark evaluations for large-scale classification.