Carnegie Mellon University


Information Retrieval, Text Mining and Analytics

By Jamie Callan

Lemur is a collaboration between researchers at the Language Technologies Institute (LTI) and the University of Massachusetts to provide state-of-the-art software, datasets and search services that support research by a broad, international community. Indri and Galago are extensible open-source search engines that provide powerful query languages; state-of-the-art retrieval models; indexing support for metadata, text annotations, and multiple text representations; and indexes capable of storing more than a billion documents. ClueWeb09 and ClueWeb12 are among the largest and most widely used web document collections. Each year, dozens of papers at the leading IR conferences report on research conducted using Lemur's software, datasets and services.