Carnegie Mellon University

Scalable Spontaneous Speech Dataset (SSSD)

By Shinji Watanabe

The Scalable Spontaneous Speech Dataset (SSSD) project will enable the research community to train expressive dialog models that produce semantically coherent, on-topic turn-taking. The objective is to collect a large dataset of spontaneous, casual conversations through a mobile application created by Meta and provided to the participating universities, and to use that data to train textless conversational language models. We aim to begin with a collection of 100k hours in English and then scale to other languages and university partners, with funding and resources allocated to these partnerships. This research matters because existing large-scale datasets are based on text or formal speech: they do not cover casual, expressive speech, and they do not scale to the majority of languages, which lack large textual resources. The new protocol will open up the possibility of scalable data collection for building models of casual, expressive, and conversational speech in virtually all languages.