Semantic Descriptions Based on Detected Objects, Attributes and Actions
Multimodal Computing and Interaction
By Alexander Hauptmann
While current research into the automatic generation of semantic descriptions centers mainly on improving detection accuracy for individual concepts, this project focuses on generating reliable, semantically meaningful descriptions and summaries of the key aspects of the data, given inaccurate or noisy results from the component detectors. We propose a general dual noisy channel model to detect the most salient attributes/concepts in a scene and convert them into a semantic description of the video clip. We approach this through a novel semi-supervised feature analysis framework for annotating objects and scenes in images and video. We will also develop algorithms that automatically detect the most salient attributes/concepts in a video sequence. Finally, the detected concepts will feed into a statistical generative language model that summarizes the salient concepts and other evidence in the data to automatically generate a semantically meaningful event description.
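
To make the noisy-channel idea concrete, the following is a minimal sketch of how unreliable detector outputs might be turned into a ranked list of salient concepts. It is not the project's dual noisy channel model: it treats each detector as a single binary noisy channel and applies Bayes' rule. The `Detector` class, its fields, the function names, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detector:
    """Hypothetical per-concept statistics (illustrative assumption)."""
    prior: float        # P(concept present in the clip)
    hit_rate: float     # P(detector fires | concept present)
    false_alarm: float  # P(detector fires | concept absent)


def posterior(det: Detector, fired: bool) -> float:
    """P(concept present | detector output), computed with Bayes' rule."""
    like_present = det.hit_rate if fired else 1.0 - det.hit_rate
    like_absent = det.false_alarm if fired else 1.0 - det.false_alarm
    numerator = like_present * det.prior
    denominator = numerator + like_absent * (1.0 - det.prior)
    return numerator / denominator if denominator > 0 else 0.0


def salient_concepts(detectors, outputs, top_k=3):
    """Rank concepts by posterior probability and keep the top_k."""
    scored = [
        (name, posterior(detectors[name], fired))
        for name, fired in outputs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Made-up detector statistics and outputs for one clip.
    detectors = {
        "dog": Detector(prior=0.5, hit_rate=0.8, false_alarm=0.2),
        "car": Detector(prior=0.1, hit_rate=0.6, false_alarm=0.3),
    }
    outputs = {"dog": True, "car": True}
    for name, prob in salient_concepts(detectors, outputs, top_k=2):
        print(f"{name}: {prob:.2f}")
```

In this sketch a detector with a high false-alarm rate and a low prior contributes little even when it fires, which is the kind of discounting a description generator needs when its inputs are noisy. The ranked concepts would then be passed to a language model, a step not shown here.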