Abstract: The wording choices writers make convey much about their assumptions about the world and their attitudes towards the subject matter. How writers refer to people is only a small part of a complex language generation problem, but it remains indicative of the writer's assumptions and attitudes. Consider the headline "MIT Professor Esther Duflo and Husband Win Nobel Prize" and what the writer must have assumed about a reader's familiarity with the researcher, or about the relative importance of the achievements of the two award recipients.
In this talk, I will describe our past work on automatically predicting whether writers assume that the people they refer to are known to the intended reader, and whether they treat a person as important in the story. These predictions can be made accurately enough that automatic summarizers using the distinction improve in flow and informativeness. We then turn to a discussion of how that work can be scaled for modern data-hungry approaches and for more complex references, such as nominal (husband) and pronominal (he) references to people. I will present an analysis of the errors made by named entity recognition and coreference systems, which limit our ability to create large training datasets for the task by automated means, as well as some steps towards improving these aspects of language technology.
BIO: Ani Nenkova is an associate professor of computer and information science at the University of Pennsylvania. Her main areas of research are computational linguistics and artificial intelligence, with emphasis on developing computational methods for analysis of text quality and style, discourse, affect recognition and summarization. She obtained her PhD degree in computer science from Columbia University.
Ani and her collaborators are recipients of the best student paper award at SIGDial in 2010 and the best paper award at EMNLP-CoNLL in 2012. The Penn team co-led by Ani won the audio-visual emotion recognition challenge (AVEC) for word-level prediction in 2012.
Ani is a co-editor-in-chief of the Transactions of the Association for Computational Linguistics (TACL). She was a member of the editorial board of Computational Linguistics (2009–2011) and an associate editor for the IEEE/ACM Transactions on Audio, Speech and Language Processing (2015–2018). She regularly serves as an area chair/senior program committee member for ACL, NAACL and AAAI. Ani was a program co-chair for SIGDial 2014 and NAACL-HLT in 2016.