The LTI Summer Seminar

Welcome to the LTI Summer Seminar 2020, a series of talks about research done by students at the Language Technologies Institute.

Presentations to be conducted via Zoom:

Presentations will take place weekly on Thursdays:
9-10 am EDT (Pittsburgh)
6-7 am PDT (Silicon Valley)
9-10 pm China Standard Time (Beijing)
6:30-7:30 pm India Standard Time (Hyderabad)
2-3 pm British Summer Time (London)
3-4 pm Central European Summer Time (Berlin)

Slide decks from past presentations can be found here.

The current schedule of presentations is as follows:

June 18, 2020

Zhuyun Dai | Towards Deeper Language Understanding in Large-Scale Information Retrieval

Information retrieval (IR) systems face large document collections and diverse language patterns, which pose challenges for applying advanced NLP techniques to retrieval. In this talk, I will share our efforts to use deep neural networks to improve language understanding in IR. First, I will present our work applying Transformers to generate context-aware bag-of-words document representations. These context-aware representations significantly boost retrieval accuracy, while the simple bag-of-words form ensures efficiency. Next, I will share our recent explorations in complementing bag-of-words representations with machine-learned embeddings, and show how our method is used in a zero-shot setting to help people access COVID-19 articles.

Zhuyun Dai is a sixth-year Ph.D. student at LTI, advised by Jamie Callan. Her research is at the intersection of information retrieval, deep learning, and natural language processing. She received a BS degree in computer science from Peking University and an MS degree from LTI, CMU.

Qizhe Xie | Self-training with Noisy Student improves ImageNet classification

We present Noisy Student Training, a semi-supervised learning approach that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the idea of self-training and distillation with the use of equal-or-larger student models and noise added to the student during learning. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images. We iterate this process by putting the student back as the teacher. During the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment into the student so that the student generalizes better than the teacher.
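The teacher-student loop described above can be sketched in a few lines. This is a toy illustration under loudly stated assumptions, not the authors' implementation: a one-dimensional nearest-centroid classifier stands in for EfficientNet, Gaussian input jitter stands in for dropout, stochastic depth, and RandAugment, pseudo labels are hard labels, and model size (the "equal-or-larger student") is not modeled at all. All function names are hypothetical.

```python
import random
from statistics import mean

def train_centroids(examples, noise_std=0.0, rng=None):
    """Fit a toy nearest-centroid classifier on (x, label) pairs.

    When noise_std > 0, every input is jittered with Gaussian noise before
    fitting; this is a stand-in for the student-side noise in the abstract.
    """
    rng = rng or random.Random(0)
    by_label = {}
    for x, y in examples:
        x_in = x + rng.gauss(0.0, noise_std) if noise_std > 0 else x
        by_label.setdefault(y, []).append(x_in)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def noisy_student(labeled, unlabeled, iterations=3, noise_std=0.5, seed=0):
    rng = random.Random(seed)
    # 1. Train the teacher on labeled data only, without noise.
    teacher = train_centroids(labeled)
    for _ in range(iterations):
        # 2. The un-noised teacher assigns hard pseudo labels to unlabeled data.
        pseudo = [(x, predict(teacher, x)) for x in unlabeled]
        # 3. Train a noised student on labeled + pseudo-labeled data.
        student = train_centroids(labeled + pseudo, noise_std=noise_std, rng=rng)
        # 4. The student becomes the teacher for the next round.
        teacher = student
    return teacher

labeled = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]
unlabeled = [-3.0, -2.5, -1.0, -0.5, 0.5, 1.0, 2.5, 3.0]
model = noisy_student(labeled, unlabeled)
print(predict(model, -2.2), predict(model, 2.2))
```

The key asymmetry the sketch tries to preserve is that the teacher labels data without noise while the student is trained with noise; everything else is deliberately minimal.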

Qizhe Xie is a PhD student at Carnegie Mellon University, advised by Eduard Hovy. His research interests include Deep Learning, Natural Language Processing, and Computer Vision. He worked at Google Brain as a student researcher for two years under the guidance of Quoc Le (the inventor of Seq2Seq and AutoML).

June 25, 2020

Shikib Mehri | Evaluating Open-Domain Dialog

It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research. Standard language generation metrics have been shown to be ineffective for dialog. In this talk, I will discuss two recent publications: (1) USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation at ACL 2020 and (2) Unsupervised Evaluation of Interactive Dialog with DialoGPT at SIGdial 2020. These two papers introduce the USR and FED metrics, respectively. USR, an unsupervised, reference-free metric for response generation, is shown to correlate strongly with human judgment on both Topical-Chat and PersonaChat. FED is an automatic evaluation metric that (1) does not rely on a ground-truth response, (2) does not require training data, and (3) measures fine-grained dialog qualities at both the turn and whole-dialog levels. FED attains moderate to strong correlation with human judgment at both levels.
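Claims such as "correlates strongly with human judgment" are usually backed by a correlation coefficient computed between metric scores and human ratings over the same set of responses. The abstract does not say which coefficient the USR and FED papers report, so treat Spearman's rank correlation below as one common choice rather than their exact protocol. The scores and ratings are made up for illustration, and the helper names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical metric scores and human ratings for five responses.
metric_scores = [0.91, 0.42, 0.77, 0.15, 0.60]
human_ratings = [5, 2, 4, 1, 4]
print(round(spearman(metric_scores, human_ratings), 3))
```

Rank correlation only asks whether the metric orders responses the way humans do, which suits human ratings on coarse ordinal scales with many ties.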

Shikib is in his second year of the MLT program (soon to be PhD) advised by Maxine Eskenazi. Shikib's research covers a wide range of topics in the realm of dialog systems including dialog generation, retrieval, representation learning, evaluation and data collection.

Zhiqing Sun | An EM Approach to Non-autoregressive Conditional Sequence Generation

Autoregressive (AR) models have been the dominant approach to conditional sequence generation, but they suffer from high inference latency. Non-autoregressive (NAR) models have recently been proposed to reduce latency by generating all output tokens in parallel, but they achieve lower accuracy than their autoregressive counterparts, primarily because of the difficulty of handling multi-modality in sequence generation. This paper proposes a new approach that jointly optimizes both AR and NAR models in a unified Expectation-Maximization (EM) framework. In the E-step, an AR model learns to approximate the regularized posterior of the NAR model. In the M-step, the NAR model is updated on the new posterior and selects the training examples for the next AR model. This iterative process effectively guides the system to remove multi-modality from the output sequences. To our knowledge, this is the first EM approach to NAR sequence generation. We evaluate our method on machine translation. Experimental results on benchmark data sets show that the proposed approach achieves performance competitive with, if not better than, existing NAR models and significantly reduces inference latency.

Zhiqing Sun is an MLT student in LTI, CMU. He is currently supervised by Professor Yiming Yang, working on sequence models. Before that, he received a B.S. in Computer Science from Peking University, advised by Prof. Zhi-Hong Deng. His research interest is machine learning in general, as well as its applications to knowledge graphs, sequence models, machine translation, and natural language understanding.

July 2, 2020

Junjie Hu | XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages and 9 tasks. We demonstrate that while models tested on English reach human performance on many tasks, there is still a sizable gap in the performance of cross-lingually transferred models, particularly on syntactic and sentence retrieval tasks. There is also a wide spread of results across languages. We release the benchmark, together with the code used for downloading data and training baseline models, to encourage research on cross-lingual learning methods that transfer linguistic knowledge across a diverse and representative set of languages and tasks.

Junjie Hu is a PhD student in the Language Technologies Institute at Carnegie Mellon University (CMU), working with the late Jaime Carbonell and Graham Neubig. His research focuses on indirect supervision for natural language generation and understanding over diverse domains, languages, and modalities. In particular, he works on developing learning algorithms for machine translation, cross-lingual adaptation, and multi-modal language generation.

Xuezhe Ma | FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flows

Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures, accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving performance comparable to state-of-the-art non-autoregressive NMT models and almost constant decoding time with respect to sequence length.

Xuezhe Ma recently received his PhD degree from the Language Technologies Institute at Carnegie Mellon University, where he worked with Eduard Hovy. Before that, he received his B.E. and M.S. from Shanghai Jiao Tong University. His research interests fall in the areas of natural language processing and machine learning, particularly deep learning and representation learning with applications to linguistic structured prediction and deep generative models. Xuezhe has interned at the Allen Institute for Artificial Intelligence (AI2) and earned the AI2 Outstanding Intern award. His research has been recognized with an outstanding paper award at ACL 2016 and a best demo paper nomination at ACL 2019.

July 9, 2020

Chirag Nagpal | Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines

The dearth of prescribing guidelines for physicians is one key driver of the current opioid epidemic in the United States. In this work, we analyze medical and pharmaceutical claims data to draw insights on the characteristics of patients who are more prone to adverse outcomes after an initial synthetic opioid prescription. Toward this end, we propose a generative model that discovers, from observational data, subgroups that demonstrate an enhanced or diminished causal effect due to treatment. Our approach models these sub-populations as a mixture distribution, using sparsity to enhance interpretability, while jointly learning nonlinear predictors of the potential outcomes to better adjust for confounding. The approach leads to human-interpretable insights on the discovered subgroups, improving its practical utility for decision support.

Chirag Nagpal is a second-year PhD student (+MLT) at LTI researching Machine Learning in Healthcare. His interests include Graphical Models and their applications in Survival Analysis, Causal Inference, and Uncertainty Estimation. During his PhD, he has been a Science for Social Good Fellow at IBM Research and a Summer Associate at JPMorgan AI Research. This summer he is remotely interning at Google Brain and Google Health.

Shikhar Vashishth | Improving Medical Entity Linking with Semantic Type Prediction

Medical entity linking is the task of identifying and standardizing medical concepts referred to in an unstructured text. Most of the existing methods adopt a three-step approach of (1) detecting mentions, (2) generating a list of candidate concepts, and finally (3) picking the best concept among them. In this paper, we focus on alleviating the problem of overgeneration of candidate concepts in the candidate generation module, the most under-studied component of medical entity linking. For this, we present MedType, a fully modular system that prunes out irrelevant candidate concepts based on the predicted semantic type of an entity mention. We incorporate MedType into five off-the-shelf toolkits for medical entity linking and demonstrate that it consistently improves entity linking performance across several benchmark datasets. To address the dearth of annotated training data for medical entity linking, we present WikiMed and PubMedDS, two large-scale medical entity linking datasets, and demonstrate that pre-training MedType on these datasets further improves entity linking performance. We make our source code and datasets publicly available for medical entity linking research.
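The pruning idea (keep only candidate concepts whose semantic type matches the type predicted for the mention) can be sketched as a simple filter. This is a minimal illustration, not MedType's code: the `Candidate` class, the probability threshold, and the type classifier that would produce `type_probs` are all hypothetical, and the concept IDs are made up. The semantic type names are used only as illustrative labels.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    concept_id: str            # hypothetical identifier
    name: str
    semantic_types: frozenset  # e.g. UMLS-style semantic type names

def prune_candidates(candidates, type_probs, threshold=0.5):
    """Drop candidates whose semantic types do not match the predicted types.

    type_probs maps each semantic type to a probability produced by a
    (hypothetical) mention-level type classifier; types at or above
    `threshold` count as predicted. Survivors keep their original order,
    so a downstream ranker can still choose among them.
    """
    predicted = {t for t, p in type_probs.items() if p >= threshold}
    return [c for c in candidates if c.semantic_types & predicted]

# Illustrative candidates for the mention "cold"; IDs are made up.
candidates = [
    Candidate("ID-1", "Common Cold", frozenset({"Disease or Syndrome"})),
    Candidate("ID-2", "Cold Temperature", frozenset({"Natural Phenomenon or Process"})),
]
type_probs = {"Disease or Syndrome": 0.92, "Natural Phenomenon or Process": 0.05}
kept = prune_candidates(candidates, type_probs)
print([c.name for c in kept])
```

Because the filter only removes candidates, it can sit between any existing candidate generator and ranker, which matches the "fully modular" framing in the abstract.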

Shikhar Vashishth is a Postdoctoral Researcher at the Language Technologies Institute, Carnegie Mellon University, currently working in the field of biomedical natural language processing under Prof. Carolyn Rose. Previously, he completed his Ph.D. at the Indian Institute of Science under the guidance of Partha Pratim Talukdar, Chiranjib Bhattacharyya, and Manaal Faruqui. His thesis topic was Neural Graph Embedding Methods for Natural Language Processing. Shikhar has been a recipient of the prestigious Google Ph.D. Fellowship and has interned at Google Research and Microsoft. He completed his undergraduate degree at BITS Pilani, Pilani in 2016.

July 16, 2020

Haohan Wang | High Frequency Component Helps Explain the Generalization of Convolutional Neural Networks

We investigate the relationship between the frequency spectrum of image data and the generalization behavior of convolutional neural networks (CNNs). We first observe that CNNs can capture the high-frequency components of images, which are almost imperceptible to humans. This observation leads to multiple hypotheses related to the generalization behavior of CNNs, including a potential explanation for adversarial examples, a discussion of the trade-off CNNs make between robustness and accuracy, and some evidence for understanding training heuristics.
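The low- and high-frequency split that this line of work relies on can be illustrated with a Fourier transform: keep the coefficients within some radius of zero frequency as the low-frequency part and assign the rest to the high-frequency part. The sketch below is a one-dimensional, pure-Python simplification (images would need a 2-D transform), and the `radius` cutoff and function names are assumptions rather than details from the talk.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def split_frequencies(x, radius):
    """Split a real 1-D signal into low- and high-frequency parts.

    Frequency index k is measured as min(k, n - k), so each positive
    frequency and its negative mirror are kept or dropped together and
    both reconstructed parts are real (up to rounding).
    """
    n = len(x)
    X = dft(x)
    low_X = [X[k] if min(k, n - k) <= radius else 0 for k in range(n)]
    high_X = [X[k] - low_X[k] for k in range(n)]
    low = [v.real for v in idft(low_X)]
    high = [v.real for v in idft(high_X)]
    return low, high

# A slow wave plus a faint fast wave (6 cycles per 16 samples).
n = 16
slow = [math.sin(2 * math.pi * t / n) for t in range(n)]
fast = [0.2 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]
signal = [s + f for s, f in zip(slow, fast)]
low, high = split_frequencies(signal, radius=2)
```

The two parts always sum back to the original signal, so one can feed only `low` or only `high` to a trained model and compare its predictions, which is the kind of probe the abstract's observation suggests.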

Haohan Wang is a PhD student at LTI, CMU, working with Professor Eric P. Xing; his research interest centers on robustness issues in machine learning, covering a spectrum of applications including computer vision, NLP, and computational biology.

Shrimai Prabhumoye | Controllable Text Generation: Should machines reflect the way humans interact in society?

The 21st century is witnessing a major shift in the way people interact with technology, and Natural Language Generation (NLG) is playing a central role. Users of smartphones and smart home devices now expect their gadgets to be aware of their situation and to produce natural language outputs in interactions. We identify three aspects of human communication that make machines sound human-like: style, content, and structure. This talk provides deep learning solutions for controlling these variables in neural text generation. It first outlines the various modules that could be manipulated to perform effective controllable text generation. It also provides a novel solution for style transfer using back-translation. Furthermore, it introduces two new tasks that leverage information from unstructured documents in the dialogue response generation and Wikipedia edit generation processes. Finally, it discusses the ethical considerations of applications of controllable text generation.

Shrimai Prabhumoye is a PhD student at Carnegie Mellon University working with Prof. Alan W Black and Prof. Ruslan Salakhutdinov. Her thesis focuses on controlling style, content, and structure in text generation and on the ethical considerations of this technology. She was also a co-designer of the Computational Ethics for NLP course offered at CMU. During her master's, she led the CMU Magnus team in the 2017 Alexa Prize competition.

July 23, 2020

Khyathi Chandu | Dissecting the Components and Factors of Text Generation

Over the past couple of decades, research in text generation has been transformed, moving from niche, constrained scenarios to routinely accessible applications such as our virtual assistants. We are reaching for the dream of making machines interact like humans, and never have we been closer to making this a reality. To foster this, we need to pay attention to the various underlying components and modeling paradigms of text generation, which have far-reaching impacts across tasks such as summarization, image captioning, dialog, and storytelling. This talk presents an overview of current trends in text generation techniques, along with key challenges that point to promising future directions.

Khyathi Chandu is a fourth-year PhD student in LTI, advised by Alan Black. Her current research focuses on text generation, primarily long-form narratives with visual and textual inputs, addressing the trio of content, structure, and surface-form realization. She has also worked on question answering and code-switching. Prior to joining CMU, she received her bachelor's degree from IIIT Hyderabad.

Dongyeop Kang | Linguistically Informed Language Generation

Natural language generation (NLG) is a key component of many language applications such as dialogue systems, question answering systems, and text summarization. However, these systems still fall far short of human-like or human-level generation. This is because much of the information that shapes language is implicit rather than explicitly visible on the surface. We call these kinds of information facets; they are reflected in variations of language and include external knowledge, intents, interpersonal information, and more. To generate human-like utterances, appropriate modeling of these facets is necessary, and the system needs to be effectively guided by them. Based on Halliday’s Systemic Functional Linguistics (SFL) theory (1978), my thesis focuses on three facet groups (knowledge, structure, and style) and presents effective computational methods for handling each facet in a wide range of generation tasks. During my postdoc, at the intersection of computational linguistics and human-computer interaction, I am working on a human-machine collaborative generation system that operates in a mixed-initiative way.

Dongyeop Kang is a postdoctoral scholar at the University of California, Berkeley, under Marti A. Hearst. He obtained his Ph.D. in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University, under Eduard Hovy. He interned at Facebook AI, the Allen Institute for AI (AI2), and Microsoft Research. His Ph.D. study was supported by the Allen Institute for AI (AI2) Fellowship, the CMU Presidential Fellowship, and the ILJU Graduate Fellowship.

July 30, 2020

Paul Michel | Weight Poisoning Attacks on Pre-trained Models

Recently, NLP has seen a surge in the usage of large pre-trained models. Users download weights of models pre-trained on large datasets, then fine-tune the weights on a task of their choice. This raises the question of whether downloading untrusted pre-trained weights can pose a security threat. In this paper, we show that it is possible to construct "weight poisoning" attacks where pre-trained weights are injected with vulnerabilities that expose "backdoors" after fine-tuning, enabling the attacker to manipulate the model prediction simply by injecting an arbitrary keyword. We show that by applying a regularization method, which we call RIPPLe, and an initialization procedure, which we call Embedding Surgery, such attacks are possible even with limited knowledge of the dataset and fine-tuning procedure. Our experiments on sentiment classification, toxicity detection, and spam detection show that this attack is widely applicable and poses a serious threat. Finally, we outline practical defenses against such attacks.

Paul Michel is a 4th year PhD student in the Language Technologies Institute at Carnegie Mellon University, advised by Graham Neubig. His research interests are in NLP and machine learning, with a focus on learning in the face of distributional shift.

Junwei Liang | The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction

In this talk, I will present our latest work on multi-future trajectory prediction, published at CVPR 2020. In this work, we study the problem of predicting the distribution over multiple possible future paths of people as they move through various visual scenes. We make two main contributions. The first contribution is a new dataset, created in a realistic 3D simulator, which is based on real-world trajectory data and then extrapolated by human annotators to achieve different latent goals. This provides the first benchmark for quantitative evaluation of models that predict multi-future trajectories. Pedestrian trajectory prediction has been an important research direction for traffic safety and self-driving applications.

Junwei Liang is a third-year Ph.D. student at LTI, advised by Prof. Alexander Hauptmann. He received the Master of Language Technologies degree from LTI in 2017. His research interests include large-scale computer vision, natural language processing, and machine learning. His recent work on person future prediction has attracted wide attention on GitHub and from Chinese media. His work on video event reconstruction and shooter localization has received extensive media coverage, including from CBS.

August 6, 2020

Volkan Cirik | Refer360°: A Referring Expression Recognition Dataset in 360° Images

In this talk, we introduce a dataset, Refer360°, for the task of localizing a target in 360° scenes given a sequence of instructions. Refer360° consists of 17,137 instruction sequences and ground-truth actions for completing these instructions in 360° scenes. Refer360° differs from existing related datasets in three ways. First, we propose a more realistic scenario where instructors and the followers have partial, yet dynamic, views of the scene – followers continuously modify their field of view (FoV) while interpreting instructions that specify a final target location. Second, instructions to find the target location consist of multiple steps for followers who will start at random FoVs. As a result, intermediate instructions are strongly grounded in object references and followers must identify intermediate FoVs to find the final target location correctly. Third, the target locations are neither restricted to predefined objects nor chosen by annotators; instead, they are distributed randomly across scenes. This “point anywhere” approach leads to more linguistically complex instructions, as shown in our analyses. Our examination of the dataset shows that Refer360° manifests linguistically rich phenomena in a language grounding task that poses novel challenges for computational modeling of language, vision, and navigation.

Volkan is a Ph.D. student at LTI working with Louis-Philippe Morency and Taylor Berg-Kirkpatrick. His research is at the intersection of natural language processing and computer vision. He received a BS degree from Bogazici University, and MSc degrees from Koc University and LTI at CMU. He likes to read, run, and cook in no particular order.

Elijah Mayfield | Confronting Inequitable Use of Language Technologies in Education

Amid the coronavirus pandemic and social justice protests of 2020, it's not always easy to draw a straight line between outside events and technical topics in natural language processing. Research has tended to focus on audits of classifier fairness and bias in word embeddings, looking primarily inward at the specifics of model behavior and learned parameters. This talk presents interdisciplinary work from artificial intelligence in education, taking a critical look at the role NLP technologies play in supporting or harming students, teachers, and parents. I'll discuss literature that can help NLP researchers think about the broader implications of, and questions raised by, our technical work, with specific examples of how these technologies are used in schools and universities.

Elijah is a Ph.D. candidate at LTI, adjunct faculty in Social Policy & Practice at the University of Pennsylvania, and advisor to GSV Ventures, an investment firm focused on education technology. Previously, he was Vice President of New Technologies at Turnitin, leading their research in machine learning and NLP for 30 million students worldwide, and CEO of LightSide Labs, a CMU startup that was acquired in 2014 after support from the Bill & Melinda Gates Foundation, US Department of Education, and the College Board.

August 13, 2020

Hira Dhamyal | Detecting Gender Differences in Perception of Emotion in Crowdsourced Data

Do men and women perceive emotions differently? Popular convictions place women as more emotionally perceptive than men. Empirical findings, however, remain inconclusive. Most prior studies focus on visual modalities. In addition, almost all of the studies are limited to experiments within controlled environments. The generalizability and scalability of these studies have not been sufficiently established. We study the differences in perception of emotion between genders from speech data in the wild, annotated through crowdsourcing. While we limit ourselves to a single modality (i.e., speech), our framework is applicable to studies of emotion perception from loosely annotated data in general. We address multiple serious challenges related to drawing statistically viable conclusions from crowdsourced data. Overall, we aim first to establish a reliable, novel framework for perceptual studies from crowdsourced data, and second to study statistically the differences in speech-based emotion perception between genders.

Hira Dhamyal is a second-year master's student at the Language Technologies Institute, Carnegie Mellon University. Her research interests span multiple topics in speech, and she has worked on speaker verification, emotion identification, and emotion expression and perception.