From Acoustic Signal to Morphosyntactic Analysis in One End-to-End Neural System
By Lori Levin, Graham Neubig, Shinji Watanabe, and David Mortensen
The proposed research will dramatically transform the landscape of automatic morphosyntactic and morphophonological analysis by introducing an end-to-end system that consumes speech as input and produces interlinear annotations as output. The research team proposes to build a single neural network that, given small amounts of labeled data produced by native-speaker linguists, directly converts recorded speech to analyzed text, producing four outputs: (1) a surface transcription, (2) a morphological segmentation of the surface forms, (3) an underlying or canonical form for each morpheme, and (4) a gloss or standardized label for each morpheme. The proposed system represents the first attempt to integrate these four tasks into a single neural network, avoiding the error-propagation problems that have plagued earlier pipeline approaches and reducing the complexity of the technology for end users.

The researchers also propose innovative ways to incorporate linguistic knowledge into neural networks, including the use of differentiable weighted finite-state transducers, which are independently motivated by an iterative self-training architecture. This approach to iterative self-training will, in its own right, represent an advance in machine learning: a new algorithm for upweighting words and morphemes.

The research also makes significant contributions to computational morphology. It includes a simple but expressive modification to existing schemes for segmentation and glossing, specifically for the representation of discontinuous morphemes. Furthermore, the proposal extends popular approaches to morphological analysis (e.g., UniMorph) by systematically addressing derivation as well as inflection, and it addresses the glossing of reduplication and noun incorporation, which earlier work has not.
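
To make the four output tiers concrete, the following minimal Python sketch shows one way the per-word output of such a system could be represented. It is an illustrative assumption, not part of the proposal; the class names and the Turkish example (evlerden 'from the houses') are chosen only to show how the transcription, segmentation, underlying forms, and glosses fit together.

from dataclasses import dataclass
from typing import List

@dataclass
class MorphemeAnalysis:
    surface: str      # segmented surface form of the morpheme
    underlying: str   # underlying or canonical form of the morpheme
    gloss: str        # standardized gloss label

@dataclass
class WordAnalysis:
    transcription: str                   # (1) surface transcription of the word
    morphemes: List[MorphemeAnalysis]    # (2)-(4) segments, underlying forms, glosses

# Hypothetical example, for illustration only: Turkish evlerden 'from the houses'
word = WordAnalysis(
    transcription="evlerden",
    morphemes=[
        MorphemeAnalysis(surface="ev",  underlying="ev",   gloss="house"),
        MorphemeAnalysis(surface="ler", underlying="-lAr", gloss="PL"),
        MorphemeAnalysis(surface="den", underlying="-DAn", gloss="ABL"),
    ],
)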