Leveraging High-Resource Languages to Improve Low-Resource Language Processing

2019 BU Linguistics Colloquium Series

Gina-Anne Levow, Associate Professor, Linguistics, University of Washington.

When:
Monday, November 25, 2019
4:00pm-5:00pm

Where:
Hariri Institute for Computing, Seminar Room MCS 157, 111 Cummington Mall, Boston, MA


Abstract:
Recent years have seen dramatic strides in automatic speech and language processing, ranging from automatic speech recognition to machine translation. While these advances have benefited from improvements in machine learning algorithms, they depend crucially on increases in processing power and, especially, on huge corpora of language data for training and tuning models. As a result, these language processing systems are available only for the few hundred best-resourced languages and remain largely out of reach for the world's more than six thousand other languages, most of which are low-resource or endangered. To bridge this gap, this talk explores approaches that leverage linguistic resources from higher-resource languages to improve effectiveness on language processing tasks. We find that careful integration of within-language resources with selected high-density language resources can enable rapid development and better generalization of both machine translation and spoken language processing capabilities for low-resource languages.
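The abstract does not specify an implementation, but one simple way to picture "careful integration" of within-language data with selected high-resource data is corpus mixing with upsampling, so the small low-resource corpus keeps a fixed share of the training mix. The sketch below is purely illustrative and not the speaker's method; the corpora, the selection filter, and the mixing ratio are hypothetical.

import random

def mix_corpora(low_resource, high_resource, select, low_weight=0.5, seed=0):
    """Build a mixed training set from a small low-resource corpus and a
    selected subset of high-resource data.

    low_resource : list of training examples in the target language
    high_resource: list of examples from a related, well-resourced language
    select       : hypothetical filter keeping only high-resource examples
                   judged relevant to the target language or domain
    low_weight   : fraction of the final mix drawn from the low-resource corpus
    """
    rng = random.Random(seed)
    selected = [ex for ex in high_resource if select(ex)]

    # Upsample the low-resource data so it makes up `low_weight` of the mix,
    # rather than being swamped by the much larger high-resource corpus.
    target_low = int(low_weight * len(selected) / (1.0 - low_weight))
    upsampled = [rng.choice(low_resource)
                 for _ in range(max(target_low, len(low_resource)))]

    mixed = upsampled + selected
    rng.shuffle(mixed)
    return mixed

In practice the selection step might score high-resource sentences by similarity to the low-resource domain or by typological closeness of the source language; the point of the sketch is only that the low-resource data retains a controlled share of the training mix.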

Co-sponsored by the BU Department of Computer Science and the Hariri Institute for Computing. We also gratefully acknowledge support from the Office of the Associate Dean for the Humanities in the College of Arts & Sciences.


Bio:
Gina-Anne’s research concentrates on the use of intonation in spoken dialog, and her interests span natural language processing, spoken language systems, and human-computer interfaces.

She is currently collaborating with Prof. Richard Wright and Prof. Mari Ostendorf on the NSF-funded ATAROS project to develop techniques to model and automatically recognize stance-taking in dyadic conversational speech. She is also working with Prof. Helen Meng and Prof. Patrick Wong of the Chinese University of Hong Kong on a project funded by the Hong Kong Research Grants Council to investigate the use of articulatory distinctive features in the analysis and assessment of dysarthria, a neuromotor speech disorder, in Cantonese. With Prof. Emily Bender, she is the principal investigator of the NSF-funded EL-STEC project to develop shared tasks in speech and natural language processing that will facilitate research on and documentation of endangered languages. Many of her prior projects have aimed to interpret meaning transmitted through prosody. Her NSF-funded project “Learning Tone” used a contextual model employing minimally supervised machine learning techniques to recognize lexical tones in Mandarin, Cantonese, isiZulu, and isiXhosa, as well as prominence in English. Another NSF-funded project investigated automatic recognition of lexical and prosodic cues to conversational social dynamics, such as turn-taking and backchannels, across Arabic, English, and Spanish.

She also has long-standing interests in information retrieval in text and speech across a range of languages, in domains from news to medicine. She has participated in numerous projects and shared tasks in cross-language and spoken document retrieval, focusing on general techniques for rapidly retargeting to other languages as well as specialized approaches for the Chinese-English language pair. She recently developed a system for the “Similar Segments in Social Speech” task at MediaEval 2013, clustering related spans from informal video chat interactions.

She received her Ph.D. from the Massachusetts Institute of Technology in 1998. Her doctoral thesis explored recognizing spoken corrections in human-computer dialogue, relying on acoustic-prosodic features. Her Master’s thesis examined discourse-neutral prosodic phrasing in Mandarin Chinese, analyzing the relationship between syntactic and prosodic structure.