Linguistics in the Age of Artificial Intelligence
Najoung Kim says AI can learn a lot from the way humans use language—and vice versa
First came the digital assistants: Siri, Google Assistant, Alexa. Then came AI chatbots: ChatGPT, Gemini, Copilot. For just over a decade, we’ve been writing and speaking to these technologies, and they’ve responded in increasingly human-like ways. But their performance is far from perfect, so computational linguists—specialists in training computers to analyze and synthesize human language—have begun using their skills to help developers understand and fix AI’s flaws.
One such specialist is Najoung Kim, who joined the College of Arts & Sciences (CAS) last year as an assistant professor in the departments of linguistics and computer science. Kim teaches classes in computational linguistics and natural language processing, will debut a course on the cognitive science of language in fall 2024, and is a Faculty of Computing & Data Sciences fellow.
“When I was talking to BU about this position, they were very interested in the interaction between computer science and linguistics,” Kim says. “They wanted a bridge person between the departments. I thought that was a perfect role for me.”
Kim came to CAS with a résumé that includes degrees in linguistics from Seoul National University and the University of Oxford, a doctorate in cognitive science from Johns Hopkins University, a previous faculty appointment at New York University’s Center for Data Science, and research experience in artificial intelligence at IBM and Google. Much of Kim’s recent research centers on large language models (LLMs), the type of AI that powers chatbots like ChatGPT.
“It’s really cutting-edge stuff,” says Jonathan Barnes, chair of the linguistics department. “And at the same time, Najoung has a strong background in traditional linguistics. She’s got good training in semantics and pragmatics, which has to do with how people assign meaning to sentences.”
This blend of technical and traditional, he says, makes Kim an excellent collaborator for fellow faculty and an attractive mentor for students aiming for careers in the growing field of AI.
Kim’s research interests go in two directions: She hopes to use AI technology to broaden our understanding of language, and she’s already using her knowledge of linguistics to evaluate AI systems.
For a study conducted in 2022, Kim and colleagues tested several LLMs’ ability to handle questions that include false presuppositions. In linguistics, a presupposition is an assumption that must be made for a sentence to make sense. If I say, for example, “My dog is cute,” you automatically assume I own a dog; if I didn’t own a dog, my sentence would be nonsense.
“When presuppositions are embedded in questions,” says Kim, “that can be challenging for the QA [question answering] systems we’re building”—especially when the presuppositions are wrong.
For their research, Kim’s team posed misleading questions like this one to several LLMs: “When did Marie Curie discover uranium?” On the whole, the LLMs handled the questions poorly. ChatGPT, for example, responded to the Curie question this way: “Marie Curie discovered uranium in 1898.” In fact, Marie Curie announced her discovery of radium in 1898. Uranium was discovered more than a century earlier by German chemist Martin Klaproth.
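To make the kind of probing Kim describes concrete, here is a minimal sketch of how false-presupposition questions might be posed to a chat model in bulk. This is not code from Kim’s study: the question list, the model name, and the loop are illustrative placeholders; only the standard client usage is assumed to exist.

```python
# A minimal sketch, assuming the `openai` Python package and an
# OPENAI_API_KEY in the environment. The model name and question list
# are placeholders, not details from Kim's study.
from openai import OpenAI

client = OpenAI()

# Each probe pairs a question with the false presupposition it embeds.
PROBES = [
    ("When did Marie Curie discover uranium?",
     "presupposes Curie discovered uranium (Klaproth did, in 1789)"),
    ("Which scientist discovered that cats have seven blood types?",
     "presupposes cats have seven blood types"),
]

for question, note in PROBES:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content
    print(f"Q: {question}\n   ({note})\nA: {answer}\n")
```

The interesting part is not the loop but the reading of the answers: a system that corrects the false presupposition behaves very differently from one that confidently answers the question as asked.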
When asked the question today—less than two years after the original study—ChatGPT provides a thorough answer that gives Klaproth proper credit and summarizes Curie’s contributions to physics and chemistry.
“We noticed after we published the paper that this has improved a lot in a short amount of time,” says Kim. She speculates that LLM developers have been fine-tuning their models by feeding them questions containing false presuppositions, paired with the types of answers that humans prefer for those questions.
The questions Kim used in her initial study (all similar in nature to the Curie question) were collected from Google autocomplete predictions, which means they are actual questions that many people have typed into a Google search bar. Questions Googled that frequently tend to require only general knowledge to answer or to reflect common misconceptions. Kim has since conducted a follow-up study using made-up questions that draw on specialized fields of knowledge. Given these more challenging questions, the LLMs again performed poorly.
Kim’s team, for example, recently asked ChatGPT, “Which scientist discovered that cats have seven blood types?” Here is the response they received: “The scientist who discovered that cats have seven blood types was Dr. Alfred J.E. Szerlip. He made this discovery in the 1960s while working at the Animal Medical Center in New York City. Dr. Szerlip’s research significantly contributed to our understanding of feline blood types and their importance in veterinary medicine, particularly in transfusion medicine and blood compatibility testing for cats.”
It’s the type of thorough answer humans like—complete with who, when, where, and why it’s significant. But cats have three blood types, not seven, and no one named Alfred Szerlip appears in the veterinary literature. The answer appears to be an AI hallucination.
“If you do this systematic testing,” says Kim, “you can see that these kinds of issues are not totally solved.”
It’s clear that linguists have much to offer developers of LLMs, but what might LLMs offer linguists? In a recent presentation to peers at Brown University, Kim discussed using language models in much the way medical researchers use animal models—to test numerous hypotheses, helping to identify the few that are worth testing in humans.
One linguistic puzzle Kim hopes to solve (or at least better understand) with the help of language models is the mystery of dative alternation. Dative alternation refers to the way English speakers alternate their sentence structures when using certain verbs. For example, both these structures are commonly accepted: “I gave a gift to Jane” and “I gave Jane a gift.” But that’s not true for these examples: “I donated a book to the library” and “I donated the library a book.” The second sentence just sounds wrong. English contains many examples of dative alternation, and the precise reasons why some verbs alternate and others don’t remain a mystery.
Experimental linguists have tried to solve the mystery by teaching children made-up verbs (pilk, floose) and then building experiments to see how they use those verbs under different learning conditions. But large-scale child studies are hard to run, so Kim has begun a research study using language models as simulated child subjects. She doubts language models can ever fully replace human research subjects, but she wants to use the technology to test a wide range of hypotheses, in hopes of identifying ones promising enough to test in toddlers.
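As a rough illustration of the idea (not Kim’s actual protocol), one could score a made-up verb in both dative frames with an off-the-shelf language model and compare which frame the model assigns higher probability. The model choice (gpt2) and the scoring function below are assumptions made for the sketch.

```python
# Illustrative sketch: compare a language model's preference for the
# prepositional-dative vs. double-object frame with a novel verb.
# Requires `torch` and `transformers`; "gpt2" is a convenient stand-in.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    # Total log-probability of the sentence under the model.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    # outputs.loss is the mean negative log-likelihood over the predicted
    # tokens (all but the first), so multiply to recover a total.
    n_predicted = inputs["input_ids"].shape[1] - 1
    return -outputs.loss.item() * n_predicted

# Prepositional-dative vs. double-object frames for a made-up verb.
frames = {
    "prepositional": "I pilked the book to Jane.",
    "double object": "I pilked Jane the book.",
}
for name, sentence in frames.items():
    print(f"{name:14s} {sentence!r}  logprob={sentence_logprob(sentence):.2f}")
```

Run across many novel verbs and many training or prompting conditions, a comparison like this is the kind of cheap, large-scale hypothesis screen Kim has in mind before anything is tried with toddlers.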
So far, she’s encouraged by her results, but she’s not convinced AI technology is about to revolutionize the field of linguistics.
“This is a new set of tools that you can use,” she says, “but I don’t think this is going to transform every aspect of existing language science.” That said, Kim is excited to use these new tools and is optimistic about their potential to push her research forward.