Amassing a Small Army Against a Growing Enemy
CAS prof plays traffic cop on the information superhighway

There is a good chance you own a zombie. Your computer could have been infected by a website you visited or a link you clicked in an email from a trusted friend. Or maybe you didn’t do anything at all to compromise your computer, and still an attacker slipped past your firewalls and turned it into a virulent drone.
One thing, however, is nearly certain. If your computer has been breached, you will likely never know it.
Computer viruses, explains Mark Crovella, a College of Arts & Sciences professor of computer science, have changed drastically in the last five years. Where once attacks were obvious to victims (computers might slow down or run unwanted programs), users now fall prey to intruders they cannot see. Small programs, called bots, slip through weak spots in PC protection; from there, they can take up secret residence on a computer and download complex instructions from a remote controller elsewhere on the internet. Poised to do the bidding of their masters, these zombie computers become part of “botnets,” networks of thousands, or even millions, of computers that can cause all sorts of trouble: scanning the internet for other vulnerable computers, sending out reams of spam email to ensnare other users, recording keystrokes or online transactions to steal passwords and credit card numbers, or launching distributed denial-of-service (DDoS) attacks that overload and shut down websites.
These are the types of threats that Crovella, director of lab operations in the computer science department, and his team of computer scientists and statisticians spend their days hunting down. Their goal is not to treat your ailing computer, or even to strengthen your existing security. Instead, they aim to identify unwanted internet traffic, allowing network providers to stop it from ever reaching your PC.
“For the most part, we tend to leave security to the virus protection programs we buy and install on our PCs,” Crovella says. These programs typically try to identify malicious software by looking for a signature—a sequence of code or something in the content—and then blocking programs with those signatures. For instance, he says, “there are programs that can tell a computer to block email with the word ‘Viagra’ in the title.” They can be effective, he says, but only for a little while. “An adversary just has to change one letter in its signature—to ‘Vi@gra,’ for example—and they’re in.”
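The weakness Crovella describes is easy to see in a toy filter. The Python sketch below is only an illustration, not how any real antivirus or spam product works; the blocklist and the `is_blocked` function are invented for this example.

```python
# Hypothetical blocklist; real products use far richer signatures.
BLOCKED_SIGNATURES = ["viagra"]

def is_blocked(subject: str) -> bool:
    """Return True if the subject line contains any blocked signature."""
    lowered = subject.lower()
    return any(signature in lowered for signature in BLOCKED_SIGNATURES)

original = is_blocked("Cheap Viagra here")   # matches the signature
disguised = is_blocked("Cheap Vi@gra here")  # one changed character slips past
print(original, disguised)
```

The filter catches the first subject line and misses the second, even though a human reader would treat them as the same message.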
Rather than attempt to define the properties of unwanted traffic, Crovella’s strategy is to paint a picture of what “normal” internet usage looks like. Using software he and his team designed, they capture and analyze anonymous traffic information at five-minute intervals as the data flows through thousands of routers around the world.
Unusual patterns—statistical anomalies in the amount or type of data being transferred—tip off Crovella and his team to potentially malicious activity. How these programs sneak into individual computers can change daily, even hourly. But their patterns of behavior, the ways they interact on the internet, are nearly always outside the norm.
DDoS attacks, for instance, generate abnormally large amounts of traffic. Content, too, can reveal criminals at work. “If you see a large variety of internet protocol, or IP, addresses—numbers that identify individual computers—coming from one source in a short period of time, that kind of activity is statistically anomalous,” Crovella explains. And “anything outside of statistically normal traffic patterns is potentially malicious.”
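One way to picture the IP-address example is to count how many distinct addresses each source contacts in a time window and compare that with what is typical. The sketch below makes several assumptions that are not from Crovella's work: the flow records are made up (using address ranges reserved for documentation), the reading of the quote as "one source touching many addresses" is an interpretation, and the rule of flagging anything above ten times the median is an arbitrary stand-in for a real statistical test.

```python
from collections import defaultdict
from statistics import median

def fan_out(flows):
    """Count the distinct destination addresses contacted by each source."""
    destinations = defaultdict(set)
    for source, destination in flows:
        destinations[source].add(destination)
    return {source: len(seen) for source, seen in destinations.items()}

def flag_anomalies(flows, factor=10):
    """Flag sources whose fan-out exceeds `factor` times the median fan-out."""
    counts = fan_out(flows)
    typical = median(counts.values())
    return sorted(src for src, n in counts.items() if n > factor * typical)

# Made-up five-minute window: five ordinary hosts and one scanner.
flows = []
for host in range(1, 6):
    for dest in range(1, 4):
        flows.append((f"192.0.2.{host}", f"198.51.100.{dest}"))
for dest in range(1, 201):
    flows.append(("192.0.2.99", f"203.0.113.{dest}"))

flagged = flag_anomalies(flows)
print(flagged)
```

Here the five ordinary hosts each reach three addresses, while the scanner reaches 200, so only the scanner is flagged. Nothing about the content of its traffic had to be known in advance.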
Other researchers and companies have tried similar techniques, but with only one router at a time. Using the unique multivariate statistical approach developed at BU, says Crovella, “suddenly, activity outside the norm stands out in a way it never could if you were looking for it at each source.”
His technique—based on a method called principal component analysis and licensed to Guavus, Inc., a venture-backed binational company led by one of Crovella’s former PhD students, Anukool Lakhina (CAS’01, GRS’01,’07)—is now being used by GÉANT, Europe’s main multigigabit computer network, for research and academic purposes.
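The general idea behind using principal component analysis across many links can be sketched in a few lines of Python. This is a simplified illustration, not the BU or Guavus implementation: the traffic numbers are invented, only a single principal component is treated as "normal," and the component is approximated with plain power iteration rather than a numerical library. Traffic on three links rises and falls together; in one time bin, a single link carries extra traffic that still sits within that link's usual range, so it would not stand out if the link were examined alone.

```python
import math

# Made-up volumes: 12 time bins (rows) on 3 links (columns) sharing one pattern.
shared_load = [10, 12, 15, 20, 26, 30, 31, 28, 22, 16, 12, 10]
traffic = [[x * 1.0, x * 2.0, x * 1.5] for x in shared_load]
traffic[2][0] += 12.0  # injected anomaly: link 0 reads 27, below its peak of 31

def center(rows):
    """Subtract each link's mean so that rows describe deviations from average."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_component(rows, iterations=200):
    """Approximate the first principal axis by power iteration on X^T X."""
    dim = len(rows[0])
    axis = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, axis)) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        axis = [x / norm for x in w]
    return axis

def residual_energy(rows, axis):
    """Squared size of what remains of each row after removing the normal part."""
    energies = []
    for row in rows:
        score = sum(a * b for a, b in zip(row, axis))
        energies.append(sum((a - score * b) ** 2 for a, b in zip(row, axis)))
    return energies

centered = center(traffic)
axis = top_component(centered)
residuals = residual_energy(centered, axis)
suspect = residuals.index(max(residuals))
print(suspect)
```

With this toy data the largest residual falls in time bin 2, where the anomaly was injected. The shared daily rhythm is absorbed by the principal component, and what is left over is the traffic that does not move with the rest of the network.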
Crovella continues to refine the technique at BU. Executing this type of analysis requires collecting an immense amount of data, and while computers amass and evaluate it, he and his team must validate their results themselves before submitting their research for publication. This manual examination of multiple terabytes (a terabyte is one trillion bytes) requires not only a great deal of time and patience, but also expertise in both computer science and statistics. Crovella typically works with two to three students at a time and one to three other faculty investigators. It is a small army against a growing enemy: every day, according to computer protection provider Symantec MessageLabs, approximately 151 billion unsolicited messages are distributed by compromised computers.
“A year or so ago, I discovered my own PC was infected with a botnet,” Crovella says, smiling and shaking his head. “The IT folks discovered it. I never knew it was there.”
This story originally appeared in the 2010 issue of Research magazine.