The Powers of Machine Learning in Amazon Alexa and Music Generation

Associate Professor (ECE, CS, CDS, SE) Brian Kulis

Every day, millions of people look something up online. It has become a habit, a part of our lives we take for granted, and we can thank machine learning for that. Machine learning is a branch of artificial intelligence in which algorithms use data and statistics to make predictions, attempting to imitate the way humans learn.

CISE Faculty Affiliate Brian Kulis works in the area of machine learning. Kulis is an Associate Professor (ECE, CS, CDS, SE) at Boston University and an Amazon Alexa Scholar. While his early work centered on unsupervised machine learning, his recent research spans core machine learning problems as well as applied work in areas such as music generation and Amazon Alexa.

In applying machine learning, Kulis typically starts by designing an algorithm and then adapting it to different applications. However, he says it is also possible to start with the application and build the algorithm from there. That is what some of his students have done in his build lab. Years ago, undergraduate students Rachel Manzelli and Vijay Thakkar took on the project of generating music. Since then, Master's students Yousif Khaireddin and Krishna Palle and Ph.D. student Sadie Allen have also worked on the project.

Kulis said the group has been attacking the problem from both an applied perspective (techniques for building models) and a theoretical perspective (the algorithms behind the models).

Kulis' work on music generation combines a model that generates raw audio with a model that generates notes. Generating raw audio is harder, since the model must produce the waveform itself rather than a sequence of notes, but it has the advantage of being able to produce a wider range of sounds. By combining the two models, he and his team construct a symbolic version of a song and then add the raw-audio component, producing music that sounds more natural.
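For readers curious how such a two-stage pipeline might fit together, here is a minimal sketch, not the team's actual system: a hypothetical symbolic model hands a note sequence to a stand-in audio renderer. The sine-wave renderer is only a placeholder for the raw-audio model described above.

```python
# Hypothetical two-stage pipeline: symbolic notes first, then audio
# conditioned on those notes. Both models here are stand-in interfaces.
import numpy as np

SAMPLE_RATE = 16000  # assumed audio sample rate

def generate_notes(length=16, rng=None):
    """Stand-in symbolic model: sample a sequence of MIDI-style pitches."""
    rng = rng or np.random.default_rng(0)
    return rng.integers(low=60, high=72, size=length)  # one octave above middle C

def render_audio(notes, note_duration=0.25):
    """Stand-in raw-audio stage: plain sine waves here; a real system would
    use a trained neural model conditioned on the note sequence."""
    chunks = []
    t = np.arange(int(SAMPLE_RATE * note_duration)) / SAMPLE_RATE
    for midi_pitch in notes:
        freq = 440.0 * 2 ** ((midi_pitch - 69) / 12)  # MIDI pitch -> Hz
        chunks.append(0.5 * np.sin(2 * np.pi * freq * t))
    return np.concatenate(chunks)

notes = generate_notes()
audio = render_audio(notes)  # waveform built on top of the symbolic track
print(notes, audio.shape)
```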

Kulis said the models are trained on existing music, which is what enables them to generate new music.

“You give it training data, and in this case the training data for the raw audio models are recordings of music,” Kulis said. “Then what the model is trying to do is build a model that says, ‘Okay, if I’ve listened to a second of audio, say, from a track, I’m going to try to predict how the next little piece of audio looks.’”
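The prediction loop Kulis describes can be sketched in a few lines. This is only an illustration of the autoregressive idea, with a placeholder `predict_next_chunk` standing in for a trained network; the context length and chunk size are assumptions.

```python
# Autoregressive generation sketch: given the most recent second of audio,
# predict the next small chunk, append it, and slide the window forward.
import numpy as np

SAMPLE_RATE = 16000
CONTEXT = SAMPLE_RATE          # one second of context
CHUNK = 400                    # 25 ms predicted per step (assumed)

def predict_next_chunk(context):
    """Stand-in for a trained raw-audio model."""
    return np.zeros(CHUNK)     # a real model would return learned samples

def generate(seed_audio, seconds=2.0):
    audio = list(seed_audio)
    target_length = len(seed_audio) + seconds * SAMPLE_RATE
    while len(audio) < target_length:
        context = np.asarray(audio[-CONTEXT:])      # most recent second heard
        audio.extend(predict_next_chunk(context))   # guess what comes next
    return np.asarray(audio)

generated = generate(np.zeros(CONTEXT))
print(generated.shape)
```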

Generating music automatically could be useful for apps like Spotify, which might not want to spend money hiring people to create background music. The technology is also meant to help artists make music.

“We actually aren’t just interested in taking artists out of the loop,” Kulis said. “We want to help artists in the creative process as well. So there’s some examples of musicians using AI tools to help them generate melodies.”

Kulis is also active in industry research. As an Amazon Scholar, he is working to improve how accurately Alexa responds to its name. The device is triggered by its wakeword, “Alexa.”

To reduce false wakes, which occur when the device mistakenly thinks its name has been said, the device runs a classifier on its hardware that constantly tries to determine whether Alexa should respond. Alexa responds only once the Cloud confirms that the person said the word “Alexa.” Essentially, the audio is converted into a spectrogram, a visual representation of the sound. If the spectrogram matches the pattern produced when “Alexa” is spoken, the device responds.
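A rough sketch of that spectrogram-then-classify step is below. The matching function here is a simple similarity threshold used purely for illustration; an actual wakeword detector is a trained neural network running on the device.

```python
# Convert audio to a spectrogram, then score it against a reference pattern.
import numpy as np
from scipy.signal import stft

SAMPLE_RATE = 16000

def spectrogram(audio):
    """Magnitude spectrogram of a short audio clip."""
    _, _, z = stft(audio, fs=SAMPLE_RATE, nperseg=512)
    return np.abs(z)

def looks_like_wakeword(spec, reference_spec, threshold=0.9):
    """Placeholder score: cosine similarity to a stored 'Alexa' spectrogram."""
    a, b = spec.ravel(), reference_spec.ravel()
    n = min(a.size, b.size)
    a, b = a[:n], b[:n]
    score = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return score >= threshold

clip = np.random.randn(SAMPLE_RATE)       # one second of microphone audio
reference = np.random.randn(SAMPLE_RATE)  # stand-in "Alexa" reference clip
print(looks_like_wakeword(spectrogram(clip), spectrogram(reference)))
```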

Once the device recognizes that it has been woken, it records what is being said and sends that audio to the Cloud, which might be thousands of miles away. This all happens seamlessly, in milliseconds. Kulis and his team have worked on reducing false wakes by adding a secondary network in the Cloud that verifies the wakeword. Kulis shares findings from this work in a paper titled “Building a Robust Word-Level Verification Network.”
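The overall decision flow, as described above, amounts to a two-stage gate. The sketch below uses placeholder scoring functions and thresholds to show only the control flow, not Amazon's actual system.

```python
# Two-stage wakeword decision: a fast on-device check gates the audio,
# and a larger Cloud-side verifier makes the final call.
def on_device_score(audio) -> float:
    """Lightweight detector running on the device (placeholder)."""
    return 0.8

def cloud_verifier_score(audio) -> float:
    """Secondary verification network in the Cloud (placeholder)."""
    return 0.95

def should_wake(audio, device_threshold=0.5, cloud_threshold=0.9) -> bool:
    # Stage 1: cheap on-device check; if it fails, nothing leaves the device.
    if on_device_score(audio) < device_threshold:
        return False
    # Stage 2: the clip is sent to the Cloud verifier for the final decision.
    return cloud_verifier_score(audio) >= cloud_threshold

print(should_wake(audio=None))
```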

As an Associate Professor at Boston University, Kulis has created two courses: “Intro to Machine Learning,” offered to undergraduates, and “Deep Learning,” taught at the graduate level. He said many students have gone into industry after taking these courses, which is one way he hopes his work in machine learning will influence others.

Kulis wrote in an email that machine learning has benefited society, improving patient outcomes by automating data analysis and moving research forward more quickly. He also said it has shaped our daily lives.

“Every time we use Google or do any kind of online search, we are using machine learning,” Kulis said. “Machine learning for audio has changed how we interact with computers, and self-driving cars, someday, will routinely save thousands or millions of lives.” 

By Zoe Tseng, CISE Staff Writer