Massachusetts Institute of Technology
When he started his undergraduate education at Dartmouth College, Alex Kell considered studying economics or government. Instead, he veered into Chinese before going into psychology.
Economics and psychology attracted him because both study human behavior, but they were too abstract for Kell, now a Massachusetts Institute of Technology doctoral candidate. Neuroscience, in which he ultimately earned his bachelor’s degree, gets down to the brain’s fundamental workings.
“It seemed compelling and cool that you could mix a bunch of these fields together, like biology and chemistry and computer science and math, to study this interesting object that makes you you and makes me me,” says Kell, a Department of Energy Computational Science Graduate Fellowship (DOE CSGF) recipient.
With advisor Josh McDermott, Kell devises models of how we hear in the real world, seeking ones that can do what we can with sound. His work also explores machine learning and neural networks, key aspects of artificial intelligence.
Researchers often build computational models based on our knowledge about the auditory system. Kell says his approach is to “leapfrog that to try to get something that does what we do,” then see if it matches human behavior and predicts brain activity.
“The best models we have are from that class of algorithms that don’t incorporate many things we know about neuroscience but instead just do things that we do,” Kell says. Yet “there are tons of problems with the models that I’m building that will need to be improved upon.”
Kell’s models typically employ neural networks, interconnected layers of thousands of relatively simple units. Information is passed through several layers that transform the input, culminating in a label – for example, a written word identifying a spoken one. If the system works, the label is correct.
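The layered pipeline described above can be sketched in miniature. This is a toy illustration, not one of Kell’s actual models: a few dozen units instead of thousands, random untrained weights, and made-up label names (“word,” “music,” “noise”) chosen just to mirror the example of labeling a sound.

```python
import math
import random

random.seed(0)

def relu(x):
    # simple unit nonlinearity: negative values become zero
    return [max(0.0, v) for v in x]

def dense(x, W, b):
    # one fully connected layer: output_j = sum_i x_i * W[i][j] + b[j]
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def softmax(z):
    # turn raw layer outputs into probabilities over the candidate labels
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# toy network: 4 input features -> 5 hidden units -> 3 labels
W1 = [[random.gauss(0, 0.5) for _ in range(5)] for _ in range(4)]
b1 = [0.0] * 5
W2 = [[random.gauss(0, 0.5) for _ in range(3)] for _ in range(5)]
b2 = [0.0] * 3

LABELS = ("word", "music", "noise")

def predict(x):
    h = relu(dense(x, W1, b1))     # hidden layer transforms the input
    p = softmax(dense(h, W2, b2))  # final layer culminates in a label
    return LABELS[p.index(max(p))]

print(predict([0.2, -0.1, 0.7, 0.3]))
```

Each layer transforms the output of the one before it; the final layer’s strongest response picks the label, just as the article describes.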
Training neural networks consumes computer power and time. Kell funnels millions of pieces of labeled data to the model so it learns patterns it can use to classify unknown input. “It’s going to be wrong initially, so you update all the parameters to be a little more right and do that a couple hundred thousand or a couple million times.” The process takes days but would be impossibly longer if not for advances in computing power.
Kell’s models have correctly labeled previously unheard sounds in complex environments at rates comparable to human subjects, but he also wants them to predict brain activity. He and McDermott use functional MRI, scanning subjects’ brains and tracking which regions respond as humans hear sounds.
The researchers “try to predict those brain responses from the internals of our model,” Kell adds. “That’s a crucial part of validating or exploring the extent to which this is a good model of the auditory system.”
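A common way to test such predictions is to regress measured brain responses on the model’s internal unit activations and ask how much variance is explained. The sketch below uses simulated activations and a simulated voxel response (the real work uses fMRI data and cross-validated regression); it fits a two-predictor least-squares model by hand and reports R².

```python
import random

random.seed(2)

# simulated "model internals": activations of two units for 50 sounds
acts = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
# simulated voxel response: driven by the units, plus measurement noise
voxel = [0.8 * a + 0.3 * b + random.gauss(0, 0.2) for a, b in acts]

# ordinary least squares, two predictors, no intercept (kept short):
# solve the 2x2 normal equations (A^T A) w = A^T y directly
saa = sum(a * a for a, _ in acts)
sbb = sum(b * b for _, b in acts)
sab = sum(a * b for a, b in acts)
say = sum(a * y for (a, _), y in zip(acts, voxel))
sby = sum(b * y for (_, b), y in zip(acts, voxel))
det = saa * sbb - sab * sab
w1 = (sbb * say - sab * sby) / det
w2 = (saa * sby - sab * say) / det

# R^2: fraction of the voxel's variance the model internals explain
pred = [w1 * a + w2 * b for a, b in acts]
mean_y = sum(voxel) / len(voxel)
ss_tot = sum((y - mean_y) ** 2 for y in voxel)
ss_res = sum((y - p) ** 2 for y, p in zip(voxel, pred))
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

A high R² for a brain region means that region’s responses are well captured by that part of the model — the kind of evidence the researchers use to judge whether the network is a good model of the auditory system.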
In a paper recently accepted for publication, Kell, McDermott and their colleagues trained a computational neural network to map speech and music excerpts, embedded in background noise, to word labels and musical genres. In later testing it performed as well as humans in recognizing words and genres in noisy situations. They also used functional MRI to measure how the human auditory cortex – part of the brain heavily involved in hearing – responded to 165 natural sounds. When the team presented the same sounds to the neural network, its layers responded in ways that corresponded to areas of the human auditory cortex.
For his 2017 Lawrence Berkeley National Laboratory practicum, Kell explored methods to shrink these neural networks.
“A lot of these deep-learning systems are big, honking, overparameterized models” with possibly unnecessary specifications, Kell says. With his practicum advisor, Kristofer Bouchard, he researched ways to compress them into smaller networks by focusing on only the essential components.
Kell developed a code base to train neural networks, used a simple data set to teach thousands of them, and applied mathematical methods to compress the networks. The idea, he says, is to identify consistencies across networks by training dozens of them with some randomization. Things the network robustly learns across randomization may be essential components. Preliminary results showed that when compressing early layers, the approach enabled networks to maintain and possibly improve performance with far fewer parameters, but it failed to maintain similar performance when compressing later layers. Kell went on to use the code he developed in aspects of his thesis research.
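The consistency idea can be illustrated with a tiny stand-in for the practicum’s actual compression methods: train several small models that differ only in their random seeds, then treat a parameter as essential when it is learned consistently (here, with the same sign) across every run. The three-feature setup, training details, and sign-consistency criterion below are all assumptions made for the sketch.

```python
import math
import random

def train(seed, steps=1000, lr=0.5):
    """Train one tiny logistic model; return its learned weights."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]  # two informative features plus one pure-noise feature
    for _ in range(steps):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        y = 1.0 if x[0] + x[1] > 0 else 0.0  # the label ignores feature 3
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        err = p - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# train a dozen models that differ only in random initialization/data order
runs = [train(seed) for seed in range(12)]

# a weight is "essential" if it is learned with a consistent sign in every run;
# weights on uninformative features fluctuate and tend to fail this test
essential = [all(r[i] > 0 for r in runs) or all(r[i] < 0 for r in runs)
             for i in range(3)]
print(essential)
```

Keeping only the robustly learned components — and discarding the rest — is one way a big, overparameterized model could be squeezed into a smaller one, in the spirit Kell describes.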
Graduation isn’t until 2019, but Kell already has a plan: postdoctoral research focused on recording individual neuron activity in animals to gather fine-grained data he can model and analyze computationally.
Image caption: Maps of whether a portion of auditory cortex is better predicted by an intermediate or later layer of an artificial neural network. Cyan indicates that a higher layer explains more variance; magenta, a lower layer. Maps on the left show the mean across subjects for the right hemisphere (RH) and the left hemisphere (LH). The figures on the right show three examples of individual subjects. Below is a histogram/color bar of all values for each hemisphere. Black outlines show the location of three anatomically defined subdivisions of the primary auditory cortex. Credit: Alex Kell.