Artificial neural networks that “listen” to infant vocalizations

Anne Warlaumont, University of Memphis

Research on early infant vocal development often relies on trained humans to classify infant sounds into categories of interest. This method has two advantages: human perceptual abilities are very sophisticated, and humans are likely to attend to the acoustic characteristics most centrally involved in the changes actually taking place within the infant. After all, infants develop speech in the context of communicative interaction with moms, dads, and other humans. However, heavy reliance on human judgments has some major disadvantages. First, it is expensive, time-consuming, and difficult, especially when large data sets are involved. Second, we have only a limited understanding of the acoustic dimensions that human listeners attend to, and of how their perceptual representations are structured.

We have been developing artificial neural network simulations to address these issues. The networks are given relatively unprocessed samples of infant vocal productions, in the form of spectrograms. We first apply a self-organizing map, a type of neural network that uses an unsupervised, biologically motivated algorithm to form a representational space consisting of topographically organized receptive fields. We then apply a perceptron, a type of supervised neural network, which takes the self-organizing map’s responses as input and performs a category judgment task typically performed by humans.
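The two-stage arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy "spectrograms" are synthetic, and the grid size, learning rates, Gaussian neighborhood, activation function, and classic perceptron update rule are all assumptions chosen for brevity. All function names are hypothetical.

```python
import numpy as np

def make_toy_spectrograms(n_per_class=20, n_freq=8, n_time=8, seed=0):
    """Synthetic stand-ins for spectrograms (assumption): two classes that
    differ in which half of the frequency axis carries most energy."""
    rng = np.random.default_rng(seed)
    specs, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            spec = 0.1 * rng.random((n_freq, n_time))
            band = slice(0, n_freq // 2) if label == 0 else slice(n_freq // 2, n_freq)
            spec[band, :] += 1.0
            specs.append(spec.ravel())  # flatten to one input vector
            labels.append(label)
    return np.array(specs), np.array(labels)

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Unsupervised stage: each node's weight vector acts as a receptive field."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid position of every node, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    n_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.1      # shrinking neighborhood
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
            grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)  # pull neighbors toward x
            step += 1
    return weights

def som_activations(weights, data):
    """Node responses: larger when an input lies closer to a node's weights."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / data.shape[1])

def train_perceptron(features, labels, n_classes, epochs=50, lr=0.1):
    """Supervised stage: one output unit per class, updated only on errors."""
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias input
    W = np.zeros((X.shape[1], n_classes))
    for _ in range(epochs):
        for x, y in zip(X, labels):
            pred = np.argmax(x @ W)
            if pred != y:
                W[:, y] += lr * x
                W[:, pred] -= lr * x
    return W

def predict(W, features):
    X = np.hstack([features, np.ones((len(features), 1))])
    return np.argmax(X @ W, axis=1)

X, y = make_toy_spectrograms()
som_weights = train_som(X)                 # shape: (16 nodes, 64 inputs)
acts = som_activations(som_weights, X)     # shape: (40 samples, 16 nodes)
W = train_perceptron(acts, y, n_classes=2)
preds = predict(W, acts)
print("training accuracy:", np.mean(preds == y))
```

Reshaping a row of `som_weights` back to 8 by 8 gives a picture of that node's receptive field, and the columns of `W` show how strongly each node counts toward each category.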

To evaluate the networks, we first assess their categorization performance quantitatively. Second, we inspect the weights of connections between nodes, visualizing the receptive fields learned by the self-organizing map and observing how they are weighted in categorization. Third, we compare the topographic mapping of representations learned by the self-organizing map to mappings achieved using principal components analysis and to the perceptual spaces exhibited by adult human listeners. Our results suggest that neural networks may be useful for studying infant vocalizations and may serve as reasonable models of human perception.
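For the comparison with principal components analysis, a low-dimensional projection can be computed along the following lines. This is only a sketch under stated assumptions: the input is random placeholder data rather than real spectrograms, PCA is computed through a singular value decomposition, and the function name `pca_project` is hypothetical.

```python
import numpy as np

def pca_project(data, n_components=2):
    """Project samples onto the leading principal components via SVD."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)   # fraction of variance per component
    return centered @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
data = rng.random((40, 64))   # placeholder for 40 flattened spectrograms (assumption)
proj, explained = pca_project(data)
print(proj.shape, explained)
```

Each row of `proj` gives a sample's coordinates in the two-dimensional PCA space, which can then be set beside that sample's best-matching node position on a self-organizing map grid.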

Abstract Author(s): Anne S. Warlaumont, D. Kimbrough Oller, Eugene H. Buder, Rick Dale, Robert Kozma