Sonification of Auditory Models via Synthesis of Statistically Matched Stimuli

Jenelle Feather, Massachusetts Institute of Technology

A central goal of auditory neuroscience is to develop models of neural responses and perceptual judgments. Models are often evaluated by measuring their output in response to a set of sounds and correlating this predicted response with a measured neural response, or by using the model output to predict perceptual judgments. An alternative approach is to synthesize sounds that produce particular values in a model's representation, typically those evoked by a given natural sound. The logic behind model-based sound synthesis is that sounds producing the same response in a model should evoke the same neural response (or percept) if the model replicates the representations underlying the neural response in question (or perception). We implemented a general-purpose optimization method to synthesize sounds from auditory models. The method extends previous sound texture synthesis methods that generate sounds matched on particular statistics (McDermott and Simoncelli, 2011). Unlike previous methods, we use automated gradient-based optimization tools from deep learning, such that the same synthesis procedure can be applied to any differentiable model. The optimization procedure is implemented in TensorFlow, with a front end that extracts Hilbert envelopes from a cochlear filter bank applied to a sound waveform. Optimization occurs directly on the waveform, eliminating artifacts that could otherwise be created by imposing statistics on a spectrogram-like representation that must then be inverted. We used the method to explore synthesis from moments of the waveform, cochlear filter outputs, and modulation filter outputs, as well as from correlations between pairs of cochlear or modulation filters. We also compared standard spectrotemporal modulation filters with learned filters from a task-optimized convolutional neural network. The approach facilitates stimulus generation for experiments featuring higher-order statistics and allows sonification of features in auditory models.
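The idea of optimizing the waveform itself until its statistics match those of a target sound can be illustrated with a minimal sketch. This is not the authors' TensorFlow implementation. It assumes only two statistics (the mean and variance of the raw waveform), uses hand-derived gradients in place of automatic differentiation, and omits the cochlear filter bank and Hilbert envelope front end. The names `moments`, `synthesize`, `steps`, `rate`, and `seed` are hypothetical.

```python
import math
import random


def moments(x):
    """Return the mean and (population) variance of a waveform."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return mean, var


def synthesize(target, steps=200, rate=0.1, seed=0):
    """Gradient descent on the waveform samples so that the mean and
    variance match those of `target` (illustrative assumption: these two
    moments stand in for a full auditory model's statistics)."""
    n = len(target)
    t_mean, t_var = moments(target)

    # Start from noise, so the result is a new waveform rather than a copy.
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]

    for _ in range(steps):
        mean, var = moments(x)
        d_mean = mean - t_mean
        d_var = var - t_var
        # Loss = d_mean**2 + d_var**2. Its gradient w.r.t. sample x[i] is
        # (2 * d_mean + 4 * d_var * (x[i] - mean)) / n.
        grad = [(2.0 * d_mean + 4.0 * d_var * (v - mean)) / n for v in x]
        # Scale the step by n so the step size is independent of duration.
        x = [v - rate * n * g for v, g in zip(x, grad)]
    return x


# Hypothetical target: an offset sinusoid, 5 cycles over 256 samples.
target = [0.2 + math.sin(2.0 * math.pi * 5 * i / 256) for i in range(256)]
synth = synthesize(target)
```

The synthesized waveform shares the target's two moments but is otherwise unconstrained, which is the sense in which statistically matched stimuli probe what a given set of statistics captures. In a framework with automatic differentiation, the hand-written gradient would be replaced by the framework's gradient of an arbitrary differentiable model.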

Abstract Author(s): Jenelle Feather, Josh McDermott