The Deterministic Information Bottleneck
Compression fundamentally involves a decision about what is relevant and what is not. The information bottleneck (IB) method by Tishby, Pereira and Bialek formalized this notion as an information-theoretic optimization problem and proposed an optimal tradeoff between discarding as many bits as possible and selectively keeping those that are most important. Here we introduce an alternative formulation, the deterministic information bottleneck (DIB), which we argue better captures this notion of compression. As suggested by its name, the solution to the DIB problem is a deterministic encoder, as opposed to the stochastic encoder that is optimal under the IB. After deriving the method, we compare the IB and DIB on synthetic data, showing that the IB and the DIB perform similarly in terms of the IB cost function, but that the DIB significantly outperforms the IB in terms of the DIB cost function. Moreover, the DIB offered speedup of one to two orders of magnitude over the IB in our experiments. Our derivation of the DIB also suggests a method for continuously interpolating between the soft clustering of the IB and the hard clustering of the DIB. We also discuss and speculate on the relevance of the information bottleneck to sensory processing in the brain.