Variational Model Selection for Sparse Gaussian Process Regression
Model selection for sparse Gaussian process (GP) models is an important problem that involves selecting both the inducing (active) variables and the kernel parameters. We describe an auxiliary variational method for sparse GP regression that jointly learns the inducing variables and the kernel parameters by minimizing the Kullback-Leibler divergence between an approximate distribution and the true posterior over the latent function values. The variational distribution is parametrized by an unconstrained distribution over the inducing variables and a conditional GP prior. This framework yields a lower bound on the true log marginal likelihood that can be reliably maximized with respect to both the inducing inputs and the kernel parameters. We show how several leading sparse GP methods can be reformulated within this framework, including the subset of data (SD), deterministic training conditional (DTC), fully independent training conditional (FITC) and partially independent training conditional (PITC) methods.
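The abstract does not state the lower bound explicitly. The sketch below assumes the collapsed form that this kind of bound is known to take once the distribution over the inducing variables is set to its optimum: F = log N(y | 0, σ²I + Q_nn) − tr(K_nn − Q_nn) / (2σ²), with Q_nn = K_nm K_mm⁻¹ K_mn. Its first term is the DTC log marginal likelihood, and the trace term penalizes inducing inputs that explain the training covariance poorly. The following choices are illustrative assumptions and do not come from the source: a zero-mean GP, a squared-exponential kernel, Gaussian noise, a small jitter, and the helper names `rbf_kernel`, `gaussian_logpdf` and `collapsed_bound`.

```python
import numpy as np


def rbf_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of A (n x d) and B (m x d)."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative distances caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def gaussian_logpdf(y, cov):
    """log N(y | 0, cov), computed through a Cholesky factor of cov."""
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, y)  # alpha = L^{-1} y, so alpha @ alpha = y^T cov^{-1} y
    n = y.shape[0]
    return -0.5 * alpha @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)


def collapsed_bound(X, y, Z, noise_var, variance=1.0, lengthscale=1.0, jitter=1e-8):
    """Hypothetical helper: return (F, log_dtc), the assumed collapsed bound and the DTC term."""
    n, m = X.shape[0], Z.shape[0]
    Kmm = rbf_kernel(Z, Z, variance, lengthscale) + jitter * np.eye(m)  # jitter for stability
    Knm = rbf_kernel(X, Z, variance, lengthscale)
    Lm = np.linalg.cholesky(Kmm)
    V = np.linalg.solve(Lm, Knm.T)  # m x n, chosen so that Qnn = V^T V = Knm Kmm^{-1} Kmn
    Qnn = V.T @ V
    log_dtc = gaussian_logpdf(y, Qnn + noise_var * np.eye(n))
    # tr(Knn - Qnn); for this kernel every diagonal entry of Knn equals `variance`.
    trace_term = n * variance - np.sum(V**2)
    return log_dtc - 0.5 * trace_term / noise_var, log_dtc


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0.0, 5.0, size=(50, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(50)
    Z = np.linspace(0.0, 5.0, 6)[:, None]  # hypothetical inducing inputs
    noise_var = 0.01
    F, log_dtc = collapsed_bound(X, y, Z, noise_var)
    exact = gaussian_logpdf(y, rbf_kernel(X, X) + noise_var * np.eye(50))
    print(f"bound={F:.3f}  DTC={log_dtc:.3f}  exact={exact:.3f}")
```

Under these assumptions F never exceeds the exact log marginal likelihood. Maximizing F over Z and the kernel hyperparameters, for example with a gradient-based optimizer, is one way to carry out the joint selection described above. Dropping the trace term recovers the DTC objective, which has no such guarantee.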