Bayesian Interpretations of RKHS Embedding Methods

We give a simple interpretation of mean embeddings as expectations under a Gaussian process prior. Methods such as kernel two-sample tests, the Hilbert-Schmidt Independence Criterion, and kernel herding are all based on the distance between mean embeddings, a quantity known as the Maximum Mean Discrepancy (MMD). This Bayesian interpretation yields a derivation of optimal herding weights and principled methods of kernel learning, and it sheds light on the assumptions necessary for MMD-based methods to work in practice. In the other direction, the MMD interpretation gives tight, closed-form bounds on the error of Bayesian estimators.
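
To make the shared quantity concrete, the following is a minimal sketch of the biased (V-statistic) estimate of the squared MMD between two samples, computed as the squared RKHS distance between their empirical mean embeddings. The function names `rbf_kernel` and `mmd_squared`, the choice of a Gaussian (RBF) kernel, and the `lengthscale` parameter are assumptions made for this illustration and do not come from the text.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between rows of x (n, d) and y (m, d).

    The kernel choice and lengthscale are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances via the expansion |a-b|^2 = |a|^2 + |b|^2 - 2ab.
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-0.5 * sq_dists / lengthscale**2)


def mmd_squared(x, y, lengthscale=1.0):
    """Biased estimate of MMD^2 between the samples x and y.

    Equals the squared RKHS norm of the difference between the two
    empirical mean embeddings:
        mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    k_xx = rbf_kernel(x, x, lengthscale)
    k_yy = rbf_kernel(y, y, lengthscale)
    k_xy = rbf_kernel(x, y, lengthscale)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(200, 2))
    same_b = rng.normal(size=(200, 2))
    shifted = rng.normal(loc=2.0, size=(200, 2))
    print("same distribution:   ", mmd_squared(same_a, same_b))
    print("shifted distribution:", mmd_squared(same_a, shifted))
```

Under these assumptions, the estimate is close to zero for two samples drawn from the same distribution and clearly larger when one sample is shifted. The result depends on the kernel and its lengthscale, which is one reason the choice of kernel matters for MMD-based methods.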