Estimating the contribution of non-genetic factors to gene expression using Gaussian process latent variable models

Thanks to the recent increase in the amount of genetic profiling data available, and to the ability to characterize disease activity through gene expression, it is now possible to understand in more detail the multitude of causal factors linked with each disease. This is a challenging task, because integrating different sources of biological data is not straightforward and because non-genetic factors (such as differences in the experimental setting, or individual characteristics such as gender and ethnicity) are not always controlled. Since these non-genetic factors may cause most of the variation in gene expression, reducing the accuracy of genetic studies, there is a pressing need for models that take them explicitly into account.

We present a model in which non-genetic factors are unobserved latent variables, and the gene expression levels are described as linear functions of both these latent variables and Single Nucleotide Polymorphisms (SNPs). From a generative point of view, we can write the gene expression levels Y as

Y = SV + XW + mu 1^T + epsilon

where S is the matrix containing the SNPs, X is the matrix of latent variables, V and W are mapping matrices, epsilon is an isotropic Gaussian noise term and mu allows the model to have non-zero mean.

The model is inspired by the one proposed by Stegle et al. [1], but instead of optimizing the parameters and marginalising the latent variables (as in Probabilistic PCA), we marginalise the parameters and optimize the latent variables. For a particular choice of prior over the mapping matrices W and V, the two approaches are equivalent. This kind of model is called dual Probabilistic PCA (dual PPCA), and it belongs to a wider class of models called Gaussian process latent variable models (GP-LVMs). Indeed, dual PPCA is the special case in which the mapping to the outputs is assumed to be linear and the output dimensions are assumed to be independent and identically distributed. Each of these assumptions can be relaxed to obtain new probabilistic models.
Many extensions of this model are possible, but even in its simplest form the results of the eQTL study are very promising in terms of the number of significant associations found.
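
As an illustration of the generative view and of the dual PPCA idea described in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It makes several assumptions that the abstract does not state: samples are stored in rows (so the mean is added per gene across all samples), the mapping matrices have independent zero-mean Gaussian priors (unit variance for W, variance `snp_var` for V), and the mean mu is replaced by the empirical per-gene mean rather than treated probabilistically. All function names, parameter names and sizes are hypothetical.

```python
import numpy as np


def simulate_expression(n_samples=50, n_genes=200, n_snps=20, n_latent=3,
                        noise_std=0.1, seed=0):
    """Draw synthetic data from Y = SV + XW + mean + noise (samples in rows)."""
    rng = np.random.default_rng(seed)
    # Hypothetical SNP encoding: minor-allele counts 0, 1 or 2.
    S = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
    X = rng.standard_normal((n_samples, n_latent))   # hidden non-genetic factors
    V = 0.1 * rng.standard_normal((n_snps, n_genes))  # SNP effects
    W = rng.standard_normal((n_latent, n_genes))      # latent-factor effects
    mu = rng.standard_normal(n_genes)                 # per-gene mean
    eps = noise_std * rng.standard_normal((n_samples, n_genes))
    Y = S @ V + X @ W + mu + eps                      # mu broadcasts over samples
    return Y, S, X


def dual_ppca_log_marginal(Y, S, X, snp_var=0.01, noise_var=0.01):
    """Log marginal likelihood of Y with V and W integrated out.

    Under the assumed Gaussian priors, every gene (column of Y) shares the
    sample-by-sample covariance K = X X^T + snp_var * S S^T + noise_var * I.
    """
    n, d = Y.shape
    Yc = Y - Y.mean(axis=0)  # simplification: plug in the empirical mean for mu
    K = X @ X.T + snp_var * (S @ S.T) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    whitened = np.linalg.solve(L, Yc)                 # L^{-1} Yc
    quad = np.sum(whitened ** 2)                      # trace(Yc^T K^{-1} Yc)
    return -0.5 * (n * d * np.log(2.0 * np.pi) + d * log_det + quad)


Y, S, X = simulate_expression()
print(dual_ppca_log_marginal(Y, S, X))
```

In a dual PPCA fit, X would be unknown, and this quantity would be maximised with respect to X (and the variance parameters) using a gradient-based optimizer. Replacing the linear term X X^T with a non-linear covariance function is one way of relaxing the linearity assumption mentioned above.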

Nicolò Fusi
