
ProBic: identification of overlapping biclusters using Probabilistic Relational Models, applied to simulated gene expression data

Sep 7, 2007, 3742 views

Biclustering is an increasingly popular technique for identifying regulatory modules that are linked to biological processes. In the context of gene expression data, a bicluster is a subset of genes that have a similar expression profile across a subset of conditions. We describe a novel method, called ProBic, that simultaneously identifies a series of overlapping biclusters in gene expression data within the framework of Probabilistic Relational Models (PRMs) [1;2]. PRMs are a relational extension of Bayesian networks and allow relational data to be integrated within a unified probabilistic framework. A PRM describes a joint probability distribution, as a Bayesian network does, but with additional constraints on the conditional probability functions. We propose a novel PRM-based biclustering model in which gene expression data are treated as relational data. The classes are Gene, Condition and Expression. Both Gene and Condition have a vector attribute Bicluster containing a series of bicluster IDs. These vectors indicate which biclusters a gene or condition belongs to and are initially unknown. Condition has an extra attribute ID, which is a unique number for each condition. Expression has an attribute Level containing the expression value, and two reference slots that point to the gene and the condition for which the level was measured. Expression.Level is conditionally dependent on Gene.Bicluster, Condition.Bicluster and Condition.ID. This conditional dependency is modeled as a set of Gaussian distributions with conjugate priors. The ProBic model deals naturally with missing values (in fact, there are no 'missing' values in this model), and robust sets of biclusters are obtained because noise is modeled explicitly. The maximum likelihood solution is approximated using an Expectation-Maximization strategy. ProBic was applied to simulated gene expression data sets, and all the biclusters were successfully identified.
Various noise settings and different overlap models (average, sum, product) were explored. Our results show that PRMs can be used to identify overlapping biclusters efficiently and robustly, while dealing naturally with missing values and noise.
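
The relational schema described above can be sketched in Python to make the dependencies concrete. This is only an illustrative sketch, not the authors' implementation: the class and attribute names follow the abstract, but `bicluster_means`, `background_means`, `expected_level` and `gaussian_log_likelihood` are hypothetical names, and the assumption that the overlap model combines per-bicluster mean levels (with a per-condition background mean when a gene and condition share no bicluster) is ours, since the abstract does not specify this. Priors and the Expectation-Maximization procedure are omitted.

```python
from dataclasses import dataclass
from math import log, pi, prod
from statistics import mean


@dataclass(frozen=True)
class Gene:
    name: str
    biclusters: frozenset  # bicluster IDs the gene is assigned to


@dataclass(frozen=True)
class Condition:
    id: int                # unique number for each condition
    biclusters: frozenset  # bicluster IDs the condition is assigned to


@dataclass(frozen=True)
class Expression:
    gene: Gene             # reference slot to the measured gene
    condition: Condition   # reference slot to the measured condition
    level: float           # measured expression value


# Overlap models named in the abstract; how they are applied is an assumption.
OVERLAP_MODELS = {"average": mean, "sum": sum, "product": prod}


def expected_level(expr, bicluster_means, background_means, overlap="average"):
    """Mean of Expression.Level given Gene.Bicluster, Condition.Bicluster
    and Condition.ID (hypothetical parameterization)."""
    shared = sorted(expr.gene.biclusters & expr.condition.biclusters)
    if not shared:
        # Gene and condition share no bicluster: use the background mean.
        return background_means[expr.condition.id]
    values = [bicluster_means[(b, expr.condition.id)] for b in shared]
    return OVERLAP_MODELS[overlap](values)


def gaussian_log_likelihood(level, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * log(2 * pi * sigma ** 2) - (level - mu) ** 2 / (2 * sigma ** 2)


# Toy example: one gene and one condition that both belong to biclusters 1 and 2.
gene = Gene("g1", frozenset({1, 2}))
cond = Condition(0, frozenset({1, 2}))
expr = Expression(gene, cond, level=2.5)
bicluster_means = {(1, 0): 2.0, (2, 0): 3.0}  # (bicluster ID, condition ID) -> mean
background_means = {0: 0.0}                   # condition ID -> background mean

mu = expected_level(expr, bicluster_means, background_means, overlap="average")
score = gaussian_log_likelihood(expr.level, mu, sigma=1.0)
```

In this sketch only measured gene-condition pairs exist as `Expression` objects, so an unmeasured pair contributes no term to the likelihood. That is one way to read the abstract's remark that the model has no 'missing' values.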


Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.