Relational dependency network


Relational dependency networks are graphical models which extend dependency networks to account for relational data. Relational data is data organized into one or more tables, which are cross-related through common fields. A relational database is the canonical example of a system that serves to maintain relational data. A relational dependency network can be used to characterize the knowledge contained in a database.

Introduction

Relational dependency networks (RDNs) aim to estimate the joint probability distribution over the variables of a dataset represented in the relational domain. They are based on dependency networks (DNs) and extend them to the relational setting. RDNs can be learned efficiently because their parameters are learned independently, that is, each conditional probability distribution is estimated separately. Because independent learning can introduce inconsistencies, RDNs, like DNs, use Gibbs sampling to recover the joint distribution.
Unlike dependency networks, RDNs need three graphs to be fully represented: the data graph, the model graph and the inference graph.
In summary, the data graph guides how the model graph will be rolled out to generate the inference graph.
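The relationship between the three graphs can be illustrated with a minimal sketch. The object names, link types, attributes and the `roll_out` helper below are hypothetical and chosen only for illustration; they do not come from any particular RDN implementation.

```python
from collections import defaultdict

# Hypothetical data graph: typed objects and typed links between them.
objects = {"p1": "Paper", "p2": "Paper", "a1": "Author"}
links = [("a1", "p1", "AuthorOf"), ("a1", "p2", "AuthorOf")]

# Hypothetical model graph: type-level dependencies.
# (item type, attribute) depends on (link type, neighbor type, neighbor attribute).
model_graph = {
    ("Paper", "topic"): [("AuthorOf", "Author", "field")],
    ("Author", "field"): [("AuthorOf", "Paper", "topic")],
}

def roll_out(objects, links, model_graph):
    """Instantiate the type-level dependencies for every object in the data graph."""
    neighbors = defaultdict(list)
    for src, dst, link_type in links:
        neighbors[src].append((dst, link_type))
        neighbors[dst].append((src, link_type))

    inference_graph = {}
    for obj, obj_type in objects.items():
        for (item_type, attr), parents in model_graph.items():
            if item_type != obj_type:
                continue
            node = (obj, attr)
            inference_graph[node] = []
            for link_type, nbr_type, nbr_attr in parents:
                for nbr, nbr_link in neighbors[obj]:
                    if nbr_link == link_type and objects[nbr] == nbr_type:
                        inference_graph[node].append((nbr, nbr_attr))
    return inference_graph

inference_graph = roll_out(objects, links, model_graph)
for node, parents in inference_graph.items():
    print(node, "<-", parents)
```

In this sketch the model graph holds one dependency per attribute type, while the inference graph holds one node per attribute of each concrete object, so its size depends on the data graph.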

RDN Learning

The learning method for an RDN is similar to the one used by DNs, that is, the conditional probability distribution of each variable can be learned independently. However, only conditional relational learners can be used during parameter estimation for RDNs. Therefore, the learners used by DNs, such as decision trees or logistic regression, do not work for RDNs. Neville and Jensen present experimental results comparing RDNs learned with relational Bayesian classifiers and RDNs learned with relational probability trees. Natarajan et al. use a series of regression models to represent the conditional distributions.
This learning method gives RDNs an efficient learning time. However, it also makes RDNs susceptible to structural and numerical inconsistencies. If the method used to estimate the conditional probability distributions performs feature selection, a given variable may find a dependency between itself and another variable while the latter does not find this dependency. In this case, the RDN is structurally inconsistent. In addition, if the joint distribution does not sum to one because of approximations caused by the independent learning, the RDN is said to be numerically inconsistent. Fortunately, such inconsistencies can be bypassed during the inference step, as described in the RDN inference section.
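As an illustration of structural inconsistency, the following sketch checks whether the dependencies selected independently for each variable are symmetric. The variable names and the `structural_inconsistencies` helper are hypothetical; the selected parent sets stand in for the output of whatever feature-selecting learner is used.

```python
def structural_inconsistencies(selected_parents):
    """Return pairs (x, y) where x selected y as a dependency but y did not select x."""
    issues = []
    for var, parents in selected_parents.items():
        for parent in parents:
            if var not in selected_parents.get(parent, set()):
                issues.append((var, parent))
    return sorted(issues)

# Hypothetical result of learning each conditional distribution independently.
selected_parents = {
    "topic": {"field"},
    "field": {"topic", "venue"},
    "venue": set(),
}

print(structural_inconsistencies(selected_parents))  # [('field', 'venue')]
```

Here `field` depends on `venue`, but `venue` was learned without a dependency on `field`, so the pair is reported as structurally inconsistent.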

RDN Inference

RDN inference begins with the creation of the inference graph through a process called roll out. In this process, the model graph is rolled out over the data graph to form the inference graph. Next, Gibbs sampling can be used to recover a conditional probability distribution.
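The sampling step can be sketched on a very small inference graph with two binary variables. The conditional probability tables, the burn-in length and the `gibbs` helper are assumptions made for illustration; in an RDN the conditionals would come from the learned relational models.

```python
import random

# Hypothetical conditional distributions for two binary variables:
# P(x = 1 | y) and P(y = 1 | x).
p_x_given_y = {0: 0.2, 1: 0.7}
p_y_given_x = {0: 0.3, 1: 0.8}

def gibbs(num_samples, burn_in=500, seed=0):
    """Estimate the joint distribution of (x, y) by resampling each variable in turn."""
    rng = random.Random(seed)
    x, y = 0, 0
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for step in range(burn_in + num_samples):
        # Resample each variable from its conditional given the current value of the other.
        x = 1 if rng.random() < p_x_given_y[y] else 0
        y = 1 if rng.random() < p_y_given_x[x] else 0
        if step >= burn_in:
            counts[(x, y)] += 1
    return {state: count / num_samples for state, count in counts.items()}

joint = gibbs(20000)
for state, prob in sorted(joint.items()):
    print(state, round(prob, 3))
```

The estimated frequencies always sum to one, because they are normalized counts of visited states rather than a product of the separately learned conditionals.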

Applications

RDNs have been applied in many real-world domains. Their main advantage is the ability to use relationship information to improve the model's performance. Diagnosis, forecasting, automated vision, sensor fusion and manufacturing control are some examples of problems to which RDNs have been applied.

Implementations

Some suggestions of RDN implementations: