New Research


Learning Graphical Models and Its Application to Genome Information Processing

Mathematical Science Division Professor Joe Suzuki

Identifying the function that each gene encodes is an important problem in bioinformatics. In our laboratory, we use statistical methods to study causal relations between gene expression (protein production) and SNPs (single nucleotide polymorphisms).

We estimate mutual information between genes, between SNPs, and between a gene and an SNP, and express the resulting dependencies as a forest. Each SNP takes one of three values (++, --, +-), whereas expression values are continuous and are recorded to three significant digits. Mutual information is easy to estimate between discrete variables, but when at least one variable is continuous, estimation is much harder unless one assumes a parametric form such as a normal distribution.
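For the purely discrete case, the idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the estimator used in our work: it uses the simple plug-in (empirical frequency) estimate of mutual information and a Kruskal-style maximum-weight spanning forest. The function names `mutual_information` and `build_forest`, the `threshold` parameter, and the toy SNP data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), with empirical counts substituted
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def build_forest(columns, threshold=0.0):
    """Kruskal-style maximum-weight spanning forest over the variables.

    columns: dict mapping a variable name to its list of discrete values.
    Pairs whose estimated mutual information is <= threshold get no edge,
    so the result can be a forest rather than a single tree.
    """
    names = list(columns)
    edges = sorted(
        ((mutual_information(columns[a], columns[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    parent = {v: v for v in names}  # union-find structure for cycle checks

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for w, a, b in edges:
        if w <= threshold:
            break  # remaining edges are even weaker
        ra, rb = find(a), find(b)
        if ra != rb:  # adding this edge does not create a cycle
            parent[ra] = rb
            forest.append((a, b, w))
    return forest


# Toy data (hypothetical): snp2 copies snp1, snp3 is independent of both.
data = {
    "snp1": ["++", "--", "+-", "++", "--", "+-"],
    "snp2": ["++", "--", "+-", "++", "--", "+-"],
    "snp3": ["++", "++", "++", "--", "--", "--"],
}
forest = build_forest(data)
```

With this toy data only the pair snp1 and snp2 is connected, and snp3 remains an isolated vertex, which is why the structure is a forest rather than a tree.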

The breakthrough of this research is that mutual information can now be estimated without having to classify each variable as discrete or continuous. The algorithm has been published as the R package BNSL (Bayesian Network Structure Learning).
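To see why a unified treatment matters, consider the naive baseline that it replaces: cut the continuous variable into a fixed number of bins and then apply a discrete estimator. The Python sketch below shows only that baseline; it is not the algorithm implemented in BNSL, and the names `quantile_bins`, `plug_in_mi`, `binned_mi`, the bin count `k`, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def quantile_bins(values, k):
    """Map each continuous value to one of k equal-frequency bins (0..k-1)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def plug_in_mi(xs, ys):
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def binned_mi(discrete, continuous, k=3):
    """Naive baseline: discretize the continuous variable, then plug in."""
    return plug_in_mi(discrete, quantile_bins(continuous, k))


# Toy data (hypothetical): expression rises with the genotype.
genotype = ["--", "--", "+-", "+-", "++", "++"]
expression = [0.101, 0.125, 0.480, 0.512, 0.903, 0.951]

mi_three_bins = binned_mi(genotype, expression, k=3)
mi_one_bin = binned_mi(genotype, expression, k=1)
```

The estimate depends entirely on the arbitrary choice of `k`: with three bins the toy data look perfectly dependent, while with one bin the estimate is zero. A method that does not require this choice avoids that arbitrariness.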

Until now, genome data have been obtained mainly from microarrays, and our paper published in 2016 deals with microarray data. Today, however, such data are obtained by RNA sequencing (RNA-seq), which has made throughput much higher. This new setting requires a different mathematical model, and we are considering a novel learning method that does not need the normalization that existing methods require.

Our research activity can be seen from the following link: