New Research


2014/8/1

Algorithms for measuring randomness and the theory of mathematical statistics.

Statistical Analysis Group: Prof. Hidetoshi Shimodaira, Assoc. Prof. Fuyuhiko Tanaka, Assist. Prof.

Are rabbits more closely related to humans or to mice? Judging by appearance, most people would probably say mice. Modern biology, however, classifies species using the genetic information stored in DNA rather than their appearance. Evolution, which branches out from a common ancestor, is thought of as an accumulation of random events and is described by probabilistic models. An estimated phylogenetic tree of these three species was published as a new discovery in a distinguished journal, and according to that study, rabbits turned out to be closer to humans than to mice.
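
The article does not say which probabilistic model of evolution is used. As an illustration only, the sketch below assumes the simplest standard model of DNA substitution, the Jukes-Cantor model: at each site, substitutions arrive at random times, and each one replaces the current base with one of the other three, chosen uniformly. Two lineages evolve from a shared, randomly generated ancestor, and their distance is then estimated from the sequences alone. All function names and numbers are hypothetical.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve a DNA sequence along a branch of length t.

    Illustrative assumption (Jukes-Cantor model): at every site, substitutions
    arrive as a Poisson process with rate 1, so t is the expected number of
    substitutions per site; each substitution swaps the current base for one of
    the other three, chosen uniformly at random."""
    out = []
    for base in seq:
        elapsed = rng.expovariate(1.0)  # waiting time until the first substitution
        while elapsed < t:
            base = rng.choice([b for b in BASES if b != base])
            elapsed += rng.expovariate(1.0)
        out.append(base)
    return "".join(out)


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jc_distance(p):
    """Jukes-Cantor estimate of substitutions per site from the p-distance."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


rng = random.Random(1)
ancestor = "".join(rng.choice(BASES) for _ in range(20000))
# Two lineages branch from the common ancestor and each evolves for 0.15 units,
# so the true distance between them is 0.3 substitutions per site.
seq1 = evolve(ancestor, 0.15, rng)
seq2 = evolve(ancestor, 0.15, rng)
p = p_distance(seq1, seq2)
print(f"p-distance: {p:.3f}, Jukes-Cantor distance: {jc_distance(p):.3f} (true: 0.300)")
```

Because several random substitutions can hit the same site, the raw proportion of differing sites underestimates how much change has accumulated; the Jukes-Cantor formula d = -(3/4) ln(1 - 4p/3) corrects for this.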

However, we can never draw inferences from data with perfect confidence. The randomness inherent in data makes it very difficult to predict the future or to infer a general principle from data alone. In fact, the estimated phylogenetic tree in which rabbits are closer to humans is now considered to be wrong. It is therefore very important to measure the randomness in data and to compute confidence values for estimated results. In statistics, this issue has been discussed intensively since the late 19th century, leading to the ideas of Bayesian posterior probability and hypothesis testing. Computer simulation methods for computing confidence values in practice became popular in the 1980s. Yet the confidence values computed by these conventional methods are slightly biased, which makes them prone to false discoveries.
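
The article does not name the conventional simulation method. A widely used one in phylogenetics is the nonparametric bootstrap, which resamples alignment sites with replacement and reports how often a hypothesis is recovered. The sketch below assumes the three candidate trees for the rabbit question (called A, B and C here) have already been scored by per-site log-likelihood, and asks how often tree A comes out best. The per-site differences are made-up random numbers, not real data, and every function name is hypothetical.

```python
import random


def make_site_diffs(n_sites, mean_diff, rng):
    """Hypothetical per-site log-likelihood differences for one tree comparison.

    Synthetic numbers for illustration only: Gaussian noise, re-centred so that
    the total difference over all sites is exactly n_sites * mean_diff."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
    centre = sum(noise) / n_sites
    return [x - centre + mean_diff for x in noise]


def bootstrap_probability(d_ab, d_ac, n_boot, rng):
    """Fraction of bootstrap replicates in which tree A beats both B and C.

    Each replicate resamples alignment sites with replacement (the same site
    indices for both comparisons) and recomputes the total log-likelihood
    differences A - B and A - C."""
    n = len(d_ab)
    hits = 0
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        if sum(d_ab[i] for i in idx) > 0 and sum(d_ac[i] for i in idx) > 0:
            hits += 1
    return hits / n_boot


rng = random.Random(2014)
d_ab = make_site_diffs(200, 0.035, rng)  # tree A minus tree B, per site
d_ac = make_site_diffs(200, 0.035, rng)  # tree A minus tree C, per site
bp = bootstrap_probability(d_ab, d_ac, n_boot=2000, rng=rng)
print(f"bootstrap probability that tree A is best: {bp:.3f}")
```

On the original (made-up) data tree A wins both comparisons, yet it wins only in a fraction of the resampled data sets; that fraction is the simulation-based confidence value.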

One of the research projects of the Statistical Analysis Group, headed by Professor Shimodaira, is to develop highly accurate confidence values. Mathematical statistics lets us view the similarity between probability models as a geometry. From this viewpoint, the bias of conventional confidence values is caused by curvature, which represents the distortion of this space. We have proved mathematically that the bias is removed if the data size used in the simulation is formally set to a negative value. A negative data size may seem mysterious, but when our new method is used to recalculate the confidence values of the phylogenetic trees, the results suggest that rabbits are closer to mice. This new method for computing confidence values has since been used in many life-science studies.
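
The article gives no formulas. The sketch below assumes the multiscale bootstrap formulation used in the AU (approximately unbiased) test: bootstrap probabilities BP are computed for several resampled data sizes n', with s2 = n / n', the normalized z-value psi(s2) = sqrt(s2) * Phi^-1(1 - BP(s2)) is fitted by a straight line beta0 + beta1 * s2, and the line is evaluated at s2 = -1, which formally corresponds to the negative data size n' = -n. The made-up data, the chosen scales and all function names are hypothetical; this is an illustration, not the group's software.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is Phi^-1


def make_site_diffs(n_sites, mean_diff, rng):
    """Hypothetical per-site log-likelihood differences (made-up Gaussian noise,
    re-centred so that the total over all sites is n_sites * mean_diff)."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
    centre = sum(noise) / n_sites
    return [x - centre + mean_diff for x in noise]


def scaled_bp(d_ab, d_ac, n_resample, n_boot, rng):
    """Bootstrap probability that tree A beats both B and C when every replicate
    draws n_resample sites with replacement (n_resample may differ from n)."""
    n = len(d_ab)
    hits = 0
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n_resample)
        if sum(d_ab[i] for i in idx) > 0 and sum(d_ac[i] for i in idx) > 0:
            hits += 1
    return hits / n_boot


def multiscale_bootstrap(d_ab, d_ac, scales, n_boot, rng):
    """Fit psi(s2) = beta0 + beta1 * s2, where s2 = n / n_resample and
    psi(s2) = sqrt(s2) * Phi^-1(1 - BP(s2)); evaluate it at s2 = +1 and s2 = -1."""
    n = len(d_ab)
    xs, ys = [], []
    for s2 in scales:
        n_resample = max(1, round(n / s2))
        bp = scaled_bp(d_ab, d_ac, n_resample, n_boot, rng)
        if 0.0 < bp < 1.0:  # the z-value is undefined when BP is exactly 0 or 1
            actual_s2 = n / n_resample  # rounding n_resample slightly changes s2
            xs.append(actual_s2)
            ys.append(math.sqrt(actual_s2) * PHI.inv_cdf(1.0 - bp))
    if len(xs) < 2:
        raise ValueError("need at least two scales with 0 < BP < 1")
    # Ordinary least-squares straight line through the (s2, psi) points.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    beta0 = y_bar - beta1 * x_bar
    bp_fit = 1.0 - PHI.cdf(beta0 + beta1)  # s2 = +1: ordinary bootstrap, n' = n
    au = 1.0 - PHI.cdf(beta0 - beta1)  # s2 = -1: formal negative data size, n' = -n
    return bp_fit, au


rng = random.Random(2014)
d_ab = make_site_diffs(200, 0.035, rng)  # tree A minus tree B, per site
d_ac = make_site_diffs(200, 0.035, rng)  # tree A minus tree C, per site
bp_fit, au = multiscale_bootstrap(
    d_ab, d_ac, scales=[0.5, 0.7, 1.0, 1.4, 2.0], n_boot=2000, rng=rng
)
print(f"BP at n' = n: {bp_fit:.3f}, extrapolated value at n' = -n: {au:.3f}")
```

Setting s2 = +1 recovers the ordinary bootstrap probability, so the gap between the two printed values is the correction attributed to curvature. The straight-line fit in s2 is the simplest possible choice and is itself an assumption of this sketch.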


Website of the research group of Professor Shimodaira