# New Research

2010/09/01

# Data science with multivariate statistical analysis: Part 2

## "Analysis of incomplete data"

### Yutaka Kano (Professor of Statistics, Division of Mathematical Science)

This is our second appearance on this web page, the first having been in February 2007. This time, we introduce our research on the analysis of data with missing values, which can cause serious problems in statistical analysis.

Empirical studies are typically conducted to confirm theoretical developments through experiments and/or observations. If some data values are not obtained in an experiment or observation, the data are said to be missing and the data set is said to be incomplete. Proper results can hardly be obtained from incomplete data sets. What problems do incomplete data cause, specifically? How should one handle them?

Suppose that one conducts an experiment to see the effects of certain factors on a criterion or variable of interest. Some effects cannot be identified if no data are obtained at some combinations of factor levels.

For example, let us study the relation between university entrance examination scores and grade point averages at the university. This study is important because the entrance examination would make no sense if no significant relation between the two could be found. However, data often show only a very weak relation. Why is this? The reason is that grade points do not exist for applicants who were not admitted. Thus, the data sets are typically incomplete! It is known that a correlational analysis restricted to admitted students (i.e. complete case analysis) yields an unduly small correlation (see the figure). Human ingenuity has overcome this difficulty by developing a method for analyzing such incomplete data sets, in which the grade points of unadmitted students are estimated.
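The shrinkage of the correlation under complete case analysis can be reproduced with a small simulation. The sketch below is only an illustration under stated assumptions: scores are drawn from a standardized bivariate normal distribution, the true correlation `rho`, the admitted fraction `admit_fraction`, and the sample size are arbitrary values chosen for the example, and admission is decided purely by a cutoff on the examination score. The last step applies a classical correction for direct range restriction, which uses the examination scores of all applicants; it is shown as one well-known remedy and is not necessarily the method referred to above.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=20000, rho=0.6, admit_fraction=0.3, seed=0):
    """Return (r_full, r_admitted, r_corrected) for simulated applicants.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)

    # Exam scores for all applicants, and the grade points they *would* earn.
    exam = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_sd = math.sqrt(1.0 - rho ** 2)
    gpa = [rho * x + noise_sd * rng.gauss(0.0, 1.0) for x in exam]

    # Only applicants at or above the cutoff are admitted; grade points of
    # the others are never observed (the missing values).
    cutoff = sorted(exam)[int(n * (1.0 - admit_fraction))]
    admitted = [(x, y) for x, y in zip(exam, gpa) if x >= cutoff]
    adm_exam = [x for x, _ in admitted]
    adm_gpa = [y for _, y in admitted]

    r_full = pearson(exam, gpa)              # unobservable in practice
    r_admitted = pearson(adm_exam, adm_gpa)  # complete case analysis

    # Correction for direct range restriction: u is the ratio of the exam
    # standard deviation among all applicants to that among admitted students.
    u = statistics.pstdev(exam) / statistics.pstdev(adm_exam)
    r_corrected = (r_admitted * u) / math.sqrt(
        1.0 - r_admitted ** 2 + (r_admitted * u) ** 2
    )
    return r_full, r_admitted, r_corrected


if __name__ == "__main__":
    r_full, r_admitted, r_corrected = simulate()
    print(f"all applicants:      {r_full:.3f}")
    print(f"admitted only:       {r_admitted:.3f}")
    print(f"corrected (assumed): {r_corrected:.3f}")
```

In this setting the correlation among admitted students comes out clearly smaller than the correlation among all applicants, and the corrected value moves back toward it. The correction relies on the relation being linear with constant residual variance, which holds here by construction but would need to be checked on real data.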

Developing new, effective methods for analyzing a wide variety of incomplete data sets is among the most important research topics in our laboratory. Education and research are carried out primarily through small-group instruction, which is quite effective for understanding mathematics properly and acquiring mathematical skills. We therefore currently run many small-group seminars within our laboratory, with each student taking part in several of them.