Materials Informatics

Using state-of-the-art data science technologies, we aim to discover novel functional materials. Target materials include drugs, dyes, solvents, polymers, polymeric composites and nanostructured materials. Drawing on a broad range of machine learning techniques, such as Bayesian modeling, kernel methods, natural language processing, sparse learning and optimization theory, we develop the fundamental methodology and research infrastructure of Materials Informatics.

Materials Research by Information Integration Initiative (MI2I)

Materials Informatics is an emerging cross-disciplinary field that combines materials science and data science. The Materials Genome Initiative (MGI), announced by President Obama in 2011, was designed to create infrastructure that doubles the pace at which new materials are discovered and deployed in innovative products. The MGI white paper identified Materials Informatics as the key to dramatically reducing the time and cost of research and development in materials science. In Japan, the Japan Science and Technology Agency (JST) launched the Materials Research by Information Integration Initiative (MI2I) at the National Institute for Materials Science (NIMS) in July 2015. As Japan's central institute for data science, the Institute of Statistical Mathematics has been designated a subcontracted research site of MI2I.

Role of Data Science in Material Discovery and Development

The design space for materials development is extremely high-dimensional. For instance, the chemical space of organic compounds is estimated to contain about 10^60 potential candidates. The challenge is to find, within this vast landscape, previously unidentified materials that exhibit desirable properties. In the traditional procedure, computational chemistry methods such as first-principles calculations have been the central analytic tool: scientists hypothesize material structures based on experience and intuition, and the properties of the designed materials are then assessed computationally and experimentally. The data-driven approach has attracted much attention as a promising alternative that could save enormous amounts of time and cost in this labor-intensive, time-consuming trial-and-error procedure.
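To make the scale of the roughly 10^60 candidates concrete, the back-of-envelope calculation below estimates how long exhaustive screening would take. The throughput of one million evaluations per second is a hypothetical, deliberately generous assumption; it is not a figure from this page.

```python
# Back-of-envelope cost of exhaustively screening organic chemical space.
# Assumption (hypothetical): a screening pipeline that evaluates one million
# candidate molecules per second, running without interruption.
CHEMICAL_SPACE_SIZE = 10**60          # rough size of organic chemical space
EVALUATIONS_PER_SECOND = 10**6        # hypothetical throughput
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n_candidates: int, rate_per_second: float) -> float:
    """Return the wall-clock years needed to evaluate every candidate once."""
    seconds = n_candidates / rate_per_second
    return seconds / SECONDS_PER_YEAR


years = years_to_enumerate(CHEMICAL_SPACE_SIZE, EVALUATIONS_PER_SECOND)
print(f"{years:.1e} years")  # prints 3.2e+46 years
```

Even at this rate the answer is on the order of 10^46 years, which is why the search has to be guided by models rather than carried out by brute-force enumeration.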

Bayesian approach to data-driven materials discovery

The aim of our study is to create a novel materials design method that integrates machine learning and quantum chemistry calculations. The method begins by building a set of machine learning models (forward models) that predict the properties of input material structures for multiple design objectives. These models are then inverted into a backward model through Bayes' law, which yields a posterior probability distribution conditioned on the desired properties. By exploring high-probability regions of this posterior, we expect to identify new materials that possess the desired target properties. Through industry-academia partnerships, we are putting this Bayesian materials design method into practice.
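The forward-then-inverse idea can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in: a "material" is a binary vector of 8 fragment indicators, two pre-trained linear models with Gaussian predictive noise play the role of the forward models, the desired region U is a pair of property intervals, and an independent Bernoulli prior replaces a structural prior such as a chemical language model. Bayes' law gives p(S | Y in U) ∝ P(Y in U | S) p(S), and the posterior is explored here with a plain Metropolis-Hastings sampler using single-bit flips. This is a simple illustration, not the implementation in iqspr.

```python
import math
import random

# Hypothetical setup: a "material" is a binary vector of D fragment indicators.
D = 8

# Hypothetical pre-trained linear forward models, one per design objective:
# predicted property = bias + sum(w_j * x_j), with Gaussian predictive noise sigma.
FORWARD_MODELS = {
    "property_A": {"bias": 1.0, "weights": [0.8, -0.5, 0.3, 0.0, 1.2, -0.7, 0.4, 0.1], "sigma": 0.3},
    "property_B": {"bias": 0.5, "weights": [-0.2, 0.9, 0.0, 0.6, -0.4, 0.3, -0.8, 0.5], "sigma": 0.3},
}
# Desired property region U for each objective (hypothetical targets).
TARGETS = {"property_A": (2.5, 3.5), "property_B": (0.0, 1.0)}
PRIOR_ON_PROB = 0.4  # hypothetical Bernoulli prior: P(fragment present)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict(model, x):
    """Mean prediction of a linear forward model for structure x."""
    return model["bias"] + sum(w * xj for w, xj in zip(model["weights"], x))


def log_likelihood(x):
    """log P(Y in U | x), treating objectives as independent given x."""
    total = 0.0
    for name, model in FORWARD_MODELS.items():
        lo, hi = TARGETS[name]
        mu, s = predict(model, x), model["sigma"]
        p = normal_cdf((hi - mu) / s) - normal_cdf((lo - mu) / s)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def log_prior(x):
    return sum(math.log(PRIOR_ON_PROB if xj else 1.0 - PRIOR_ON_PROB) for xj in x)


def log_posterior(x):
    # Bayes' law up to a normalizing constant: p(x | Y in U) ∝ P(Y in U | x) p(x)
    return log_likelihood(x) + log_prior(x)


def metropolis_hastings(n_steps, seed=0):
    """Sample structures from the posterior with single-bit-flip proposals."""
    rng = random.Random(seed)
    x = [0] * D
    current = log_posterior(x)
    samples = []
    for _ in range(n_steps):
        proposal = list(x)
        j = rng.randrange(D)
        proposal[j] = 1 - proposal[j]  # symmetric proposal: flip one fragment
        cand = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand - current)):
            x, current = proposal, cand
        samples.append(tuple(x))
    return samples


samples = metropolis_hastings(5000)
best = max(set(samples), key=log_posterior)  # highest-posterior structure visited
print(best, [round(predict(m, best), 2) for m in FORWARD_MODELS.values()])
```

Because the objectives are treated as conditionally independent given the structure, the likelihood factorizes into a product of interval probabilities; correlated objectives would instead need a joint predictive distribution.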


Ikebata, H., Hongo, K., Isomura, T., Maezono, R., Yoshida, R. (2017) Bayesian molecular design with a chemical language model, Journal of Computer-Aided Molecular Design, 31(4):379-391.

R package iqspr version 2.4

XenonPy, Python Library on Representation & Learning for Materials Data