MSMDG meets every other Thursday at 2 pm, alternating between the Molecular Sciences Building and Metallurgy & Materials.
During the summer, meetings are unstructured and offer an opportunity for ad hoc discussion and coffee. Meeting information is circulated via the mailing list.
The next meeting will be a talk by invited speaker Matthew Evans, entitled “Decentralized materials research data management, curation and dissemination for accelerated discovery”. We will meet at 2 pm on Thursday the 12th in Met&Mat GA03.
ABSTRACT: The primary barrier to widespread adoption of AI-accelerated materials science is the availability and quality of data. Researchers lack frictionless tooling and have limited incentive to record their data in a way that is immediately amenable to machine learning, whether by themselves or by others. This talk introduces two data projects in the materials space that aim to lower the barrier to data access and curation by both humans and machines: the OPTIMADE federation of materials databases, and the open-source datalab materials data management platform.
OPTIMADE is an international consortium of databases that has, over many years, designed a common application programming interface (API) format, which now allows 30+ databases across 20+ providers to be queried seamlessly. Such federated data unification enables decentralized data-driven workflows in materials informatics and beyond, from materials selection up to materials discovery. OPTIMADE is supported by several community-oriented tools that allow others to easily contribute their data to this growing ecosystem. This talk will introduce the OPTIMADE ecosystem, discuss the process of consensus-forming amongst providers, and outline how OPTIMADE could be extended to other domains.
The second project primarily concerns experimental data: datalab is an open-source data management platform that can be customized and adopted by materials research groups to allow straightforward provenance tracking of samples, devices and raw data. It integrates with the broad open-source community of file format parsers (from the datatractor initiative and other popular packages) to allow data normalization and simple in-browser analysis for many characterisation techniques (XRD, NMR, Raman, electrochemistry, etc.). The platform provides the traditional benefits of a digital system of record (e.g., an electronic lab notebook), whilst also enabling programmatic re-use of data across a research group via its API, with the aim of enabling end-user programming. Because labs retain control over their own data platform, they can pursue their own AI-driven developments, as well as selectively share data and collaborate with others on shared workflows and samples. This talk will summarize the ongoing developments of datalab, including the integration of AI-based agents, and motivate future use cases for a federation of such datalab deployments.
Matthew grew up in Norfolk and studied for an MPhys in Theoretical Physics at the University of Manchester. He moved to the Cavendish Laboratory, University of Cambridge, for an MPhil and PhD in computational materials science with Prof Andrew Morris, where he worked on crystal structure prediction for battery electrode materials and associated data and software projects. In 2020, he joined the group of Prof Gian-Marco Rignanese at UCLouvain to continue work on the OPTIMADE initiative for crystal structure database API standardisation and on ML-accelerated materials discovery. Since 2020, he has also been a visitor in the Department of Chemistry at the University of Cambridge, where he leads the development of datalab, a self-hosted data management platform for academic and industrial materials chemistry labs. In 2022, he was awarded a BEWARE Research Fellowship to continue this work, jointly with the materials informatics company Matgenix.