Mitra, Robin; McGough, Sarah; Chakraborti, Tapabrata; Holmes, Chris; Copping, Ryan; Hagenbuch, Niels; Biedermann, Stefanie; Noonan, Jack; Lehmann, Brieuc; Shenvi, Aditi; Doan, Xuan Vinh; Leslie, David; Bianconi, Ginestra; Sanchez-Garcia, Ruben; Davies, Alisha; Mackintosh, Maxine; Andrinopoulou, Eleni-Rosalina; Basiri, Anahid; Harbron, Chris and MacArthur, Ben
(2023).
DOI: https://doi.org/10.1038/s42256-022-00596-z
Abstract
Missing data are an unavoidable complication in many machine learning tasks. When data are ‘missing at random’ there exist a range of tools and techniques to deal with the issue. However, as machine learning studies become more ambitious, and seek to learn from ever-larger volumes of heterogeneous data, an increasingly encountered problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such ‘structured missingness’ raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.