A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes

Berrar, Daniel; Lopes, Philippe and Dubitzky, Werner (2024). A data- and knowledge-driven framework for developing machine learning models to predict soccer match outcomes. Machine Learning, 113 pp. 8165–8204.

DOI: https://doi.org/10.1007/s10994-024-06625-9

Abstract

The 2023 Soccer Prediction Challenge invited the machine learning community to develop innovative methods to predict the outcomes of 736 future soccer matches. The Challenge included two tasks. Task 1 was to forecast the exact match score , i.e., the number of goals scored by each team. Task 2 was to predict the match outcome as probability vector over the three possible result categories: victory of the home team, draw, and victory of the away team. Here, we present a new data- and knowledge-driven framework for building machine learning models from readily available data to predict soccer match outcomes. A key component of this framework is an innovative approach to modeling interdependent time series data of competing entities. Using this framework, we developed various predictive models based on k -nearest neighbors, artificial neural networks, naive Bayes, and ordinal forests, which we applied to the two tasks of the 2023 Soccer Prediction Challenge. Among all submissions to the Challenge, our machine learning models based on k -nearest neighbors and neural networks achieved top performances. Our main insights from the Challenge are that relatively simple learning algorithms perform remarkably well compared to more complex algorithms, and that the key to successful predictions lies in how well soccer domain knowledge can be incorporated in the modeling process.

Viewing alternatives

Download history

Metrics

Public Attention

Altmetrics from Altmetric

Number of Citations

Citations from Dimensions

Item Actions

Export

About