Towards an Automatic Approach for Uncovering Ethnic Bias in Online Learning Texts

Albuquerque, Josmario (2023). Towards an Automatic Approach for Uncovering Ethnic Bias in Online Learning Texts. PhD thesis, The Open University.



Recent findings have indicated persistent ethnic biases in online learning platforms. For instance, researchers have suggested that individuals from ethnic minority groups are more likely to be marginalised and have their academic performance diminished. Although compelling evidence indicates the impact of ethnic biases on students, uncovering such biases in online learning platforms is challenging. First, the large volume of online educational data makes it impractical to uncover such biases by hand. Second, bias can be subjective, which means that what is considered biased in one context might not be considered biased in another. In addition, individuals from different cultures and groups might perceive bias differently.

Accordingly, this PhD thesis aims to answer the following overarching research question: How might ethnic bias in text-based online learning materials be automatically identified while considering the subjective nature of bias? A design-based research methodology is adopted to answer that question, in which several cycles of design, implementation, and reflection are performed. This iterative process is organised into three studies.

In Study 1, the literature on existing approaches to uncovering bias in texts is reviewed. Promising computer-based approaches to identifying bias in texts are selected based on the literature review and implemented in the context of an online learning platform. Drawing on the limitations of those approaches, Study 2 examines how ethnic biases manifest in learning texts by asking 193 higher education students to label ethnic bias in Open Educational Resources (OER), and how contextual elements like the OER’s title and discipline help them identify such biases. In Study 3, well-known learning analytics models are applied to the labelled dataset from Study 2, and their performance in identifying perceived ethnic bias in textual OERs is evaluated.

Key findings from Study 1 reveal that bias in online learning texts has received limited attention from the research community. In particular, the selected studies delineate bias based on its theoretical aspects rather than on how students perceive it. Study 2 suggests that students from ethnic minority populations perceive ethnic bias differently from students identified as White, as a range of sentences from selected online learning texts is labelled as biased by one group but not by the other. Study 3’s key findings indicate statistically significant correlations between perceived ethnic bias and socio-linguistic features like socialisation and aggressiveness. Those features are used to train different classifiers to identify ethnic bias. The results suggest that SVMs and Random Forest models are reliable in identifying ethnic bias in online learning texts. Combining logistic regression, SVM, Naive Bayes, K-Nearest Neighbors, and XGBoost could also provide balanced performance. Naive Bayes may not be as effective as SVMs and Random Forests in identifying bias, but it had the best precision when dealing with unknown data.

Overall, this PhD thesis has made a substantial contribution to understanding and identifying potential ethnic bias in online learning texts, which is crucial for reducing inequities and promoting inclusiveness in online learning platforms. Furthermore, this thesis has contributed to knowledge in Learning Analytics by providing evidence of how ethnic bias manifests in online learning texts and how certain computational models might support the automatic identification of such biases in large datasets.
