Analyse Large Language Models and evaluate their usage inside CORE-QA. Which model performs better? Which model is the best compromise in terms of cost, accuracy, and performance?

Obiogbolu, Kaobimdi Ian (2024). Analyse Large Language Models and evaluate their usage inside CORE-QA. Which model performs better? Which model is the best compromise in terms of cost, accuracy, and performance? Knowledge Media Institute, The Open University, Milton Keynes, UK.

URL: https://kmi.open.ac.uk/scholarship/

Abstract

Individual researchers spend hours in libraries, reading books and searching through articles to find the right information. This process is time-consuming and can still yield confusing answers. Technologies such as CORE, a repository containing millions of research articles across fields such as science, business, and the arts, serve as a one-stop solution for finding verified answers to research questions. However, searching such a vast knowledge source for the right answer can be challenging. By combining CORE with a suitable Large Language Model (LLM), we can provide detailed information and credible references that lead to the best answers.
This report explores how CORE and different LLMs handle research questions across various disciplines and compares the LLMs' answers with how individuals conduct research. The objective is to evaluate the responses of the existing models in terms of comprehensiveness, trustworthiness, and usefulness, and to analyse whether the LLMs understand the questions being asked. Based on the comparison results, we identify the best- and worst-performing LLMs for question answering over CORE.

Plain Language Summary

This report examines how CORE, when combined with Large Language Models (LLMs), can provide detailed and credible answers to research questions. It compares how well different LLMs and individual researchers handle research questions across various disciplines. The goal is to evaluate the models on how comprehensive, trustworthy, and useful their responses are, and to see whether they understand the questions being asked. The report identifies the best- and worst-performing LLMs for answering questions using CORE.
