An Analysis of the Microsoft Academic Graph

Herrmannova, Drahomira and Knoth, Petr (2016). An Analysis of the Microsoft Academic Graph. D-Lib Magazine, 22(9/10)



In this paper we analyse a new dataset of scholarly publications, the Microsoft Academic Graph (MAG). The MAG is a heterogeneous graph comprised of over 120 million publication entities and related authors, institutions, venues and fields of study. It is also the largest publicly available dataset of citation data. As such, it is an important resource for scholarly communications research. As the dataset is assembled using automatic methods, it is important to understand its strengths and limitations, especially whether there is any noise or bias in the data, before applying it to a particular task. This article studies the characteristics of the dataset and provides a correlation analysis with other publicly available research publication datasets to answer these questions. Our results show that the citation data and publication metadata correlate well with external datasets. The MAG also has very good coverage across different domains with a slight bias towards technical disciplines. On the other hand, there are certain limitations to completeness. Only 30 million papers out of 127 million have some citation data. While these papers have a good mean total citation count that is consistent with expectations, there is some level of noise when extreme values are considered. Other current limitations of MAG are the availability of affiliation information, as only 22 million papers have these data, and the normalisation of institution names.

Viewing alternatives


Public Attention

Altmetrics from Altmetric

Number of Citations

Citations from Dimensions

Item Actions