Dependence Measures in Citation Analysis

Bachasingh, A.D.S.

The application of parametric copulas to capture the dependence structure between the publications of a researcher and the citations of those publications.

Bachelor thesis (2018)

Authors

A.D.S. Bachasingh (Electrical Engineering, Mathematics and Computer Science)

Contributors

G.F. Nane (mentor)

Dion Gijswijt (graduation committee member)

E.M. van Elderen (graduation committee member)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:8e42c860-a072-4709-b7d7-69ff02e79e00

Published Date

20-12-2018

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In this thesis we use copulas to capture the dependence structure between the publications of a scholar and the citations of those publications. To do so, we use a sample of Quebec researchers for whom both the number of publications and the number of citations are known. We are provided with multiple variables concerning citations. We study the dependence structure between these variables, with the aim of fitting copulas to it, by calculating correlation scores and visualising the structure. Copulas are functions that "join together" one-dimensional distribution functions with a dependence structure in order to represent joint distributions. The correlation scores are calculated across various ranges of the variables to give a deeper understanding of the dependence structure between them.
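As a compact illustration of how a copula "joins together" one-dimensional distribution functions, the bivariate form of Sklar's theorem can be written as below. The symbols H (joint distribution function), F and G (marginal distribution functions) and C (copula) are notation chosen here for illustration and are not necessarily the notation used in the thesis.

```latex
H(x, y) = C\bigl(F(x),\, G(y)\bigr), \qquad (x, y) \in \mathbb{R}^2
```

The copula C is unique when F and G are continuous. For discrete margins, such as counts, C is only uniquely determined on the product of the ranges of F and G.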
Using Sklar’s theorem and functions from various packages in the statistical software R, we fit parametric copulas to the dependence structures of the various pairs of variables. Based on a goodness-of-fit test, certain parametric copula models are rejected at the 5% significance level. Unsurprisingly, other dependence structures are captured well by a parametric copula.
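To make the idea of fitting a parametric copula concrete, here is a minimal sketch in Python rather than R. It is not the procedure used in the thesis: it assumes a Clayton copula, estimates its parameter by inverting Kendall's tau (tau = theta / (theta + 2)), and runs on synthetic data. All function names and the chosen parameter value are hypothetical.

```python
import random


def kendall_tau(xs, ys):
    """Kendall's tau by direct pair counting, O(n^2); tied pairs contribute zero."""
    n = len(xs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion (theta > 0)."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = ((u ** (-theta)) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def fit_clayton_by_tau(xs, ys):
    """Method-of-moments estimate: theta = 2 * tau / (1 - tau)."""
    tau = kendall_tau(xs, ys)
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch only handles positive dependence")
    return 2.0 * tau / (1.0 - tau)


# Synthetic stand-in for a pair of citation variables (assumption: theta = 2).
rng = random.Random(0)
us, vs = zip(*sample_clayton(400, 2.0, rng))
theta_hat = fit_clayton_by_tau(us, vs)
print("estimated theta:", round(theta_hat, 2))
```

Because Kendall's tau depends only on ranks, the same estimator can be applied to raw observations without first transforming the margins. A formal goodness-of-fit test, as used in the thesis, is a separate step that this sketch does not cover.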
Parametric copula families are used not only for fitting the data but also for prediction. Since a well-fitting model is not necessarily a good predictive model, we have also performed a validation analysis. The parametric copula models that are not rejected by the test at the 5% significance level are validated via k-fold cross-validation: in each fold, part of the data is used to fit the model and the remainder is used to validate it. It turns out that the best-fitting copula model does not always perform best in terms of prediction during cross-validation.
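The k-fold validation idea can be sketched as follows. This is an illustration under stated assumptions, not the thesis's validation procedure: it uses synthetic Clayton data, fits the parameter on the training folds by inverting Kendall's tau, and scores the held-out fold by its mean copula log-density. The scoring rule, the function names and the parameter values are all hypothetical choices. A score above zero means the fitted copula predicts the held-out pairs better than the independence copula, whose log-density is zero everywhere.

```python
import math
import random


def kendall_tau(pairs):
    """Kendall's tau by direct pair counting, O(n^2)."""
    n = len(pairs)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def fit_clayton(pairs):
    """Clayton parameter from Kendall's tau: theta = 2 * tau / (1 - tau)."""
    tau = kendall_tau(pairs)
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch only handles positive dependence")
    return 2.0 * tau / (1.0 - tau)


def clayton_log_density(u, v, theta):
    """Log of the Clayton copula density for u, v in (0, 1] and theta > 0."""
    inner = u ** (-theta) + v ** (-theta) - 1.0
    return (math.log1p(theta)
            - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(inner))


def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()
        w = 1.0 - rng.random()
        v = ((u ** (-theta)) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def k_fold_scores(pairs, k):
    """Mean held-out log-density per fold; fold f holds out every k-th pair."""
    scores = []
    for fold in range(k):
        test = pairs[fold::k]
        train = [p for i, p in enumerate(pairs) if i % k != fold]
        theta = fit_clayton(train)  # fit on the training part only
        scores.append(sum(clayton_log_density(u, v, theta) for u, v in test) / len(test))
    return scores


# Synthetic i.i.d. data (assumption); real data should be shuffled before splitting.
rng = random.Random(1)
pairs = sample_clayton(300, 2.0, rng)
scores = k_fold_scores(pairs, 5)
mean_score = sum(scores) / len(scores)
print("per-fold scores:", [round(s, 3) for s in scores])
```

Comparing such held-out scores across several candidate families is one way to see that the family with the best in-sample fit need not be the one with the best out-of-sample score.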

Files

Verslag_BEP.pdf

(pdf | 10.4 Mb)

Unknown license