Dependence Measures in Citation Analysis
The application of parametric copulas to capture the dependence structure between the publications of a researcher and the citations of those publications.
Abstract
In this thesis we use copulas to capture the dependence structure between a scholar’s publications and the citations of those publications. To do so, we use a sample of Quebec researchers for whom both publication counts and citation counts are known. We are provided with multiple citation-related variables. We study the dependence structure between these variables, with the aim of fitting copulas to it, by calculating correlation scores and visualising the structure. Copulas are functions that “join together” one-dimensional distribution functions with a dependence structure in order to represent joint distributions. The correlation scores are calculated across various ranges of the variables to give a deeper understanding of the dependence structure between them.
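The “join together” description can be made precise in the bivariate case. The following is the standard statement of Sklar’s theorem; the symbols H, F, G and C are notation chosen here for illustration and are not taken from the thesis itself.

```latex
% H: joint distribution function of (X, Y); F, G: its marginal distribution functions.
% Sklar's theorem: there exists a copula C : [0,1]^2 \to [0,1] such that
H(x, y) = C\bigl(F(x),\, G(y)\bigr), \qquad x, y \in \mathbb{R}.
% If F and G are continuous, C is unique and can be recovered as
C(u, v) = H\bigl(F^{-1}(u),\, G^{-1}(v)\bigr), \qquad u, v \in [0, 1].
```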
Using Sklar’s theorem and functions from various packages in the statistical software R, we fit parametric copulas to the dependence structures of the various pairs of variables. Based on a goodness-of-fit test, certain parametric copula models are rejected at the 5% significance level. Unsurprisingly, there are also dependence structures that are captured well by a parametric copula.
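As a rough illustration of what fitting a parametric copula to a pair of variables can involve, the sketch below estimates the parameter of a Clayton copula by inverting Kendall's tau. It is written in plain Python rather than R, and it is not the procedure used in the thesis: the Clayton family, the estimation method, the synthetic data and all function and variable names are assumptions made for this example.

```python
import math
import random

def pseudo_observations(values):
    """Map a sample into (0, 1) via ranks: u_i = rank_i / (n + 1).
    Ties are broken by position, which is a simplification."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def kendall_tau(x, y):
    """Kendall's tau-a by direct pair counting (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2), valid for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("Clayton inversion needs 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)

def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

# Synthetic stand-ins for two dependent variables (NOT the Quebec data).
rng = random.Random(0)
pairs = sample_clayton(400, 2.0, rng)
publications = [u ** 3 for u, _ in pairs]          # hypothetical marginal
citations = [math.exp(5.0 * v) for _, v in pairs]  # hypothetical marginal

# Ranks remove the marginals, so only the dependence structure remains.
u_obs = pseudo_observations(publications)
v_obs = pseudo_observations(citations)
tau_hat = kendall_tau(u_obs, v_obs)
theta_hat = clayton_theta_from_tau(tau_hat)
print(f"tau = {tau_hat:.3f}, fitted Clayton theta = {theta_hat:.3f}")
```

Because the data are reduced to ranks before estimation, the fitted parameter does not depend on the marginal distributions, which is the separation of marginals and dependence that Sklar's theorem provides.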
Parametric copula families are used not only for fitting the data but also for prediction. Since a well-fitting model is not necessarily a good predictive model, we have also performed a validation analysis. The parametric copula models that are not rejected by the test at the 5% significance level are validated via k-fold cross-validation: in each fold, part of the data is used to fit the model and the remainder is used to validate it. It turns out that the best-fitting copula model does not always perform best in the cross-validation, and hence in terms of prediction.
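A minimal sketch of k-fold cross-validation for a copula model is given below, again in plain Python and under stated assumptions. The scoring rule (mean held-out log-density), the Clayton family, the tau-inversion fit, the synthetic data and all names are choices made for this example and are not taken from the thesis. For simplicity, the data are assumed to lie in (0, 1) already, as pseudo-observations would.

```python
import math
import random

def kendall_tau(x, y):
    """Kendall's tau-a by direct pair counting (O(n^2))."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return 2.0 * s / (n * (n - 1))

def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2), valid for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("Clayton inversion needs 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)

def clayton_log_density(u, v, theta):
    """Log-density of the Clayton copula for theta > 0 and 0 < u, v <= 1."""
    return (math.log(1.0 + theta)
            - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(u ** (-theta) + v ** (-theta) - 1.0))

def sample_clayton(n, theta, rng):
    """Draw n pairs from a Clayton copula by conditional inversion."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs

def k_fold_scores(pairs, k, rng):
    """Fit on k-1 folds, score the held-out fold by its mean log-density."""
    idx = list(range(len(pairs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [pairs[i] for i in idx if i not in held]
        tau = kendall_tau([p[0] for p in train], [p[1] for p in train])
        theta = clayton_theta_from_tau(tau)
        test = [pairs[i] for i in held_out]
        scores.append(sum(clayton_log_density(u, v, theta) for u, v in test) / len(test))
    return scores

# Synthetic data (NOT the thesis data), drawn from a Clayton copula.
rng = random.Random(1)
data = sample_clayton(500, 2.0, rng)
fold_scores = k_fold_scores(data, 5, rng)
mean_score = sum(fold_scores) / len(fold_scores)
print("held-out mean log-density per fold:", [round(s, 3) for s in fold_scores])
```

The independence copula has log-density zero everywhere, so a positive held-out score indicates that the fitted model predicts unseen pairs better than assuming no dependence. Running the same loop for several candidate families and comparing their scores is one way to see that the best in-sample fit need not give the best out-of-sample score.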