Non-parametric Estimation of Generators of Elliptical Distributions

Abstract

In this thesis, we present simulation studies of a non-parametric estimator proposed by Liebscher (2005). This estimator builds on the well-known kernel density estimator. Non-parametric estimation is used when the parametric form of the distribution of a given dataset is unknown; it is applied under the assumption that the distribution in question has a density, so that this density can be estimated. The densities we are interested in belong to the class of elliptical densities, whose contours have the same elliptical shape as those of the Gaussian distribution. One example of such a density is the density of the multivariate normal distribution. Liebscher's estimator exploits this elliptical structure to circumvent the 'curse of dimensionality', which often appears in non-parametric estimation: when a non-parametric estimator is applied to a high-dimensional dataset, its convergence rate becomes slow. Liebscher's estimator circumvents this 'curse' by assuming that the multi-dimensional dataset is sampled from an elliptical distribution. The estimator then transforms the dataset into a one-dimensional dataset, so that univariate kernel density estimation can be used instead of multivariate kernel density estimation. We use Liebscher's estimator to estimate the generator of an elliptical distribution. The generator is a positive real-valued function. Liebscher's estimator depends on two parameters: the bandwidth parameter and a tuning parameter around the boundary. In this thesis, we investigate how these two parameters influence the performance of the estimator. We start with the case where the simulated dataset is sampled from the standard multivariate normal distribution. Then, we apply the estimator to a different elliptical distribution with a different generator. As it turns out, finding the parameters for which the estimator gives a minimal error is a difficult task, because the region of parameter values where the error is small depends on the generator. We also observe that as we increase the dimension of the simulated dataset, the computation time of the estimator increases as well. Lastly, the estimator is applied to the Wisconsin breast cancer dataset. There, the estimator is used to study the accuracy of a Bayes classifier based on estimated posterior probabilities. From this study, it appears that the tuning parameter plays a smaller role than the bandwidth parameter in changing the accuracy of the classifier.
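
To make the dimension-reduction idea concrete, the Python sketch below (assuming NumPy and SciPy are available) illustrates the general principle described above: the d-dimensional sample is reduced to the one-dimensional sample of squared Mahalanobis distances, a boundary transformation controlled by a tuning parameter a is applied, a univariate kernel density estimate with bandwidth h is computed with reflection at the boundary, and the result is mapped back to the generator scale. The transformation psi, the reflection step, and the normalising constant are written from the abstract's description together with the form commonly associated with this estimator, not taken from the thesis itself, so details may differ; the parameter names h and a and the function name estimate_generator are illustrative.

import numpy as np
from scipy.special import gamma

def estimate_generator(X, t_grid, h, a):
    # Sketch of a Liebscher-type generator estimator (illustrative only).
    # h : bandwidth of the univariate kernel density estimator
    # a : tuning parameter controlling behaviour near the boundary at zero
    n, d = X.shape
    mu = X.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(X, rowvar=False))

    # Step 1: reduce the d-dimensional sample to one dimension via the
    # squared Mahalanobis distances r_i = (x_i - mu)' Sigma^{-1} (x_i - mu).
    diff = X - mu
    r = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff)

    # Step 2: transform away the boundary at zero (one common choice).
    def psi(s):
        return -a + (a ** (d / 2) + s ** (d / 2)) ** (2 / d)

    xi = psi(r)

    # Step 3: univariate Gaussian-kernel density estimate of the
    # transformed one-dimensional sample, reflected at the boundary -a.
    def kde(z):
        K = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
        u1 = (z[:, None] - xi[None, :]) / h
        u2 = (z[:, None] + xi[None, :] + 2 * a) / h
        return (K(u1) + K(u2)).sum(axis=1) / (n * h)

    # Step 4: map the one-dimensional density estimate back to the
    # generator scale (normalising constant of the d-dimensional sphere).
    c_d = gamma(d / 2) / np.pi ** (d / 2)
    return c_d * (a ** (d / 2) + t_grid ** (d / 2)) ** (2 / d - 1) * kde(psi(t_grid))

As a usage illustration, for a sample from the standard trivariate normal distribution one could compare the output with the true generator g(t) = (2*pi)**(-3/2) * exp(-t/2), for example via estimate_generator(np.random.standard_normal((1000, 3)), np.linspace(0.01, 9.0, 200), h=0.3, a=0.5); the quality of the result depends on the choice of h and a, which is exactly the sensitivity studied in the thesis.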