Large-scale SVD algorithms for Latent Semantic Indexing, Recommender Systems and Image Processing


Abstract

Over the past three decades, the singular value decomposition (SVD) has been used increasingly for big data applications. Because it allows rank reduction of the input data matrix, it not only compresses the information the matrix contains but can also reveal underlying patterns in the data through feature identification. This thesis explores algorithms for large-scale SVD calculation and uses them to demonstrate how the SVD can be applied in a variety of fields, including information retrieval, recommender systems and image processing.

The algorithms discussed are Golub-Kahan-Lanczos bidiagonalization, randomized SVD and block power SVD. Each algorithm is implemented in Matlab, and both the approximation error and the running time of each algorithm are compared. We find that the block power SVD is very effective, especially when only a truncated SVD is required. Its simplicity, speed and relatively small error for low-rank matrix approximation make it an ideal method for the applications discussed in this thesis.
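The block power method for the truncated SVD can be sketched as follows. This is an illustrative NumPy version, not the thesis's Matlab code: starting from a random block of k vectors, it repeatedly applies AᵀA and re-orthonormalizes, so the block converges to the dominant right singular subspace, from which the top-k singular triplets are recovered.

```python
import numpy as np

def block_power_svd(A, k, n_iter=50, seed=0):
    """Approximate rank-k truncated SVD of A by block power iteration.
    Illustrative sketch; parameter names are assumptions, not the thesis's."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random starting block of k vectors
    V = rng.standard_normal((n, k))
    for _ in range(n_iter):
        # One block power step on A^T A, re-orthonormalized via QR
        V, _ = np.linalg.qr(A.T @ (A @ V))
    # Project A onto the converged subspace and take a small SVD
    U, s, Wt = np.linalg.svd(A @ V, full_matrices=False)
    return U, s, Wt @ V.T

# Example: a nearly rank-5 matrix, where the truncated SVD is accurate
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 60)) \
    + 0.01 * rng.standard_normal((100, 60))
U, s, Vt = block_power_svd(A, 5)
```

Note that each iteration only needs matrix-vector products with A and Aᵀ plus a small QR factorization, which is what makes the method attractive at scale when only a few singular triplets are needed.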

We show how the SVD can be used for information retrieval through Latent Semantic Indexing. The method is tested on the Time collection, and we find that the SVD removes much of the noise present in the data and addresses the problems of synonymy and polysemy. SVD-based algorithms for recommender systems are then presented. We implement a basic SVD algorithm called Average Rating Filling, as well as a (biased) stochastic gradient descent algorithm developed during the Netflix recommender-system prize. These are tested on the Movielens 100k dataset, with biased stochastic gradient descent performing best. Finally, the SVD is used for image compression, and we find that, while not very useful for face recognition, it could provide a time- and space-efficient method for searching an image database for similar images.
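The biased stochastic gradient descent model mentioned above predicts a rating as the global mean plus a user bias, an item bias and a latent-factor dot product, updating all terms from each observed rating's prediction error. A minimal NumPy sketch of one update step (the learning-rate and regularization names are illustrative assumptions; the thesis implementation is in Matlab):

```python
import numpy as np

def biased_sgd_step(P, Q, bu, bi, mu, u, i, r, lr=0.005, reg=0.02):
    """One SGD update for a single observed rating r of user u on item i.
    Prediction: global mean mu + user bias + item bias + latent dot product.
    Illustrative sketch; lr/reg are assumed hyperparameter names."""
    err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
    # Move biases toward the residual, with L2 shrinkage
    bu[u] += lr * (err - reg * bu[u])
    bi[i] += lr * (err - reg * bi[i])
    # Update both latent factors simultaneously from their old values
    P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                  Q[i] + lr * (err * P[u] - reg * Q[i]))
    return err
```

Iterating this step over all observed ratings for several epochs drives down the training error; the regularization term is what keeps the learned factors from overfitting the sparse rating matrix.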
