Large-scale SVD algorithms for Latent Semantic Indexing, Recommender Systems and Image Processing


Abstract

Over the past three decades, the singular value decomposition (SVD) has been used increasingly for big data applications. Because it allows rank reduction of the input data matrix, it not only compresses the information the matrix contains but can also reveal underlying patterns in the data through feature identification. This thesis explores algorithms for large-scale SVD calculation and uses them to demonstrate how the SVD can be applied in a variety of fields, including information retrieval, recommender systems and image processing.

The algorithms discussed are Golub-Kahan-Lanczos bidiagonalization, randomized SVD and block power SVD. Each algorithm is implemented in Matlab, and both the approximation error and the running time of each algorithm are compared. We find that the block power SVD is very effective, especially when only a truncated SVD is required. Its simplicity, speed and relatively small error for low-rank matrix approximation make it an ideal method for the applications discussed in this thesis.
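The block power method for the truncated SVD can be sketched as follows. This is an illustrative NumPy version, not the thesis's Matlab code: starting from a random block of k vectors, it repeatedly applies AᵀA and re-orthonormalizes, so the block converges to the dominant right singular subspace, from which the top-k singular triplets are recovered.

```python
import numpy as np

def block_power_svd(A, k, n_iter=50, seed=0):
    """Approximate rank-k truncated SVD of A by block power iteration.
    Illustrative sketch; parameter names are assumptions, not the thesis's."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random starting block of k vectors
    V = rng.standard_normal((n, k))
    for _ in range(n_iter):
        # One block power step on A^T A, re-orthonormalized via QR
        V, _ = np.linalg.qr(A.T @ (A @ V))
    # Project A onto the converged subspace and take a small SVD
    U, s, Wt = np.linalg.svd(A @ V, full_matrices=False)
    return U, s, Wt @ V.T

# Example: a nearly rank-5 matrix, where the truncated SVD is accurate
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 60)) \
    + 0.01 * rng.standard_normal((100, 60))
U, s, Vt = block_power_svd(A, 5)
```

Note that each iteration only needs matrix-vector products with A and Aᵀ plus a small QR factorization, which is what makes the method attractive at scale when only a few singular triplets are needed.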

We show how the SVD can be used for information retrieval through Latent Semantic Indexing. The method is tested on the Time collection, and we find that the SVD removes much of the noise present in the data and addresses the problems of synonymy and polysemy. SVD-based algorithms for recommender systems are then presented. We implement a basic SVD algorithm called Average Rating Filling, as well as a (biased) stochastic gradient descent algorithm developed during the Netflix recommender-system prize. These are tested on the Movielens 100k dataset, with biased stochastic gradient descent performing best. Finally, the SVD is used for image compression, and we find that, while not very useful for face recognition, it could provide a time- and space-efficient method for searching an image database for similar images.
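The biased stochastic gradient descent model mentioned above predicts a rating as the global mean plus a user bias, an item bias and a latent-factor dot product, updating all terms from each observed rating's prediction error. A minimal NumPy sketch of one update step (the learning-rate and regularization names are illustrative assumptions; the thesis implementation is in Matlab):

```python
import numpy as np

def biased_sgd_step(P, Q, bu, bi, mu, u, i, r, lr=0.005, reg=0.02):
    """One SGD update for a single observed rating r of user u on item i.
    Prediction: global mean mu + user bias + item bias + latent dot product.
    Illustrative sketch; lr/reg are assumed hyperparameter names."""
    err = r - (mu + bu[u] + bi[i] + P[u] @ Q[i])
    # Move biases toward the residual, with L2 shrinkage
    bu[u] += lr * (err - reg * bu[u])
    bi[i] += lr * (err - reg * bi[i])
    # Update both latent factors simultaneously from their old values
    P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                  Q[i] + lr * (err * P[u] - reg * Q[i]))
    return err
```

Iterating this step over all observed ratings for several epochs drives down the training error; the regularization term is what keeps the learned factors from overfitting the sparse rating matrix.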
