Sentiment Analysis

A comparison of feature sets for social data and reviews



Nowadays consumers share their experiences with and opinions about products and brands in various channels, for example on review websites or social media. Sentiment analysis predicts the sentiment of such consumer texts in order to understand how customers feel about these products and brands. This thesis addresses sentence-level sentiment analysis in the product domain. It uses three data types collected by Unilever: review data, i.e. text containing a customer's opinion of a specific product; social data, such as tweets, Facebook messages and Instagram messages; and phone data, i.e. summaries of customers' phone calls about a specific product.

A common approach to sentiment analysis is to extract features from the data and give them to a machine learning algorithm, together with sentiment labels assigned by human annotators. The algorithm then learns a classifier that can predict a sentiment label for new sentences.
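The feature-extraction-plus-classifier pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the setup used in this thesis: the feature set (lowercased bag-of-words unigrams), the learning algorithm (multinomial Naive Bayes with Laplace smoothing), the helper names `extract_features`, `train_naive_bayes` and `predict`, and the toy labelled sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def extract_features(sentence):
    """Bag-of-words features: lowercase the sentence and count whitespace-separated tokens."""
    return Counter(sentence.lower().split())


def train_naive_bayes(sentences, labels, alpha=1.0):
    """Learn a multinomial Naive Bayes classifier from sentences and human-assigned labels.

    alpha is the Laplace smoothing constant, so unseen (token, label) pairs keep a nonzero probability.
    """
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # label -> token -> total count over all sentences
    vocabulary = set()
    for sentence, label in zip(sentences, labels):
        features = extract_features(sentence)
        token_counts[label].update(features)
        vocabulary.update(features)

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        denominator = sum(token_counts[c].values()) + alpha * len(vocabulary)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / denominator) for t in vocabulary
        }
    return log_prior, log_likelihood


def predict(model, sentence):
    """Return the label with the highest log-posterior score for one sentence."""
    log_prior, log_likelihood = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for token, count in extract_features(sentence).items():
            if token in log_likelihood[c]:  # tokens never seen in training are ignored
                score += count * log_likelihood[c][token]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy data standing in for human-annotated product sentences.
train_sentences = [
    "i love this shampoo",
    "great product works well",
    "this soap is amazing",
    "terrible smell i hate it",
    "the cream broke out my skin awful",
    "worst purchase ever",
]
train_labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

model = train_naive_bayes(train_sentences, train_labels)
print(predict(model, "i love this cream"))   # positive
print(predict(model, "the smell is awful"))  # negative
```

Swapping in a different feature set only requires changing `extract_features`; the training and prediction steps stay the same, which is what makes feature sets directly comparable under a fixed learner.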
In the sentiment analysis literature it is often unclear why certain features are chosen or for which data type they work well. In this research we compare the performance of several feature sets across the different data types.

We propose three feature sets for review data and three feature sets for social data, and focus on two aspects: comparing the feature sets and comparing the data types. We find no significant differences in performance between the feature sets. The results suggest that feature sets tailored to a specific data type might improve sentiment analysis, but a general feature set of standard features can perform comparably.
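The abstract does not list the concrete features, so the following is only a hypothetical illustration of how a feature set tailored to social data might extend a general one. The general set here is lowercased word unigrams; the social-specific additions (emoticon count, hashtag count, elongated words such as "loooove", exclamation-mark count), the regular expressions and the function names are assumptions chosen for illustration, not the feature sets proposed in this thesis.

```python
import re
from collections import Counter

# Hypothetical patterns for social-media cues (simplified; real emoticon lists are larger).
EMOTICON_RE = re.compile(r"[:;=][-']?[()DPp]")
HASHTAG_RE = re.compile(r"#\w+")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # a character repeated three or more times


def general_features(text):
    """Standard features usable for any data type: lowercased word unigrams."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def social_features(text):
    """General features plus hypothetical cues that are typical of social data.

    Uppercase feature names cannot collide with the lowercase unigram keys.
    """
    features = general_features(text)
    features["NUM_EMOTICONS"] = len(EMOTICON_RE.findall(text))
    features["NUM_HASHTAGS"] = len(HASHTAG_RE.findall(text))
    features["HAS_ELONGATION"] = int(bool(ELONGATED_RE.search(text)))
    features["NUM_EXCLAMATIONS"] = text.count("!")
    return features


tweet = "Loooove this #shampoo :) !!"
print(general_features(tweet))
print(social_features(tweet))
```

On review text these extra cues are mostly zero, which is one plausible reason why a general feature set can perform about as well as a data-type-specific one.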


Unknown license