Authored

Manifold regularization is a commonly used technique in semi-supervised learning. It enforces the classification rule to be smooth with respect to the data-manifold. Here, we derive sample complexity bounds based on pseudo-dimension for models that add a convex data dependent reg ...
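
For reference, the classical manifold-regularization objective (Belkin et al., 2006), to which such convex data-dependent regularizers belong, is, for l labeled and u unlabeled points and up to normalization constants,

    f^* = \arg\min_{f \in \mathcal{H}_K} \; \frac{1}{l}\sum_{i=1}^{l} V\big(f(x_i), y_i\big) \;+\; \gamma_A \|f\|_K^2 \;+\; \gamma_I\, \mathbf{f}^\top L\, \mathbf{f},

where V is a loss function, L is the graph Laplacian built from all l + u points and \mathbf{f} = (f(x_1), \dots, f(x_{l+u})). The exact regularizers covered by the bounds are not spelled out in this truncated abstract.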

Learning performance can show non-monotonic behavior. That is, more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it perform more monotone. We prove consistency and monotonicity with ...
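
As a rough illustration of what making a learner "more monotone" can mean (the three algorithms themselves are not spelled out in this truncated abstract, so the wrapper below is only an assumption-laden sketch in scikit-learn conventions): retrain as more data arrives, but only adopt the new model if it does at least as well as the incumbent on held-out data.

    from sklearn.base import clone

    def monotone_update(incumbent, incumbent_score, estimator, X_train, y_train, X_val, y_val):
        """Retrain on the enlarged training set, but keep the new model only if it
        scores at least as well as the current one on a held-out validation set."""
        candidate = clone(estimator).fit(X_train, y_train)
        score = candidate.score(X_val, y_val)
        if incumbent is None or score >= incumbent_score:
            return candidate, score
        return incumbent, incumbent_score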

Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain stereotypical gender biases. As a result, such unwanted biases will typically also be present in word embeddings derived from such corpora and in downstream applications in the ...

Plotting a learner’s average performance against the number of training samples results in a learning curve. Studying such curves on one or more data sets is a way to get a better understanding of the generalization properties of this learner. The behavior of learning curves i ...
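
A minimal sketch of how such a curve can be estimated in practice, using scikit-learn's learning_curve; the estimator and data set below are arbitrary placeholders, not the ones studied here.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import learning_curve

    X, y = load_digits(return_X_y=True)
    sizes, train_scores, test_scores = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
    # Average test score per training-set size: the empirical learning curve.
    print(list(zip(sizes, test_scores.mean(axis=1))))
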
Active learning algorithms propose what data should be labeled given a pool of unlabeled data. Instead of randomly selecting what data to annotate, active learning strategies aim to select data so as to get a good predictive model with as few labeled samples as possible. Singl ...
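
One common single-criterion strategy is uncertainty sampling; the sketch below illustrates the idea only and is not necessarily among the strategies the paper combines.

    import numpy as np

    def select_most_uncertain(model, X_pool, n_queries=10):
        """Pick the pool samples the current model is least confident about."""
        proba = model.predict_proba(X_pool)        # class probabilities per pooled sample
        confidence = proba.max(axis=1)             # confidence = highest predicted probability
        return np.argsort(confidence)[:n_queries]  # indices to send to the annotator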

Contributed

Rhyming words are one of the most important features in poems. They add rhythm to a poem, and poets use this literary device to convey emotion and meaning to their readers. Thus, detecting rhyming words can aid in adding emotion and enhancing readability when generating poems. ...
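
The detection method itself is cut off in this abstract; purely as an illustration of the task, two words can be treated as rhyming when their phonemes match from the last stressed vowel onward, for example via the CMU pronouncing dictionary (the pronouncing package).

    import pronouncing

    def do_rhyme(word_a, word_b):
        """True if the words share their phonemes from the last stressed vowel to the end."""
        part_a = pronouncing.rhyming_part(pronouncing.phones_for_word(word_a)[0])
        part_b = pronouncing.rhyming_part(pronouncing.phones_for_word(word_b)[0])
        return part_a == part_b

    print(do_rhyme("night", "light"))   # True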

It sounds like Greek to me

Performance of phonetic representations for language identification

This paper compares the performance of two phonetic notations, IPA and ASJPcode, with the alphabetical notation for word-level language identification. Two machine learning models, a Multilayer Perceptron and a Logistic Regression model, are used to classify words using each o ...
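
A hedged sketch of the word-level setup being compared; the character n-gram features and toy words below are assumptions, not necessarily the paper's exact pipeline.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline

    words  = ["nero", "wasser", "agua"]     # hypothetical transcribed words in one notation
    labels = ["ell", "deu", "spa"]          # the language of each word

    for clf in (LogisticRegression(max_iter=1000), MLPClassifier(max_iter=500)):
        model = make_pipeline(CountVectorizer(analyzer="char", ngram_range=(1, 3)), clf)
        model.fit(words, labels)            # repeat per notation: IPA, ASJPcode, alphabetical
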
This research provides an overview of how training Convolutional Neural Networks (CNNs) on imbalanced datasets affects the performance of the CNNs. Datasets can be imbalanced for several reasons; for example, there are naturally fewer samples of rare diseases. Since the ...
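
One frequently used remedy is re-weighting the loss by inverse class frequency; whether it is among the approaches examined here is not stated in the truncated abstract.

    import numpy as np
    from sklearn.utils.class_weight import compute_class_weight

    y_train = np.array([0] * 950 + [1] * 50)    # hypothetical 95/5 class imbalance
    weights = compute_class_weight("balanced", classes=np.unique(y_train), y=y_train)
    class_weight = dict(enumerate(weights))     # {0: ~0.53, 1: 10.0}
    # model.fit(X_train, y_train, class_weight=class_weight)   # e.g. Keras-style usage
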
With an expected 8.3 trillion photos stored in 2021 [1], convolutional neural networks (CNNs) are becoming preeminent in the field of image recognition. However, with these deep neural networks (DNNs) still being seen as black boxes, it is hard to fully employ their capabi ...

Is Wikipedia succeeding in reducing gender bias?

Assessing the development of gender bias in word embeddings from Wikipedia

Large text corpora used for creating word embeddings (vectors which represent word meanings) often contain a stereotypical gender bias. This unwanted bias is then also present in the word embeddings and in downstream applications in the field of natural language processing. To pr ...
Word embeddings are useful for various applications, such as sentiment classification (Tang et al., 2014), word translation (Xing, Wang, Liu, & Lin, 2015) and résumé parsing (Nasser, Sreejith, & Irshad, 2018). Previous research has determined that word embeddings contain ...
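
As an illustration of one widely used probe for such bias (the word lists and metric used in this work are not given in the truncated abstract): project words onto a he/she direction and compare cosine similarities.

    import numpy as np

    def gender_projection(emb, word):
        """Cosine similarity of `word` with the he/she direction in embedding dict `emb`."""
        direction = emb["he"] - emb["she"]
        v = emb[word]
        return float(v @ direction / (np.linalg.norm(v) * np.linalg.norm(direction)))

    # Positive values lean towards "he", negative towards "she"; compare, for example,
    # gender_projection(emb, "nurse") with gender_projection(emb, "engineer").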

Extracting location context from transcripts

A comparison of ELMo and TF-IDF

Using transcripts of the TV-series FRIENDS, this paper explores the problem of predicting the location in which a sentence was said. The research focuses on using feature extraction on the sentences, and training a logistic regression model on those features. Specifically looking ...
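
A sketch of the TF-IDF half of that comparison (the ELMo half needs a pretrained contextual encoder and is omitted); the lines and location labels below are hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    lines     = ["We were on a break!", "Could I BE wearing any more clothes?"]
    locations = ["Central Perk", "Monica's apartment"]    # hypothetical labels

    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(lines, locations)
    print(model.predict(["Pivot! Pivot!"]))
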
Text classification has a wide range of uses, such as extracting the sentiment from a product review, analyzing the topic of a document and spam detection. In this research, the text classification task is to predict which TV show a given line is from. The skip-gram model, orig ...
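
For context, skip-gram embeddings can be trained with gensim as below; the corpus and hyperparameters are placeholders, not the paper's setup.

    from gensim.models import Word2Vec

    sentences = [["how", "you", "doin"], ["bazinga"]]   # hypothetical tokenized lines
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1: skip-gram
    vector = model.wv["bazinga"]                        # embedding used as a feature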

In recent years, many new text generation models have been developed, while the evaluation of text generation remains a considerable challenge. Currently, the only metric that is able to fully capture the quality of a generated text is human evaluation, which is e ...

Authorship identification is often applied to large documents, but less so to short, everyday sentences. The ability to identify who said a short line could help chatbots or personal assistants. This research compares the performance of TF-IDF and fastText when identify ...
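
For the fastText side of that comparison, a supervised classifier can be trained roughly as follows; the tiny training file written here is a placeholder in fastText's label format, not the paper's data.

    import fasttext

    # Each row: "__label__<speaker> <text>" (fastText's supervised format).
    with open("train.txt", "w") as f:
        f.write("__label__chandler Could I BE wearing any more clothes?\n")
        f.write("__label__ross We were on a break!\n")

    model = fasttext.train_supervised(input="train.txt")
    print(model.predict("I am not great at the advice"))
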
Artificial Intelligence (AI) is increasingly affecting people’s lives. AI is even employed in fields where human lives depend on the AI’s decisions. However, these algorithms lack transparency, i.e. it is unclear how they determine the outcome. If, for instance, the AI’s purpose ...

StyleGAN is a neural network architecture that is able to generate photo-realistic images. The diversity of the generated images is ensured by latent vectors. These latent vectors encode important features of the generated images. They provide insightful information about properti ...
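
A framework-free sketch of what "latent vectors encode image features" buys you: interpolating between two latent codes gives a smooth sequence of intermediate images once fed to the generator. The 512-dimensional z matches StyleGAN's usual latent size, but no particular implementation is assumed.

    import numpy as np

    def interpolate(z_a, z_b, steps=8):
        """Linear interpolation between two latent vectors."""
        alphas = np.linspace(0.0, 1.0, steps)[:, None]
        return (1 - alphas) * z_a + alphas * z_b    # shape: (steps, latent_dim)

    z_a, z_b = np.random.randn(512), np.random.randn(512)
    frames = interpolate(z_a, z_b)                  # feed each row to the generator
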
Recently, deep generative models have been shown to achieve state-of-the-art performance on semi-supervised learning tasks. In particular, variational autoencoders have been adopted to use labeled data, which allowed the development of SSL models using deep neural net ...
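
For reference, the variational autoencoder is trained by maximizing the evidence lower bound,

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),

which semi-supervised variants extend with a classification term on the labeled data; the exact model used here is not given in the truncated abstract.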