ML

39 records found

With the rise of zero-shot synthetic image generation models, such as Stability.ai's Stable Diffusion, OpenAI's DALL·E or Google's Imagen, the need for powerful tools to detect synthetically generated images has never been greater. In this thesis we contribute to this goal by consideri ...

Machine learning algorithms (learners) are typically expected to produce monotone learning curves, meaning that their performance improves as the size of the training dataset increases. However, it is important to note that this behavior is not universally observed. Recently ...
A learning curve displays a machine learning algorithm's accuracy or error on test data as a function of the amount of training data. Learning curves can be modeled by parametric curve models that help predict accuracy improvement through curve extrapolation methods. However, ...
Learning curves have been used extensively to analyse learners' behaviour and for practical tasks such as model selection, speeding up training and tuning models. Nonetheless, we still have a relatively limited understanding of the behaviour of learning curves themselves, in particul ...
Extrapolation of the learning curve provides an estimate of how much data is needed to achieve a desired performance. This can be beneficial when gathering data is difficult or computational resources are limited. One of the essential processes of learning curve extrapolation is cur ...
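A minimal sketch of the curve-fitting step these records describe: fit a parametric model to errors observed at small training-set sizes, then extrapolate to a larger size. The power-law form and all numbers below are illustrative assumptions, not taken from any of the listed theses.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical parametric learning-curve model: error(n) = a * n^(-b) + c
def power_law(n, a, b, c):
    return a * n ** (-b) + c

# Synthetic "observed" errors at small training-set sizes (noiseless toy data)
sizes = np.array([100, 200, 400, 800, 1600], dtype=float)
errors = power_law(sizes, a=2.0, b=0.5, c=0.05)

# Fit the parametric model to the observed points
(a, b, c), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.5, 0.0))

# Extrapolate: predicted error at a much larger training-set size
predicted = power_law(10_000, a, b, c)
print(predicted)
```

In practice the observed errors are noisy and the choice of parametric family (power law, exponential, etc.) materially affects the extrapolation, which is exactly what these theses investigate.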
The learning curve illustrates how the generalization performance of the learner evolves with more training data. It can predict the amount of data needed for decent accuracy and the highest achievable accuracy. However, the behavior of learning curves is not well understood. Man ...
Autoencoders seek to encode their input into a bottleneck of latent neurons, and then decode it to reconstruct the input. However, if the input data has an intrinsic dimension (ID) smaller than the number of latent neurons in the bottleneck, this encoding becomes redundant.
...
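The redundancy argument in the record above can be illustrated with a toy linear example: when data with intrinsic dimension d is embedded in a higher-dimensional space, only d directions carry variance, so a bottleneck wider than d neurons encodes redundant directions. This sketch uses an SVD as a stand-in for a (linear) encoder; all dimensions and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data with intrinsic dimension 2, linearly embedded in 10 ambient dimensions
latent = rng.normal(size=(500, 2))    # 2 underlying factors
mixing = rng.normal(size=(2, 10))     # linear embedding into R^10
data = latent @ mixing

# Singular values of the centered data reveal the intrinsic dimension:
# only 2 are non-zero, so a bottleneck wider than 2 would be redundant
singular_values = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
effective_rank = int((singular_values > 1e-8).sum())
print(effective_rank)
```

A nonlinear autoencoder faces the same phenomenon on curved data manifolds, but detecting the redundancy is then harder than counting singular values.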
Although there are many promising applications of learning curves in machine learning, such as model selection, we still know very little about what factors influence their behaviour. The aim is to study the impact of the inherent characteristics of the datasets on the learning ...
Supervised machine learning is a growing assistive framework for professional decision-making. Yet bias that causes unfair discrimination is already present in the datasets. This research proposes a method to reduce model unfairness during the machine learning training pr ...
Does a convolutional neural network (CNN) always have to be deep to learn a task? This is an important question as deeper networks are generally harder to train. We trained shallow and deep CNNs and evaluated their performance on simple regression tasks, such as computing the mea ...
This research provides an overview of how training Convolutional Neural Networks (CNNs) on imbalanced datasets affects the performance of the CNNs. Datasets can be imbalanced for several reasons; for example, there are naturally fewer samples of rare diseases. Since the ...
With an estimated 8.3 trillion photos stored in 2021 [1], convolutional neural networks (CNN) are becoming preeminent in the field of image recognition. However, with this type of deep neural network (DNN) still being seen as a black box, it is hard to fully employ its capabi ...
Yes, convolutional neural networks are domain-invariant, albeit to some limited extent. We explored the performance impact of domain shift for convolutional neural networks. We did this by designing new synthetic tasks, for which the network’s task was to map images to their mean ...

It sounds like Greek to me

Performance of phonetic representations for language identification

This paper compares the performance of two phonetic notations, IPA and ASJPcode, with the alphabetical notation for word-level language identification. Two machine learning models, a Multilayer Perceptron and a Logistic Regression model, are used to classify words using each o ...
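As a hedged illustration of the word-level classification setup (not the thesis's actual data or notations), here is a minimal character-unigram Logistic Regression pipeline on two synthetic "languages"; all words and labels below are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy word-level language ID: character-unigram counts + logistic regression.
# The two "languages" are synthetic stand-ins for different notations.
words  = ["aba", "bab", "abba", "baab", "xyx", "yxy", "xyyx", "yxxy"]
labels = ["A",   "A",   "A",    "A",    "B",   "B",   "B",    "B"]

model = make_pipeline(CountVectorizer(analyzer="char"), LogisticRegression())
model.fit(words, labels)

# An unseen word made of language-A characters should map to class "A"
prediction = model.predict(["abab"])[0]
print(prediction)
```

Swapping the representation fed to `CountVectorizer` (alphabetical spelling, IPA transcription, or ASJPcode) while keeping the classifier fixed is the kind of comparison the abstract describes.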
Currently, trained machine learning models are readily available, but their training data might not be (for example due to privacy reasons). This thesis investigates how pre-trained models can be combined for performance on all their source domains, without access to data. This p ...
Active learning has the potential to reduce labeling costs in terms of time and money. In practical use, active learning works as an efficient data labeling strategy. Another perspective is to view active learning itself as a learning problem, where the ...

Photovoltaic Yield Nowcasting

For Residential Solar Systems in the Netherlands Using a Machine Learning Approach

An increasing number of photovoltaic (PV) systems are being installed worldwide, and the residential sector is responsible for a large part of this growth. Small-scale PV systems do not have complex measuring devices, and their breakdowns are not spotted immediately by the system owner ...
The core challenge of the BedBasedEcho BEP project is to create an algorithm to find the heart, and to apply it in a robotic echocardiography solution. The team has found multiple complex solutions related to this problem, and has extracted useful information from these sol ...
Records from ledgers of Dutch companies all across the Netherlands are used in this study. Records can be submitted to the ledgers with various lags, because data from many different bookkeepers, each with different workflows, is involved. Bookkeepers can be punctual or late, therefor ...

In recent years many new text generation models have been developed, while evaluation of text generation remains a considerable challenge. Currently, the only metric that is able to fully capture the quality of a generated text is human evaluation, which is e ...