10 records found

With the ever-increasing need to reduce the use of fossil fuels, Tesla is accelerating the world's transition to sustainable energy. This means replacing all internal combustion vehicles with electric ones over time. The growing number of Tesla vehicles on the road poses interest ...
The ability to accurately forecast sales volumes holds substantial significance for businesses. Current classical models struggle to capture the impact of different variables on the sales volume. These machine learning models are also not applicable to more than one specific ...
The integration of large-scale battery storage systems can aid the transition to renewable energy and stabilize energy systems for optimization. However, batteries can be cost-prohibitive and unprofitable, highlighting the need for a more comprehensive understanding and modelling ...

Evaluating Constant Failure Rates in Storm Surge Barriers

A Statistical Framework Applied to Censored Component Lifetimes of the Oosterscheldekering

This study examines the validity of constant failure rates in the reliability assessment of storm surge barriers, with a focus on the Stormvloedkering Oosterschelde (SVKO). Analysing a dataset of 1,501 malfunctions, including 87 critical incidents over six years, we employ Expone ...
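Only the abstract is shown here, but the combination it names — constant failure rates estimated from censored component lifetimes — has a standard maximum-likelihood form that can be sketched as background. The function name and the data below are illustrative, not the SVKO records:

```python
import numpy as np

def constant_failure_rate(times, observed):
    """MLE of a constant (exponential) failure rate under right censoring.

    times    : lifetimes or censoring times of the components (e.g. in years)
    observed : boolean array, True where a failure was observed,
               False where the lifetime was right-censored.
    """
    times = np.asarray(times, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    # For exponential lifetimes the likelihood yields
    # lambda_hat = (number of observed failures) / (total time on test),
    # where censored components contribute exposure time but no failure.
    return observed.sum() / times.sum()

# Hypothetical data: 3 observed failures, 2 right-censored components.
rate = constant_failure_rate([2.0, 1.5, 3.0, 4.0, 2.5],
                             [True, True, True, False, False])
```

Censoring thus only affects the numerator: a component still working at the end of the observation window adds time on test but no failure count.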

Energy Study of Drying

Using Machine Learning to Predict the Energy Consumption of an Industrial Powder Drying Process

In this thesis, we use data science and statistical techniques to better understand the energy consumption of a powder drying facility located in Zwolle, as part of Abbott's initiative to better manage its energy consumption. As powder drying is by far the facility's most energ ...
The Generalized Gamma Distribution (GGD) is a three-parameter distribution with desirable properties. For certain values of the parameters, the GGD can reduce to the gamma, exponential and lognormal distribution, among others. This makes it a flexible distribution that can be use ...
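The reductions mentioned in the abstract can be made concrete. In the Stacy parameterisation (a plausible convention; the thesis may use another), the GGD density is

```latex
\[
f(x;\,a,d,p) \;=\; \frac{p/a^{d}}{\Gamma(d/p)}\,x^{d-1}\,e^{-(x/a)^{p}},
\qquad x > 0,
\]
```

which reduces to the gamma distribution (shape \(d\), scale \(a\)) for \(p = 1\), to the Weibull distribution for \(d = p\), and to the exponential distribution for \(d = p = 1\); the lognormal distribution arises as a limiting case as \(d \to \infty\) with suitable rescaling of the other parameters.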

Improving data quality is of the utmost importance for any data-driven company, as it is inextricably tied to business analytics and processes. One method to improve data quality is to restore missing and erroneous data entries.

The goal of this research is to construct an algorithm that can restore missing and erroneous data entries while making use of a human-adaptive framework. The algorithm has been constructed in a modular fashion and consists of three main modules: Data Transformation, Data Structure Analysis and Model Selection. The Data Transformation module converts raw data into the data types and forms the other modules can use.

The Data Structure Analysis module has been designed to deal with correctly missing data and dichotomy in the target feature by making use of three clustering algorithms: DBSCAN, K-Means and Diffusion Maps. DBSCAN is used to determine whether clustering is necessary and to initialise the K-Means algorithm. K-Means and Diffusion Maps are used as clustering methods on the one-dimensional target feature and the two-dimensional input-target feature pairs, respectively. Data Structure Analysis also performs feature selection through three filter methods: CorrCoef, FCBF and Treelet.
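The division of labour between DBSCAN and K-Means described above can be sketched as follows, assuming a scikit-learn-style implementation; the function name, the `eps`/`min_samples` defaults and the exact hand-over rule are illustrative guesses, not the thesis's actual code:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

def cluster_target(y, eps=0.5, min_samples=5):
    """Sketch of the described pipeline on the one-dimensional target:
    DBSCAN decides whether clustering is needed at all and, if so,
    supplies the number of clusters used to initialise K-Means.
    (eps / min_samples are illustrative defaults, not the thesis settings.)
    """
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(y)
    n_clusters = len(set(labels) - {-1})  # DBSCAN marks noise points as -1
    if n_clusters <= 1:
        # One (or no) dense region: clustering deemed unnecessary.
        return np.zeros(len(y), dtype=int)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(y)
```

The appeal of this split is that DBSCAN needs no cluster count up front, while K-Means then produces a clean partition once that count is known.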

The Model Selection module proposes a novel approach to selecting the best model from a candidate set by optimising a conditional model-ranking strategy built on previously constructed theoretical tests. Our candidate set consisted of Expectation Maximisation, K-Means, Multi-Layer Perceptron, Nearest Neighbour, Random Forest, Linear Regression, Polynomial Regression and ElasticNet Regression.

In terms of restorability, the optimal configuration of the Cleansing Algorithm for the restoration of missing data was obtained by opting not to use clustering, using a custom alteration of the Treelet algorithm for feature selection, and applying the model selection strategy. This not only led to the greatest restorability of 56.90% on Aegon data sets, an improvement of 44.83% compared to not using the Cleansing Algorithm, but also to a more than four-fold reduction in computation time. A more realistic restorability, which accounts for the presence of correctly missing data, was given by the same configuration with one-dimensional output clustering; this resulted in a restorability of 43.10% on Aegon data sets. It was therefore deemed possible to restore missing data on Aegon data sets.

With respect to the human-adaptive framework, the algorithm has been constructed to be modular in the sense that any alternative feature selection or clustering approach can be implemented with ease. Furthermore, the Model Selection module allows the user to customise the theoretical testing and the choice of regression or classification models for the restoration of missing data. In doing so, the algorithm lays the foundations for the human adaptivity of the Cleansing Algorithm.

In my thesis I researched the potential paths and pitfalls of the newly created "Taillardat index".
This index uses the tail characteristics of several CRPS-based distributions to rank forecasters on how well they forecast, with a slight emphasis on extreme events.
From ...
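The thesis text is not shown beyond this abstract, but the CRPS on which the index is built has a well-known closed form for a Gaussian forecast, sketched here as background (the function name is illustrative):

```python
import math

def crps_gaussian(mu, sigma, y):
    """CRPS of a Gaussian forecast N(mu, sigma^2) against observation y,
    using the standard closed form
        CRPS = sigma * ( z*(2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi) ),
    with z = (y - mu)/sigma, phi/Phi the standard normal pdf/cdf.
    Lower is better; the score is zero only for a perfect point forecast.
    """
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # standard normal pdf
    Phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))           # standard normal cdf
    return sigma * (z * (2 * Phi - 1) + 2 * phi - 1 / math.sqrt(math.pi))
```

Because the CRPS integrates the squared difference between the forecast CDF and the observation's step function, its behaviour in the distribution tails is exactly what a tail-focused ranking index can exploit.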
The classical process capability indices are still the most prominently used by practitioners for asymmetrical tolerances, even though they do not accurately reflect process capability. It appears that an adequate measure of capability for asymmetrical tolerances is yet to be discove ...
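The classical indices referred to are presumably Cp and Cpk; as background, a minimal sketch of how they are usually computed for a two-sided specification (the function name and data are illustrative):

```python
import statistics

def cp_cpk(samples, lsl, usl):
    """Classical process capability indices for a specification [lsl, usl].

    Cp compares the tolerance width to the process spread and ignores
    centring; Cpk penalises an off-centre mean by taking the worse side.
    With asymmetrical tolerances the target is not the midpoint of
    [lsl, usl], which is exactly where these indices can mislead.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))
    return cp, cpk
```

For a perfectly centred process Cp and Cpk coincide; any shift of the mean towards one limit pulls Cpk below Cp.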
Since the gamma distribution is one of the most important models, and no convenient statistical tools for this distribution are available, the aim of this project is to construct an R package for the gamma distribution. In this package, five functions are created that can be used ...
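The five functions are not listed in the truncated abstract, so the following is only a guess at the usual distribution toolkit (density, CDF, quantile, random generation and fitting), sketched with SciPy rather than R:

```python
from scipy import stats

# Illustrative parameter values; the package's actual interface may differ.
g = stats.gamma(a=2.0, scale=1.5)          # shape a = 2, scale = 1.5

density = g.pdf(1.0)                       # density at x = 1
cdf = g.cdf(1.0)                           # P(X <= 1)
quantile = g.ppf(0.5)                      # median (inverse CDF at 0.5)
sample = g.rvs(size=100, random_state=0)   # reproducible random draws
# Maximum-likelihood fitting, with the location fixed at 0 as is usual
# for the two-parameter gamma distribution:
shape_hat, loc_hat, scale_hat = stats.gamma.fit(sample, floc=0)
```

In R the analogous quartet would be `dgamma`, `pgamma`, `qgamma` and `rgamma`, with fitting typically supplied separately.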