M. Fragkoulis
27 records found
Although the cloud has reached a state of robustness, the burden of using its resources falls on the shoulders of programmers who struggle to keep up with ever-growing cloud infrastructure services and abstractions. As a result, state management, scaling, operation, and failure m
...
While the concept of large-scale stream processing is very popular nowadays, efficient dynamic allocation of resources is still an open issue in the area. The database research community has yet to evaluate different autoscaling techniques for stream processing engines under a ro
...
Stream processing in the last decade has seen broad adoption in both commercial and research settings. One key element for this success is the ability of modern stream processors to handle failures while ensuring exactly-once processing guarantees. At the moment of writing, virtu
...
Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamen
...
In this work, we evaluate autoscaling solutions for stream processing engines. Although autoscaling has become a mainstream subject of research in the last decade, the database research community has yet to evaluate different autoscaling techniques under a proper benchmarking set
...
While there are multiple approaches for distributed application programming (e.g., Bloom [2], Hilda [14], Cloudburst [12], AWS Lambda, Azure Durable Functions, and Orleans [3, 4]), in practice developers mainly use libraries of popular general purpose languages such as Spring Boo
...
Sequential recommendation problems have received increasing attention in research during the past few years, leading to the inception of a large variety of algorithmic approaches. In this work, we explore how large language models (LLMs), which are nowadays introducing disruptive
...
S-QUERY
Opening the Black Box of Internal Stream Processor State
Distributed streaming dataflow systems have evolved into scalable and fault-tolerant production-grade systems. Their applicability has departed from the mere analysis of streaming windows and complex-event processing, and now includes cloud applications and machine learning infer
...
Machine Learning (ML) applications require high-quality datasets. Automated data augmentation techniques can help increase the richness of training data, thus increasing the ML model accuracy. Existing solutions focus on efficiency and ML model accuracy but do not exploit the ric
...
Serverless computing is currently the fastest-growing cloud services segment. The most prominent serverless offering is Function-as-a-Service (FaaS), where users write functions and the cloud automates deployment, maintenance, and scalability. Although FaaS is a good fit for exec
...
Clonos
Consistent Causal Recovery for Highly-Available Streaming Dataflows
Stream processing lies in the backbone of modern businesses, being employed for mission critical applications such as real-time fraud detection, car-trip fare calculations, traffic management, and stock trading. Large-scale applications are executed by scale-out stream processing
...
Valentine in Action
Matching Tabular Data at Scale
Capturing relationships among heterogeneous datasets in large data lakes - traditionally termed schema matching - is one of the most challenging problems that corporations and institutions face nowadays. Discovering and integrating datasets heavily relies on the effectiveness of
...
Hazelcast Jet
Low-latency stream processing at the 99.99th percentile
Jet is an open-source, high-performance, distributed stream processor built at Hazelcast during the last five years. Jet was engineered with millisecond latency on the 99.99th percentile as its primary design goal. Originally Jet’s purpose was to be an execution engine that perfo
...
Serverless computing is currently the fastest-growing cloud services segment. The most prominent serverless offering is Function-as-a-Service (FaaS), where users write functions and the cloud automates deployment, maintenance, and scalability. Although FaaS is a good fit for exec
...
Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema
...
Beyond Analytics
The Evolution of Stream Processing Systems
Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. The goal of this tutorial is threefold. First, we aim to
...
REMA
Graph embeddings-based relational schema matching
Schema matching is the process of capturing correspondence between attributes of different datasets and it is one of the most important prerequisite steps for analyzing heterogeneous data collections. State-of-the-art schema matching algorithms that use simple schema- or instance
...
Memory operations are critical to an application's reliability and performance. To reason about their correctness and track opportunities for optimisations, sophisticated instrumentation frameworks, such as Valgrind and Pin, have been developed. Both provide only limited faciliti
...
Operational stream processing
Towards scalable and consistent event-driven applications
Over the last decade, we have witnessed widespread adoption of architectural styles such as microservices for building event-driven software applications and deploying them in cloud infrastructures. Such services favor the separation of a database into independent silos of data,
...
Smelly relations
Measuring and understanding database schema quality
Context: Databases are an integral element of enterprise applications. Similarly to code, database schemas are also prone to smells - best practice violations. Objective: We aim to explore database schema quality, associated characteristics and their relationships with other soft
...