JH

J.J. Hoozemans

23 records found

Authored

Many applications make extensive use of various forms of compression techniques for storing and communicating data. As decompression is highly regular and repetitive, it is a suitable candidate for acceleration. Examples are offloading (de)compression to a dedicated circuit on ...

FPGA Acceleration for Big Data Analytics

Challenges and Opportunities

The big data revolution has ushered an era with ever increasing volumes and complexity of data requiring ever faster computational analysis. During this very same era, CPU performance growth has been stagnating, pushing the industry to either scale their computation horizontall ...

We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate dose calculation in real-time during patient t ...

ALMARVI Execution Platform

Heterogeneous Video Processing SoC Platform on FPGA

The proliferation of processing hardware alternatives allows developers to use various customized computing platforms to run their applications in an optimal way. However, porting application code on custom hardware requires a lot of development and porting effort. This paper des ...

This paper presents and evaluates an approach to deploy image and video processing pipelines that are developed frame-oriented on a hardware platform that is stream-oriented, such as an FPGA. First, this calls for a specialized streaming memory hierarchy and accompanying softw ...

We propose a novel reconfigurable hardware architecture to implement Monte Carlo based simulation of physical dose accumulation for intensity-modulated adaptive radiotherapy. The long term goal of our effort is to provide accurate online dose calculation in real-time during pa ...

In the design of modern-day processors, energy consumption and fault tolerance have gained significant importance next to performance. This is caused by battery constraints, thermal design limits, and higher susceptibility to errors as transistor feature sizes are decreasing. How ...
To achieve energy savings while maintaining adequate performance, system designers and programmers wish to create the best possible match between program behavior and the underlying hardware. Well-known current approaches include DVFS and task migrations in heterogeneous platform ...
Embedded systems range from very simple devices, such as a digital watch, to highly complex systems such as smartphones. In these complex devices, an increasing number of applications need to be executed on a computing platform. Moreover, the number of applications (or programs) ...
In this paper, we present and evaluate an FPGA acceleration fabric that uses VLIW softcores as processing elements, combined with a
memory hierarchy that is designed to stream data between intermediate stages of an image processing pipeline. These pipelines are commonplace in ...
In today’s computing environments, the concurrent execution of multiple applications/threads is common and multi-cores are very
well-suited to handle such workloads. However, they suffer from the fact that any mismatch between the application’s inherent instruction-level para ...
Abstract—Multi-threaded applications execute their threads on different cores with their own local caches and need to share data among the threads. Shared caches are used to avoid lengthy and costly main memory accesses. The degree of cache sharing is a balance between reducing m ...
As embedded systems are faced with ever more demanding workloads and more tasks are being consolidated onto a smaller number of microcontrollers, system designers are faced with opposing requirements of increasing performance while retaining real-time analyzability. For example, ...

Very Long Instruction Word (VLIW) processors are commonplace in embedded systems due to their inherent lowpower consumption as the instruction scheduling is performed by the compiler instead by sophisticated and power-hungry hardware instruction schedulers used in their RISC c ...

The register file is an expensive component in the design of any processor, especially, when considering the additional ports that are needed to support multiple datapaths within a wide-issue VLIW processor. In a recent work, these additional resources were used to dynamically ...

Contributed

An increase in the performance of mobile devices has started a revolution in deploying artificial intelligence (AI) algorithms on mobile and embedded systems. In addition, fueled by the need for privacy-aware insights into data, we see a strong push towards federated machine lear ...
Through new digital business models, the importance of big data analytics continuously grows. Initially, data analytics clusters were mainly bounded by the throughput of network links and the performance of I/O operations. With current hardware development, this has changed, and ...
Big data applications are becoming more commonplace due to an abundance of digital data and increasingly powerful hardware. One of these classes of hardware devices are FPGAs, which are being used today in various ways such as data centers and embedded systems. High performance, ...