Scheduling Workloads of Workflows in Clusters and Clouds
Abstract
This dissertation addresses three key challenges that are characteristic of the online scheduling of workloads of workflows in modern distributed computing systems. The first challenge is the realistic estimation of the resource demand of a workflow, which is important for making good task placement and resource allocation decisions. Workflows usually consist of segments with different levels of parallelism and different interconnection types between tasks, which affect the order in which tasks become eligible. Moreover, realistic task runtime estimates are not always available. The second challenge is the efficient placement of workflow tasks on computing resources to minimize the average workflow slowdown while achieving fairness. A poorly chosen task placement policy can easily degrade performance and impair the fair access of workflows to computing resources. The third challenge is the automatic allocation (autoscaling) of computing resources for workflows while meeting deadline and budget constraints. Computing clouds make it easy to lease and release resources, but such decisions must be made carefully to minimize slowdowns and deadline violations, and to use the leased resources efficiently so that the incurred costs are reduced. To address these challenges, this dissertation proposes novel scheduling policies for workloads of workflows and investigates the applicability of relevant state-of-the-art policies to the online scenario. For the new policies, implementation effort and suitability for production systems are kept in mind. The considered workflow scheduling policies are evaluated experimentally through an extensive set of simulation-based and real-world experiments on a private multicluster computer. Additionally, a Mixed Integer Programming (MIP) approach is used to compare the obtained real-world experimental results against the optimal solution from a MIP solver.
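To illustrate how the structure of a workflow determines the order in which tasks become eligible, the following minimal sketch groups the tasks of a small, made-up workflow into successive "waves" of eligible tasks. It is not one of the dissertation's policies: the function name `eligibility_waves`, the example workflow, and the idealized assumption that every task in a wave finishes before the next wave starts (equal runtimes, unlimited resources) are all assumptions made here for illustration.

```python
def eligibility_waves(deps):
    """Group tasks into waves of simultaneously eligible tasks.

    deps maps each task name to the set of tasks it depends on.
    A task is eligible once all of its predecessors have finished.
    Idealized assumption: all tasks in a wave finish together.
    """
    remaining = {task: set(preds) for task, preds in deps.items()}
    done = set()
    waves = []
    while remaining:
        # Tasks whose predecessors are all finished are eligible now.
        ready = sorted(t for t, preds in remaining.items() if preds <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown predecessor")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Hypothetical fork-join workflow: one split, three parallel tasks, a merge.
workflow = {
    "split": set(),
    "map1": {"split"},
    "map2": {"split"},
    "map3": {"split"},
    "merge": {"map1", "map2", "map3"},
    "report": {"merge"},
}

waves = eligibility_waves(workflow)
parallelism = [len(wave) for wave in waves]
print(waves)
print(parallelism)
```

The length of each wave gives a rough upper bound on how many processors the workflow can use at that stage, which is why segments with different parallelism make a single fixed resource estimate inaccurate. In practice, task runtimes differ, so eligibility does not advance in such clean waves.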