The global climate change crisis and the associated phenomenon of global warming have taken center stage in recent years. Greenhouse gas emissions due to electricity generation are a contributor to this problem. Internet Services running in data centers consume enormous amounts of energy and should be optimized to reduce their greenhouse gas emissions.
This thesis explores the possibility of intelligently scheduling resource-intensive batch data-processing jobs to run during the green hours of the day. Green hours are hours during which the amount of greenhouse gases emitted per kilowatt-hour (kWh) of electricity generated is lower than during other hours of the day. Emissions vary over the day because both renewable energy generation and grid demand vary.
The system "S.C.A.L.E. (Scheduler for Carbon-Aware Load Execution)" is proposed. It schedules compute jobs into periods of low-carbon-intensity energy generation based on predictions of renewable energy generation and grid demand. The system was evaluated against a simulated data processing pipeline at ING; this pipeline is one of the larger consumers of the ING private cloud. The scheduler aims to reduce greenhouse gas emissions by predicting task running times and green hours for the next day and optimizing the times at which tasks are processed throughout the day.
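The core idea of placing a job in a low-carbon period can be illustrated with a minimal sketch. It is not the S.C.A.L.E. implementation: the function name `greenest_start_hour`, the hourly forecast values, and the assumption that a job runs uninterrupted for a whole number of hours are all hypothetical and chosen only for illustration. Given a forecast of carbon intensity per hour, the sketch selects the start hour whose contiguous window has the lowest total intensity.

```python
from typing import Sequence


def greenest_start_hour(intensity: Sequence[float], duration_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest
    summed carbon intensity.

    `intensity` is a hypothetical hourly forecast (e.g. gCO2/kWh) and
    `duration_hours` is the predicted running time of the job, rounded
    to whole hours. Ties are resolved in favour of the earliest window.
    """
    if not 1 <= duration_hours <= len(intensity):
        raise ValueError("duration must be between 1 and the forecast length")

    # Sum of the first window, then slide it one hour at a time.
    window = sum(intensity[:duration_hours])
    best_start, best_sum = 0, window
    for start in range(1, len(intensity) - duration_hours + 1):
        window += intensity[start + duration_hours - 1] - intensity[start - 1]
        if window < best_sum:
            best_start, best_sum = start, window
    return best_start


# Illustrative forecast for eight consecutive hours (made-up values).
forecast = [400, 380, 300, 200, 150, 180, 350, 420]
start = greenest_start_hour(forecast, duration_hours=3)
```

A sliding window keeps the search linear in the length of the forecast. A real scheduler would also have to account for deadlines, task dependencies, and the overhead of running tasks concurrently, none of which this sketch models.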
Several main conclusions are drawn based on this research:
1. The accuracy of task load predictions regarding running times is crucial for effective scheduling. The research concludes that, with sufficient historical data, the scheduler can predict task running times with an acceptable margin of error (5-10%).
2. The research examines the scheduler's ability to predict periods of low carbon intensity and the reduction in carbon emissions that results from using it. It affirms that the scheduler accurately identifies periods of low-carbon-intensity energy generation and estimates a potential 20% reduction in greenhouse gas emissions.
3. The potential overhead introduced by a carbon-aware scheduler is addressed. The research finds that while the scheduling algorithm itself is lightweight, processing tasks concurrently introduces overhead. The tipping point at which this overhead outweighs the benefits varies for each system and should be determined experimentally.
The thesis concludes by emphasizing the significance of implementing a carbon-aware scheduler to reduce the environmental impact of data centers. The proposed scheduler is a promising contribution to sustainable computing practices. The research also suggests the need for continued work on the scheduler and for its adoption in production environments, especially within the context of the ING data processing pipeline.