| 12 | | == A Dynamic MapReduce Scheduler for Heterogeneous Workloads == |
| | 12 | * A Dynamic MapReduce Scheduler for Heterogeneous Workloads |
| | 13 | |
| | 14 | Abstract—MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a |
| | 15 | practical data center of that scale, it is a common case that I/O-bound |
| | 16 | jobs and CPU-bound jobs, which demand different |
| | 17 | resources, run simultaneously in the same cluster. In the |
| | 18 | MapReduce framework, parallelization of these two kinds of jobs |
| | 19 | has not been addressed. In this paper, we give a new view of the |
| | 20 | MapReduce model, and classify the MapReduce workloads into |
| | 21 | three categories based on their CPU and I/O utilization. With |
| | 22 | workload classification, we design a new dynamic MapReduce |
| | 23 | workload prediction mechanism, MR-Predict, which detects the |
| | 24 | workload type on the fly. We propose a Triple-Queue Scheduler |
| | 25 | based on the MR-Predict mechanism. The Triple-Queue |
| | 26 | scheduler can improve the utilization of both CPU and disk I/O |
| | 27 | resources under heterogeneous workloads, and it can improve |
| | 28 | Hadoop throughput by about 30% under such |
| | 29 | workloads. |
| | 30 | |