Acknowledgment. This work was supported by the Doctoral Research Startup Foundation of the University of South China (No. 200XQD083).
Institutional affiliation:
This university is the first affiliation
Department:
School of Computer Science and Technology
Abstract:
Data locality is a key factor influencing the performance of Spark systems. Because executors are the containers in which tasks run, the nodes on which executors are started directly affect the locality level that tasks can achieve. This paper aims to improve data locality in the reduce stage of the Spark framework through executor allocation. First, we calculate the network distance matrix of executors and formulate an optimal executor allocation problem that minimizes the total communication distance. Then, an approximation algorithm is proposed and the approximate facto...