Multiple MapReduce Jobs in Distributed Scheduler for Big Data Applications

Kasi Perumal Sundaraj; Madhusudhan Rao T; Praveen Chander P G

doi:10.23956/ijarcsse.v7i12.484


Abstract

The majority of large-scale, data-intensive applications executed by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications run on large clusters that require large amounts of energy, making energy costs a considerable fraction of a data center’s overall costs. Minimizing the energy consumed when executing each MapReduce job is therefore a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications while satisfying the service level agreement (SLA). We first model the problem of energy-aware scheduling of a single MapReduce job as an Integer Program. We then propose two heuristic algorithms, called energy-aware MapReduce scheduling algorithms (EMRSA-I and EMRSA-II), that find assignments of map and reduce tasks to machine slots in order to minimize the energy consumed when executing the application. Our algorithms are able to find near-optimal job schedules that consume approximately 40 percent less energy on average than the schedules obtained by a common-practice scheduler that minimizes the makespan.
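To make the idea of energy-aware task-to-slot assignment under a deadline more concrete, the following is a minimal illustrative sketch. It is not EMRSA-I or EMRSA-II, whose details are not given here; it is a simple greedy heuristic written under stated assumptions. The function names, the `time[i][j]` and `energy[i][j]` matrices (processing time and energy of task `i` on slot `j`), and the single `deadline` standing in for the SLA are all hypothetical, and the sketch ignores the precedence between map and reduce tasks.

```python
from typing import List, Optional


def greedy_energy_schedule(
    time: List[List[float]],
    energy: List[List[float]],
    deadline: float,
) -> Optional[List[int]]:
    """Assign each task to one slot, preferring low-energy slots.

    Hypothetical inputs (assumptions, not taken from the paper):
      time[i][j]   - processing time of task i on slot j
      energy[i][j] - energy consumed by task i on slot j
      deadline     - latest allowed finish time on every slot (the SLA)

    Returns a list mapping task index -> slot index, or None if this
    greedy pass cannot fit some task before the deadline.
    """
    n_tasks = len(time)
    n_slots = len(time[0]) if n_tasks else 0
    load = [0.0] * n_slots          # accumulated busy time per slot
    assignment = [-1] * n_tasks

    # Place tasks with the largest best-case running time first,
    # since they are the hardest to fit before the deadline.
    order = sorted(range(n_tasks), key=lambda i: min(time[i]), reverse=True)

    for i in order:
        # Slots that can still finish task i before the deadline.
        feasible = [j for j in range(n_slots) if load[j] + time[i][j] <= deadline]
        if not feasible:
            return None
        # Among feasible slots, pick the one with the lowest energy for task i.
        best = min(feasible, key=lambda j: energy[i][j])
        assignment[i] = best
        load[best] += time[i][best]
    return assignment


def total_energy(assignment: List[int], energy: List[List[float]]) -> float:
    """Sum the energy of each task on the slot it was assigned to."""
    return sum(energy[i][j] for i, j in enumerate(assignment))


if __name__ == "__main__":
    # Slot 0 is fast but energy-hungry; slot 1 is slow but frugal.
    t = [[2, 4], [2, 4], [2, 4]]
    e = [[5, 3], [5, 3], [5, 3]]
    schedule = greedy_energy_schedule(t, e, deadline=8)
    print(schedule, total_energy(schedule, e))
```

In this toy instance a looser deadline lets more tasks run on the frugal slot, while a tighter one pushes work onto the faster, more energy-hungry slot. That is the trade-off between energy and the SLA that the abstract describes; an exact Integer Program would search all assignments rather than committing greedily.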




International Journal of Advanced Research in Computer Science and Software Engineering
