Banner
       



Publication Details PU-17 - `Efficient task replication and management for adaptive fault tolerance in Mobile Grid environments´


February 2007; Antonios Litke, Dimitrios Skoutas, Konstantinos Tserpes, Theodora Varvarigou

Abstract:
Fault tolerant Grid computing is of vital importance as the Grid and Mobile computing worlds converge to the Mobile Grid computing paradigm. We present an efficient scheme based on task replication, which utilizes the Weibull reliability function for the Grid resources so as to estimate the number of replicas that are going to be scheduled in order to guarantee a specific fault tolerance level for the Grid environment. The additional workload that is produced by the replication is handled by a resource management scheme which is based on the knapsack formulation and which aims to maximize the utilization and profit of the Grid infrastructure. The proposed model has been evaluated through simulation and has shown its efficiency for being used in a middleware approach in future mobile Grid environments.

Keywords:
fault tolerance, scheduling, Grid computing

References:
[1] M.R. Lyu, Software Fault Tolerance, JohnWiley & Sons, Chichester, UK, 1995.
[2] A. Nguyen-Tuong, Integrating fault-tolerance techniques in Grid applications, Ph.D. Dissertation, University of Virginia, August 2000.
[3] K. Ramamritham, J.A. Stankovic, P.-F. Shiah, Efficient scheduling algorithms for real-time multiprocessor systems, IEEE Trans. Parallel Distrib. Syst. 1 (2) (1990) 184–194.
[4] Scheduling Working Group of the Grid Forum, Document: 10.5, September 2001.
[5] F. Wang, K. Ramamritham, J.A. Stankovic, Determining redundancy levels for fault tolerant real-time systems, IEEE Trans. Comput. 44 (February) (1995).
[6] D. Pisinger, Algorithms for Knapsack problems, Ph.D. Thesis, Dept. of Computer Science, University of Copenhagen, February 1995.
[7] J.B. Weissman, Fault Tolerant Computing on the Grid: What are My Options? HPDC (1999).
[8] S. Hwang, C. Kesselman, A flexible framework for fault tolerance in the grid, J. Grid Comput. 1 (2003) 251–272.
[9] The Globus project, http://www-fp.globus.org/hbm/.
[10] A. Nguyen-Tuong, A.S. Grimshaw, Using reflection to incorporate fault-tolerance techniques in distributed applications, Computer Science Technical Report, University of Virginia, CS 98-34, 1998.
[11] S.J. Chapin, D. Katramatos, J. Karpovich, A. Grimshaw, Resource management in Legion, Future Gener. Comput. Syst. 15 (5–6) (1999) 583–594.
[12] H. Casanova, J. Dongarra, C. Johnson, M. Miller, Application-specific tools, in: I. Foster, C. Kesselman (Eds.), The GRID: Blueprint for a New Computing Infrastructure, 1998, pp. 159–180 (Chapter 7).
[13] J.S. Plank, H. Casanova, M. Beck, J.J. Dongarra, Deploying fault tolerance and task migration with NetSolve, Future Gener. Comput. Syst. 15 (5) (1999) 745–755.
[14] A.S. Grimshaw, A. Ferrari, E.A. West, Mentat, in: G.V. Wilson, P. Lu (Eds.), Parallel Programming Using C++, 1996, pp. 382–427 (Chapter 10).
[15] F.C. Gartner, Fundamentals of fault-tolerant distributed computing in asynchronous environments, ACM Comput. Surv. 31 (1) (1999).
[16] Q. Chen, M. Ferris, J.T. Linderoth, FATCOP 2.0: Advanced features in an opportunistic mixed integer programming solver, Ann. Oper. Res. 103 (2001) 17–32.
[17] H. Zhuge, X. Sun, J. Liu, E. Yao, X. Chen, A scalable P2P platform for the knowledge grid, IEEE Trans. Knowl. Data Eng. 17 (12) (2005) 1721–1736.
[18] Q. Lv, P. Cao, E. Cohen, K. Li, S. Shenker, Search and replication in unstructured peer-to-peer networks, in: Proc. 16th Int’l Conf. Supercomputing, June 2002, pp. 84–95.
[19] Access to Knowledge through the Grid in a MobileWorld (AKOGRIMO) Integrated Project FP6-2003-IST-004293, http://www.akogrimo.org/.
[20] A. Litke, D. Skoutas, T. Varvarigou, Mobile Grid computing: Changes and challenges of resource management in a mobile grid environment, in: Access to Knowledge through the Grid in a Mobile World Workshop, held in conjunction with 5th Int. Conf. on Practical Aspects of Knowledge Management, PAKM 2004, Vienna. Available at http://www.mobilegrids.org/docs/pakm2004/papers/3 Mobile-Grid-Computing.pdf.
[21] D. Abramson, R. Sosic, J. Giddy, B. Hall, Nimrod: A tool for performing parametised simulations using distributed workstations, in: The 4th IEEE Symposium on High Performance Distributed Computing, Virginia, August 1995.
[22] D. Abramson, R. Buyya, J. Giddy, A computational economy for grid computing and its implementation in the Nimrod-G resource broker, Future Gener. Comput. Syst. 18 (8) (2002) 1061–1074.
[23] F. Berman, A. Chien, K. Cooper, J. Dongarra, I. Foster, D. Gannon, L. Johnsson, K. Kennedy, C. Kesselman, J. Mellor-Crummey, D. Reed, L. Torczon, R. Wolski, The GrADS Project: Software support for high-level grid application development, Int. J. High Perform. Comput. Appl. 15 (4) (2001) 327–344. Winter.
[24] J. Frey, T. Tannenbaum, I. Foster, M. Livny, S. Tuecke, Condor-G: A computation management agent for multiinstitutional grids, Cluster Comput. 5 (2002) 237–246.
[25] M. Faerman, S. Figueira, J. Hayes, G. Obertelli, J. Schopf, G. Shao, S. Smallen, N. Spring, A. Su, D. Zagorodnov, Adaptive computing on the grid using AppLeS, IEEE Trans. Parallel Distrib. Syst. 14 (4) (2003) 369–382.
[26] C. Weng, X. Lu, Heuristic scheduling for bag-of-tasks applications in combination with QoS in the computational grid, Future Gener. Comput. Syst. 21 (2005) 271–280.
[27] V. Subramani, R. Kettimuthu, S. Srinivasan, P. Sadayappan, Distributed job scheduling on computational grids using multiple simultaneous requests, in: Proc. of the 11th IEEE International Symposium on High Performance Distributed Computing, Edinburgh, Scotland, July 2002, pp. 359–367.
[28] The Unicore project, http://www.unicore.org/forum.htm.
[29] R. Parra-Hernandez, D. Vanderster, N.J. Dimopoulos, Resource management and Knapsack formulations on the grid, in: Proc. of the 5th IEEE/ACM Int. Workshop on Grid Computing, GRID’04.
[30] J.Y.-T. Leung, M.L. Merrill, A note on preemptive, scheduling of periodic, real-time tasks, Inform. Process. Lett. (November) (1980) 115–118.
[31] L.E. Jackson, G.N. Rouskas, Deterministic preemptive scheduling of real time tasks, IEEE Comput. 35 (5) (2002) 72–79.
[32] A.S. Tanenbaum, M. van Steen, Distributed Systems: Principles and Paradigms, 1st ed., Prenctice Hall, Computer Science.
[33] T. Varvarigou, J. Trotter, Module replication for fault-tolerant real-time distributed systems, IEEE Trans. Reliab. 47 (1) (1998) 8–18.
[34] D.A. Reed, C. Lu, C.L. Mendes, Reliability challenges in large systems, Future Gener. Comput. Syst. 22 (2006) 293–302.
[35] D. Nurmi, J. Brevik, R. Wolski, Modeling machine availability in enterprise and wide-area distributed computing environments, UCSB Computer Science Technical Report Number CS2003-28.
[36] D.C. Montgomery, G.C. Runger, Applied Statistics and Probability for Engineers, 3rd ed. (an Interactive e-text).
[37] P.L. Meyer, Introductory Probability and Statistical Applications, 2nd ed., Addison-Wesley, 1970 (Chapter 11).
[38] R.L. Scheaffer, Introduction to Probability and Its Applications, 2nd ed., Duxbury, 1995 (Section 4.9).
[39] R.B. Abernethy, The New Weibull Handbook, 4th ed., ISBN 0-9653062-1-6.
[40] N. Doulamis, A. Doulamis, A. Panagakis, K. Dolkas, T. Varvarigou, E. Varvarigos, A combined fuzzy-neural network model for non-linear prediction of 3D rendering workload in grid computing, IEEE Trans. Syst. Man Cybern. B. (2004).
[41] X. He, X.-H. Sun, Gregor von Laszewski, QoS guided min-min heuristic for grid task scheduling, in: Grid Computing, J. Comput. Sci. Technol. 18 (4) (2003) (special issue).
[42] L. Carrington, A. Snavely, N. Wolter, A performance prediction framework for scientific applications, Future Gener. Comput. Syst. 22 (2006) 336–346.
[43] R. Wolski, N. Spring, J. Hayes, The network weather service: A distributed resource performance forecasting service for metacomputing, Future Gener. Comput. Syst. 15 (1999) 757–768.
[44] L. Gong, X.H. Sun, E. Waston, Performance modeling and prediction of non-dedicated network computing, IEEE Trans. Comput. 51 (9) (2002).
[45] Y. Gao, H. Rong, J.Z. Huang, Adaptive grid job scheduling with genetic algorithms, Future Gener. Comput. Syst. 21 (2005) 151–161.
[46] A. Litke, K. Tserpes, T. Varvarigou, Computational workload prediction for Grid oriented industrial applications: The case of 3D-image rendering, in: Proceedings of Cluster Computing and Grid, CCGrid2005, vol. 2, 2005, pp. 962–969.
[47] D. Pisinger, Algorithms for knapsack problems, Ph.D. Thesis, Dept. of Computer Science, University of Copenhagen, February 1995.
[48] A. Litke, K. Tserpes, K. Dolkas, T. Varvarigou, A task replication and fair resource management scheme for fault tolerant grids, in: Advances in Grid Computing—EGC 2005, in: Lecture Notes in Computer Science, vol. 3470, 2005, pp. 1022–1031.
[49] S. Martello, D. Pisinger, P. Toth, New trends in exact algorithms for the 0-1 knapsack problem, Eur. J. Oper. Res. 123 (2000) 325–332.
[50] S. Martello, P. Toth, Knapsack Problems: Algorithms and Computer Implementations, John Wiley & Sons, 1990.
[51] D. Ghosh, Heuristics for knapsack problems: Comparative survey and sensitivity analysis, Fellowship Dissertation, IIM Calcutta, India, 1997.
[52] G. Ghosh, B. Goldengorin, The binary knapsack problem: Solutions with guaranteed quality, Research Report 01A64, University of Groningen, 2001.
[53] P.C. Chu, J.E. Beasley, A genetic algorithm for the multidimensional knapsack problem, J. Heuristics 4 (1) (1998) 63–86.
[54] C. Cotta, J.M. Troya, A hybrid genetic algorithm for the 0-1 multiple knapsack problem, in: G.D. Smith, N.C. Steele, R.F. Albrecht (Eds.), Artificial Neural Nets and Genetic Algorithms 3, Springer-Verlag, 1998, pp. 251–255.
[55] A.B. Sim˜oes, E. Costa, An evolutionary approach to the zero/one Knapsack problem: Testing ideas from biology, in: Proc. of the Fifth Int. Conf. on Neural Networks and Genetic Algorithms, ICANNGA’01, Prague, April 2001, pp. 236–239.
[56] S. Sahni, Data Structures, Algorithms, and Applications in Java, 2nd ed., Silicon Press, 2004.
[57] B.R. Preiss, Data Structures and Algorithms with Object-Oriented Design Patterns in C++, John Wiley & Sons, 1999.
[58] The Open Grid Services Architecture, Version 1.0, http://www.ggf.org/documents/GWD-I-E/GFD-I.030.pdf.
[59] OASIS Web Services Resource Framework (WSRF), http://www.oasisopen.org.
[60] Simple Object Access Protocol, http://www.w3.org/TR/soap/.


Source:
Future Generation Computer Systems, ISSN 0167-739X, volume 23, issue 2, pages 163-178 (Elsevier)

URL:
commercial (foreign link)

Syndicate our news.