Abstract
There has been considerable development in the field of clusters and grids, and the use of clusters has recently been on the rise in nearly every field. This paper proposes a system that monitors jobs on large computational clusters. Monitoring jobs is essential to understanding how they are executed, and it reveals the complete life cycle of jobs running on large clusters. This paper also describes how the information obtained by monitoring jobs can help increase the overall throughput of clusters. Heuristics enable efficient distribution of jobs among the computational nodes, thereby achieving a fair job distribution policy. The proposed system would be capable of load balancing among the computational nodes, detecting failures, taking corrective action after a failure is detected, and monitoring both jobs and system resources, among other functions.