Advent of OSCAR Monitoring Framework

Benefits to OSCAR:
OSCAR is no doubt very popular as a cluster manager in HPC environments. This comes as no surprise given how configurable it is and how many Linux distributions it supports. Installing a cluster resource manager, however, is only the first step in managing a cluster; measures also have to be taken to make the individual nodes in the cluster resilient to failures. A framework that monitors the services running on the various nodes and stores the gathered information in the OSCAR database would make OSCAR more robust.

Project Synopsis:
Currently OSCAR can install a cluster, perform management tasks such as adding and deleting nodes, and monitor the status of the cluster with Ganglia or Nagios. HA-OSCAR, an extension of OSCAR, introduces redundancy at the head-node level by duplicating the primary head node and, based on predefined policies, carries out specific actions to guarantee the availability of this head node. What OSCAR cannot do is monitor the state of services running concurrently on all compute nodes, such as lam and pbs_mom, and take predefined actions when they fail. I propose the design, implementation and integration of a universal framework that gathers and stores information about the health of the various clustering services.

Project Details:
I will implement and integrate into OSCAR a simple universal framework responsible for gathering information about the health of the cluster and storing it in the OSCAR database. If a fault or error arises in any monitored service, the framework will provide a mechanism to report the failure, and actions can then be taken, based on user-defined policies, to mitigate the effect of the failed service. The design of the framework will be worked out together with my mentor, while I will carry out the implementation and the integration into the OSCAR core.
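
As a rough illustration of the idea (not the eventual OSCAR design), the gather, store and act steps might look something like the Python sketch below. SQLite stands in for the OSCAR database, and the table name `service_health`, the policy names and the functions `open_store` and `record_and_decide` are all made up for this example; how a service's status is actually probed on a node is left out.

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple

# Hypothetical user-defined policies: service name -> action when it is down.
POLICIES: Dict[str, str] = {"pbs_mom": "restart", "lam": "notify"}


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a health store. SQLite is only a stand-in for the OSCAR database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service_health ("
        "node TEXT, service TEXT, healthy INTEGER, checked_at REAL)"
    )
    return conn


def record_and_decide(
    conn: sqlite3.Connection,
    node: str,
    statuses: Dict[str, bool],
    policies: Dict[str, str] = POLICIES,
    now: Optional[float] = None,
) -> List[Tuple[str, str, str]]:
    """Store one round of service statuses for a node and return the
    (node, service, action) tuples for every service that is down."""
    checked_at = time.time() if now is None else now
    actions = []
    for service, healthy in statuses.items():
        conn.execute(
            "INSERT INTO service_health VALUES (?, ?, ?, ?)",
            (node, service, int(healthy), checked_at),
        )
        if not healthy:
            # Assumption: services without an explicit policy are only logged.
            actions.append((node, service, policies.get(service, "log")))
    conn.commit()
    return actions


if __name__ == "__main__":
    store = open_store()
    # Pretend a probe on node1 found pbs_mom down and lam running.
    for action in record_and_decide(store, "node1", {"pbs_mom": False, "lam": True}):
        print(action)
```

Keeping the policy lookup separate from the probing step means the same table of actions could be applied no matter which service reported the failure.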

~ by Chuka on June 9, 2008.
