Highly scalable monitoring, regression and compliance tests
The main challenge for this workpackage is to provide a component that collects data and metrics on the status of the overall computing system. This monitoring component has to be extremely scalable and robust, cause minimal load on the target systems, and be highly available, because other components will act automatically on the monitored data. In the conceptual phase, up to PM6, different tools will be tested to validate their usability for TIMaCS. These tests cover existing monitoring solutions (e.g. Ganglia and Nagios) as well as in-house developments of the involved partners, e.g. tools and procedures from ZIH and HLRS that check the compute hardware against certain minimal requirements.
The components cover highly efficient (hierarchical, aggregating metrics) and failsafe (round-robin databases) monitoring solutions that run under production conditions as tools for managing automatic maintenance or for preventively detecting potential problems on the systems. Additionally, concepts for managing historic data, which is the basis for regular regression tests, especially after system changes (hardware and software), will be integrated.
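
As an illustration only, the two ideas named above (a round-robin store of fixed size, and metrics that are aggregated upward through a hierarchy) could be sketched as follows. This is a minimal sketch under stated assumptions, not the TIMaCS design and not the API of any existing tool: the choice of Python and the names RoundRobinArchive, summary and aggregate are hypothetical.

```python
from collections import deque


class RoundRobinArchive:
    """Hypothetical fixed-size store: once full, a new sample overwrites the oldest."""

    def __init__(self, slots):
        # deque with maxlen drops the oldest entry automatically when full
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def summary(self):
        """Reduce the stored samples to a small summary that is cheap to send upward."""
        values = [v for _, v in self.samples]
        if not values:
            raise ValueError("no samples stored yet")
        return {"count": len(values), "min": min(values),
                "max": max(values), "sum": sum(values)}


def aggregate(summaries):
    """Combine the summaries of several child nodes into one summary for their parent."""
    return {
        "count": sum(s["count"] for s in summaries),
        "min": min(s["min"] for s in summaries),
        "max": max(s["max"] for s in summaries),
        "sum": sum(s["sum"] for s in summaries),
    }


# Two compute nodes report a load metric; a group master aggregates them.
node_a = RoundRobinArchive(slots=3)
for t, load in enumerate([1.0, 2.0, 3.0, 4.0]):
    node_a.update(t, load)          # the first sample (1.0) is overwritten

node_b = RoundRobinArchive(slots=3)
node_b.update(0, 10.0)
node_b.update(1, 20.0)

group = aggregate([node_a.summary(), node_b.summary()])
print(group["sum"] / group["count"])  # mean load over the whole group
```

Because each level forwards only a summary of constant size, the data volume reaching the top of the hierarchy does not grow with the number of samples kept per node, and the fixed number of slots keeps storage per metric bounded.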