Results

Created on Monday, 22 February 2010. Last updated on Thursday, 13 June 2013. Written by Lutz Schubert.

S(o)OS concluded in April 2013 after a three-month extension. During its runtime, the project underwent two major research cycles, the first of which focused on examining base approaches for dealing with future large-scale heterogeneous environments. These base approaches were assessed at the end of the first cycle to identify the updates to be performed in cycle two of the project. Accordingly, the second cycle focused in particular on refining the initial approaches and incorporating them into an integrated operating system architecture.

The project thereby identified, specifically, that traditional monolithic operating systems are organised in a manner far too complex for large-scale infrastructures, in particular for mixed hierarchies such as multi-core processors in multi-node systems. In addition, such OS architectures show little adaptability to heterogeneous hardware, and in particular to specialised processors or cores that do not adhere to the x86 standard and/or do not perform optimally when used as such. What is more, due to this complex organisation, traditional operating systems generate considerable overhead and unnecessary load on processing units and caches, which further reduces performance. A similar problem arises with microkernels, which, notwithstanding their reduced size, are still organised in a complex fashion and optimised for their target platform. This means that adapting them to new hardware is almost as complicated as for traditional operating systems.

#1 To remedy this, the S(o)OS project devised a modular operating system architecture that can be split across a distributed and heterogeneous infrastructure, thereby occupying only a minimal amount of cache space and reducing the overhead, explicit and in particular implicit, of execution state maintenance and control. The Service-oriented Operating System is specifically geared towards applications with a large number of dedicated threads that are to be executed on machines with diverging architectures and hierarchies, i.e. with strong heterogeneity in terms of processing units and interconnects.
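The modular idea above can be sketched very roughly as follows. This is an illustrative toy, not the project's actual architecture or API: the service names and footprint figures are made up purely to show why a per-core OS instance that loads only the services it needs occupies far less cache than a full monolithic image.

```python
# Hypothetical OS service catalogue with made-up cache footprints (KiB).
# In a modular OS, each core instantiates only the services that the
# threads placed on it actually require.
SERVICES = {
    "scheduler": 12,
    "mem_manager": 20,
    "net_stack": 48,
    "file_system": 64,
}

def instance_footprint(needed):
    """Build an OS instance for one core from the requested services
    and report its total cache footprint."""
    loaded = {s: SERVICES[s] for s in needed}
    return loaded, sum(loaded.values())

# A compute-only core needs neither file system nor network stack:
mods, kib = instance_footprint(["scheduler", "mem_manager"])
print(kib)  # 32, versus 144 for the full service set
```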

Due to this architectural organisation, it is easy for the OS provider (or even the developer) to adjust the operating system to (a) new resource types and, more importantly, (b) different usage cases, and thus to different application needs. At the same time, the OS can easily be adjusted to different environments, exposing to the application developer a uniform view of the system, i.e. as if it consisted of a homogeneous infrastructure. As opposed to other frameworks or middleware, though, S(o)OS does not automatically prescribe a (lowest) common protocol, but can choose the protocols best suited to the application case and infrastructure.

The current architecture is geared towards single-task execution, i.e. a single application per operating system instance and a single thread per resource. Whilst S(o)OS has indicated ways to extend the architecture towards multiple applications and multi-task execution, this is of secondary interest for dedicated HPC and real-time environments.

#2 In order to enable such distributed execution and allow for appropriate segmentation and distribution of the code according to (a) infrastructure- and (b) code-specific characteristics, the S(o)OS project also developed the means to distribute and adjust applications through a novel programming concept and an innovative hardware description model:

The programming concept allows data dependencies and their associated costs to be specified in such a fashion that the code can easily be split up using the S(o)OS analysis tools, and can potentially even be rearranged to exploit infrastructure characteristics even better. The implicitly generated dependency graph, in connection with the scheduler, also improves resource utilisation considerably (see below).
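The role of the dependency graph can be sketched in a few lines. This is a minimal illustration of the general technique (levelled topological sorting), not the project's analysis tools: tasks declare what they depend on, and from that declaration alone the code splits into groups that may execute concurrently.

```python
def dependency_levels(tasks):
    """tasks: dict mapping each task to the set of tasks it depends on.
    Returns lists of tasks that may run concurrently, level by level."""
    remaining = {t: set(deps) for t, deps in tasks.items()}
    levels = []
    while remaining:
        # A task is ready when all of its dependencies have completed.
        ready = sorted(t for t, deps in remaining.items() if not deps)
        if not ready:
            raise ValueError("cyclic dependency")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for deps in remaining.values():
            deps.difference_update(ready)
    return levels

# Example: c needs a and b; d needs c.  a and b can run in parallel.
graph = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"c"}}
print(dependency_levels(graph))  # [['a', 'b'], ['c'], ['d']]
```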

The code and graph generated this way can then be mapped to the infrastructure using a Haskell-based hardware description model that allows for easy description and querying of resource and infrastructure characteristics, such as bandwidth, latency, computing performance etc. The approach differs from classical descriptions such as those prescribed by WSDL in that it describes the actual hardware setup, allowing the requester to derive capabilities (such as vector capability) rather than having to state all potential usage features explicitly, which would restrict future usability.
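The distinction between describing raw hardware and pre-declaring features can be illustrated as follows. The actual S(o)OS model is Haskell-based; this Python sketch with invented component kinds and attribute names only shows the principle: the description stores raw attributes, and the requester derives capabilities by querying them.

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One node in a hardware description tree, carrying raw attributes
    (e.g. SIMD width, latency) rather than pre-declared capabilities."""
    kind: str                                   # e.g. "core", "link"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

def find(node, predicate):
    """Query the description tree for all components matching a predicate."""
    matches = [node] if predicate(node) else []
    for child in node.children:
        matches += find(child, predicate)
    return matches

system = Component("node", {}, [
    Component("core", {"isa": "x86", "simd_width": 256}),
    Component("core", {"isa": "x86", "simd_width": 256}),
    Component("link", {"latency_ns": 90, "bandwidth_gbps": 40}),
])

# "Vector capable" is derived from raw attributes, not stated up front:
vector_cores = find(system, lambda c: c.kind == "core"
                    and c.attrs.get("simd_width", 0) >= 128)
print(len(vector_cores))  # 2
```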

#3 The S(o)OS architecture thereby incorporates novel execution strategies geared towards large-scale execution and, as noted above, low overhead. The S(o)OS concepts incorporate the means to exploit the data dependency graph in such a fashion that modules (threads) are executed according to their readiness, rather than a pre-defined schedule. The operating system can thereby conceptually deal with different types of data access, distribution etc., thus catering best for the respective application needs (cf. #1).
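Readiness-driven execution can be sketched as a small data-flow loop. All thread and data names here are hypothetical; the point is that the execution order emerges from data availability, not from a schedule fixed in advance.

```python
def run_by_readiness(threads, inputs):
    """threads: dict name -> (needed_inputs, produced_output).
    Runs each thread as soon as its inputs exist; returns the order
    in which threads actually executed."""
    available = set(inputs)
    pending = dict(threads)
    order = []
    while pending:
        ready = [n for n, (needs, _) in sorted(pending.items())
                 if set(needs) <= available]
        if not ready:
            raise RuntimeError("deadlock: some inputs are never produced")
        name = ready[0]          # a real scheduler would use any free resource
        _, produced = pending.pop(name)
        order.append(name)
        available.add(produced)
    return order

# "stats" needs only the raw data, so it never waits for the filter stage:
threads = {
    "filter": (["raw"], "clean"),
    "fft":    (["clean"], "spectrum"),
    "stats":  (["raw"], "summary"),
}
print(run_by_readiness(threads, ["raw"]))  # ['filter', 'fft', 'stats']
```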

By incorporating strategies from real-time environments, S(o)OS is furthermore able to respect timing boundaries, not only to execute large scale applications with time-bound behaviour, but also to assess and compare the impact of execution and communication times for scheduling purposes.
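Weighing execution against communication times under a timing boundary can be reduced to a simple comparison. The numbers and function below are illustrative assumptions, not project results: offloading to a faster remote core only pays off when the compute saving exceeds the transfer cost, and only if the deadline still holds.

```python
def choose_placement(local_exec_ms, remote_exec_ms, comm_ms, deadline_ms):
    """Pick the placement that meets the deadline with the least total time."""
    options = {
        "local": local_exec_ms,
        # Remote execution pays communication twice: ship inputs, fetch result.
        "remote": remote_exec_ms + 2 * comm_ms,
    }
    feasible = {p: t for p, t in options.items() if t <= deadline_ms}
    if not feasible:
        raise RuntimeError("no placement meets the deadline")
    return min(feasible, key=feasible.get)

# Cheap link: remote wins (10 + 2*5 = 20 ms vs 50 ms locally).
print(choose_placement(50, 10, 5, 40))    # remote
# Expensive link: local wins (50 ms vs 10 + 2*30 = 70 ms remotely).
print(choose_placement(50, 10, 30, 60))   # local
```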

#4 S(o)OS demonstrated the usefulness of this approach through dedicated reference scenarios and algorithmic implementations that were used to test the performance of the modules, individually and in combination, in simulated environments. To this end, the project developed a logical simulator (SoOSiM) which allows the logical behaviour of software architectures to be verified. More specifically, SoOSiM allows execution of software architectures under simulated load conditions without having a full implementation of the software at hand. With this simulator, it is hence possible to identify logical errors in the execution protocol, potential bottlenecks, idle modules etc., and, given that an average execution time frame is available, it also provides indicators of system performance.
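The kind of insight such a logical simulation yields can be hinted at with a toy model. This is not SoOSiM (which is a Haskell tool); it is a minimal single-server queue with assumed timings, showing how synthetic load exposes a bottleneck before any full implementation exists.

```python
def simulate(service_ms, arrivals_ms):
    """FIFO single-module model with a fixed service time.
    Returns (total busy time, maximum waiting time observed)."""
    free_at, busy, max_wait = 0, 0, 0
    for t in arrivals_ms:
        start = max(t, free_at)          # wait if the module is occupied
        max_wait = max(max_wait, start - t)
        free_at = start + service_ms
        busy += service_ms
    return busy, max_wait

# Requests every 4 ms into a module that needs 6 ms each: the growing
# waiting time (here up to 6 ms) flags the module as a bottleneck.
print(simulate(6, [0, 4, 8, 12]))  # (24, 6)
```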

Impact

Even though the project objectives were primarily research-oriented, i.e. not directly aiming at an implementation or realisation, the work nonetheless aimed at producing useful and usable results. The results listed above have various impacts on the worldwide HPC and IT community, beyond pure academic and research interest:

Primarily, the growing pressure created by what is frequently referred to as “the end of Moore’s law” has hardware and software developers alike searching for solutions, or at least for means to reduce the impact of this development. The role of the operating system, and the lack of information about the data dependencies and algorithmic intentions of the application in question, have long been acknowledged by various industrial and academic researchers. The S(o)OS project not only substantiates these claims, but offers an initial approach to bringing the software side up to speed with the complexity currently arising on the hardware side.

Specifically, the project could show that an operating system by no means has to follow the traditional teaching that it must provide all execution support in the form of a fully self-sustained software package. Instead, with the growing need for multi-threaded applications, execution support can be reduced to something like a set of extended libraries or, as here, services. S(o)OS could furthermore show that, at least within the realm of high-performance computing, with its high demand for performance and its highly specialised applications and algorithms, such a modularised, reduced approach offers considerable benefits for program execution and can even improve portability and support for heterogeneity.

The corresponding results and findings have been published and disseminated extensively through various means, including in particular direct communication with different industrial and academic efforts. All communication partners expressed great interest in the results, to the extent that follow-up projects and unfunded collaboration groups were initiated. Specifically worth mentioning are:

Further to this, the results have been incorporated in multiple lectures at the universities involved in the project, and given rise to multiple PhD topics that are and will be actively pursued. This means not only that the results contribute to shaping the next generation of researchers and IT specialists, but also that the research work is continued through dedicated student work.