Project News

Resource discovery for manycore systems

Created on Tuesday, 31 January 2012

Modern compute infrastructures and processor architectures expose a wide range of accelerators with different properties, both to the developer and to the operating system. Neither can handle this complexity efficiently and assign the right code to the best-suited processing unit. A means of identifying processing units according to the requirements of the code is therefore needed.

Future processor generations will no longer be able to improve their single-thread performance exponentially. Instead, they will scale the number of processing cores. As a consequence, application software will no longer run faster automatically with each hardware upgrade, but will have to be adapted to the higher level of parallelism exposed by the CPU. To benefit from many-core hardware, we therefore have to move our traditional concepts of applications, operating systems and compilers towards massively distributed environments. We envision that computing nodes with thousands of cores may be connected to form a single transparent computing unit, hiding the complexity and distributed nature of the many-core system from applications.

For distributed operating systems running on such many-core environments, resource discovery is a vital building block for exploiting the full capabilities of all distributed resources. It has to recognize and locate resources with a deep understanding of their capabilities in relation to the application requirements. In a large-scale system with a pool of varied processors, the enabling technology for increasing overall system throughput is resource sharing: processes on overloaded processors can be migrated to other processors in the network. Before resources can be shared or allocated and execution can be migrated, however, they have to be found and located. Resource discovery, as a component of a distributed OS, is used to discover an efficient set of available processing resources that match the application requirements. The discovery latency directly affects the cost of migration, and execution migration is not beneficial if resource discovery cannot provide its information in a reasonable time. Unlike resource discovery for other purposes and domains, resource discovery for large many-core environments is very sensitive to discovery performance, and it becomes useless if it cannot satisfy the minimum requirements imposed by the system environment.
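The link between discovery latency and the benefit of migration can be made concrete with a small cost comparison. The following is only a sketch under simple assumptions: the function name, its parameters and the additive cost model are illustrative and do not come from the project itself.

```python
def migration_is_beneficial(remaining_local_s: float,
                            remaining_remote_s: float,
                            discovery_latency_s: float,
                            transfer_cost_s: float) -> bool:
    """Return True if migrating is expected to finish earlier than staying.

    Assumed (hypothetical) cost model: the time to finish remotely is the
    time spent discovering a target, plus the time to move the execution
    state, plus the remaining run time on the target processor.
    """
    total_remote_s = discovery_latency_s + transfer_cost_s + remaining_remote_s
    return total_remote_s < remaining_local_s


# Fast discovery: 0.5 + 1.5 + 4.0 = 6.0 s, which beats 10.0 s locally.
fast = migration_is_beneficial(10.0, 4.0, 0.5, 1.5)

# Slow discovery: 5.0 + 1.5 + 4.0 = 10.5 s, so staying local is better.
slow = migration_is_beneficial(10.0, 4.0, 5.0, 1.5)
```

Under this model the target processor is the same in both cases; only the discovery latency decides whether migration pays off.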

Grid Systems

In Grid systems, resource discovery models have been proposed to enable a grid information service. They fall into two categories according to their main approach to the problem: centralized and distributed.

The simplest way to build an information service is the centralized one: a central directory. Its major advantage is that all resource information can be found on the central server, which keeps resource discovery latency low and data coherence high. However, such solutions scale poorly and have low fault tolerance, mostly because of the centralized nature of the directory.

Another approach to discovery in grids relies on hierarchically organized servers. The Monitoring and Discovery System (MDS) is an information service of this kind. In MDS, a grid is composed of several resource description providers that are registered with index servers. Resource requesters query directory nodes to discover index servers and then obtain more detailed resource information from the corresponding resource description providers. The index servers themselves form a hierarchy: the top index server answers requests either directly or by dispatching them to its child index servers. Although this scales better than a single directory, scalability remains limited because requests trickle through the root server, which can easily become a bottleneck. Fault tolerance suffers as well: the loss of a node at a higher level of the hierarchy causes the loss of an entire subtree.

Distributed resource discovery (RD) approaches have been designed specifically to provide the high level of scalability and fault tolerance required in large-scale environments. Combining P2P and Grid RD models is a desirable way to build fault-tolerant, large-scale distributed systems. Approaches in this field are of two kinds, based on structured and on unstructured overlay networks.
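The hierarchical scheme described above can be illustrated with a toy model. This is a minimal sketch, not the MDS implementation or its API: the class `IndexServer`, its fields and the request counter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexServer:
    """Hypothetical index server holding resource descriptions and children."""
    name: str
    resources: dict = field(default_factory=dict)  # resource id -> description
    children: list = field(default_factory=list)   # child IndexServer objects
    requests_seen: int = 0

    def query(self, predicate):
        """Answer from local descriptions, then dispatch to every child."""
        self.requests_seen += 1
        hits = [rid for rid, desc in self.resources.items() if predicate(desc)]
        for child in self.children:
            hits.extend(child.query(predicate))
        return hits


site_a = IndexServer("site-a", {"a1": {"cores": 16}, "a2": {"cores": 4}})
site_b = IndexServer("site-b", {"b1": {"cores": 32}})
root = IndexServer("root", children=[site_a, site_b])

# Every request enters at the root, so the root sees all of the traffic.
first = root.query(lambda desc: desc["cores"] >= 8)
second = root.query(lambda desc: desc["cores"] < 8)
```

In this toy model the root's `requests_seen` counter grows with every query in the system, which is the bottleneck effect described above, and removing `site_a` from `root.children` makes all of its resources undiscoverable.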

Peer to Peer Systems

The other common model is resource discovery for Peer-to-Peer (P2P) networks, which offers a significant advantage over its hierarchical counterparts in its resistance to failures and traffic congestion. In particular, structured P2P systems based on Distributed Hash Tables (DHTs) are very popular for file-sharing applications, but not for sharing resource information. Moreover, typical structured P2P systems are very sensitive to churn, which leads to resource unavailability. These systems achieve good performance and scalability, and their hashing works well with static attributes; however, they need to be enhanced to handle dynamic objects appropriately.

With respect to scalability in response time and traffic load, structured P2P systems perform better than unstructured ones, because DHTs are more scalable, better load balanced and more self-organizing than pure peer-to-peer overlay networks. Furthermore, DHTs whose key mapping preserves data locality can support range queries efficiently. On the other hand, maintaining a structured system is more difficult in highly dynamic Grid environments, where the availability and status of resources vary significantly over time. Periodically updating each peer in the system is costly: it results either in increased discovery overhead (when the update period is too short) or in stale information and reduced accuracy (when the update period is too long).

Unstructured systems, by contrast, employ diverse mechanisms and strategies to propagate resource status updates with limited network load, including experience-based query forwarding, message buffering and merging, routing indexes, and super-peer architectures.
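How a DHT maps a static attribute to a responsible peer can be sketched with a consistent-hashing ring. This is an illustrative simplification rather than any particular DHT protocol: the class `HashRing`, the node names and the use of SHA-1 are assumptions, and real systems add routing tables and replication.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (SHA-1 is an arbitrary choice)."""
    return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring: a key belongs to its successor node."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node clockwise from the key, wrapping around at the end.
        i = bisect.bisect_right(self._positions, ring_position(key))
        return self._ring[i % len(self._ring)][1]


nodes = ["node-0", "node-1", "node-2", "node-3"]
ring = HashRing(nodes)
after_churn = HashRing([n for n in nodes if n != "node-2"])

# Keys stand for static resource attributes, e.g. "arch=x86_64".
keys = [f"attribute-{i}" for i in range(200)]
moved = [k for k in keys if ring.owner(k) != after_churn.owner(k)]
```

When `node-2` leaves, only the keys it owned move to another peer. Because uniform hashing scatters neighbouring values across the ring, this sketch handles exact-key lookups only; range queries would need a locality-preserving key mapping.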

Future Many-Core Systems

In future large-scale many-core environments, we expect the resource discovery component to fulfill requirements such as scalability, efficiency and adaptability, and to support heterogeneity and dynamicity. Supporting highly dynamic environments is important, since we assume that network nodes can join and leave an organization at any time and that the availability and status of the resources within each node change over time. Scalability is one of the basic problems of resource discovery in many-core environments, and it is a generic challenge for most research on resource discovery. It concerns both the methods used to describe resources and the discovery procedure itself: these mechanisms must rely on techniques and algorithms that extend efficiently to varying numbers of resources. Ideally, the discovery overhead they generate would be independent of the network size. This is not fully attainable, but we can aim to keep the discovery overhead almost constant as the number of resources increases. Another fundamental requirement of resource discovery in future many-core systems is flexible querying, that is, the ability to perform multi-dimensional (multi-attribute), exact-match, partial-match, key-based, point-based and range-based queries.
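The query flexibility listed above can be sketched over a small in-memory resource table. The attribute names, the tuple convention for ranges and the function names are assumptions made for illustration; a real discovery service would evaluate such queries in a distributed way.

```python
def satisfies(value, condition) -> bool:
    """A tuple (low, high) is an inclusive range; anything else is a point match."""
    if isinstance(condition, tuple):
        low, high = condition
        return low <= value <= high
    return value == condition


def match_score(description: dict, query: dict) -> int:
    """Count how many queried attributes the resource satisfies."""
    return sum(1 for attr, cond in query.items()
               if attr in description and satisfies(description[attr], cond))


def discover(resources: dict, query: dict, partial: bool = False) -> list:
    """Return resource ids, best match first.

    Exact matching requires every queried attribute to be satisfied;
    partial matching accepts any resource that satisfies at least one.
    """
    scored = [(match_score(desc, query), rid) for rid, desc in resources.items()]
    wanted = 1 if partial else len(query)
    hits = [(score, rid) for score, rid in scored if score >= wanted]
    return [rid for score, rid in sorted(hits, key=lambda t: (-t[0], t[1]))]


resources = {
    "cpu0": {"kind": "cpu", "cores": 16, "load": 0.2},
    "gpu0": {"kind": "gpu", "cores": 512, "load": 0.7},
    "cpu1": {"kind": "cpu", "cores": 4, "load": 0.9},
}

# Multi-attribute query: a point condition on "kind", a range on "cores".
query = {"kind": "cpu", "cores": (8, 64)}
exact_hits = discover(resources, query)
partial_hits = discover(resources, query, partial=True)
```

Here `cpu1` fails the range condition on `cores`, so it appears only in the partial-match result, ranked behind `cpu0`.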

To summarize, resource sharing in a large-scale many-core system depends on first finding and locating suitable resources. Resource discovery, as a component of a distributed OS, has to deliver an efficient set of available processing resources that match the application requirements, and it has to do so quickly: the discovery latency adds directly to the cost of migration, so execution migration stops being beneficial when discovery cannot answer in a reasonable time. This sensitivity to discovery performance is what sets resource discovery for large many-core environments apart from resource discovery in other domains.

Sponsored under FP7-ICT-2009.8.1, Grant Agreement No. 248465.
Copyright 2012
