HPC and PC: Two Views Of The Same Thing?
Since roughly 2006 the clock rate of processors has effectively stopped increasing – instead, manufacturers have been forced to integrate multiple compute units into a single processor, owing to the manufacturing costs and power issues involved in higher clock speeds. This shift affects not only the specialised large-scale sector (i.e. high performance computing), but also strongly influences the “low-end” end-user market: developers in all areas can no longer rely on performance improvements arriving with the next generation of systems, but instead have to start exploiting some form of parallelism in their applications. The desktop machine therefore essentially follows the transition towards cluster machines, which used to be exclusive to high-end supercomputing tasks.
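To make the “exploiting some form of parallelism” point concrete, the following minimal sketch (in C with OpenMP, chosen here purely as an illustration) spreads a simple vector addition across whatever cores the machine offers; the equivalent sequential loop would not benefit from additional cores at all:

    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void)
    {
        /* initialise the input vectors */
        for (int i = 0; i < N; i++) {
            a[i] = 0.5 * i;
            b[i] = 2.0 * i;
        }

        /* distribute the loop iterations across all available cores;
           without the pragma the loop would run on a single core only */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("last element: %f (up to %d threads used)\n",
               c[N - 1], omp_get_max_threads());
        return 0;
    }

Compiled with an OpenMP-capable compiler (e.g. gcc -fopenmp), the same binary makes use of however many cores the machine provides – which is exactly the kind of adaptation desktop developers now have to make themselves.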
It would be wrong, though, to assume that this holds true on all levels, i.e. that applications and systems on n-core desktop machines can be treated in the same way as high performance computing machines. Instead, a set of essential differences has to be respected, which arise mostly from the different usage backgrounds of the two types of system:
Specifics of HPC machines
Though this does not hold true universally, high performance computing machines are essentially dedicated to specific computational (algorithmic) tasks – in other words, the underlying hardware and operating system model are devised for a particular set of computational problems, typically within a research and / or development domain such as medicine (drug testing) or automotive engineering (airflow simulation) – IBM’s Blue Gene cluster is a good example of such application specialisation. There are, however, several notable machines that pursue a broader range of applicability – even though their usage scope still falls short of the general purpose principles pursued by common desktop PCs.
With the main purpose of supercomputers being the execution of complex, large-scale computational algorithms, efficiency (in the sense of floating-point operations executed per unit of time, i.e. FLOPS) is a primary concern in all HPC applications. In fact, a substantial amount of time is invested in improving the system and software architecture to increase calculation speed, or more importantly calculation precision within the same time frame. In general, far more resources are invested in HPC than in any other class of computing machine – covering the full range from hardware development through system maintenance to application coding. Accordingly, cluster machines feature stronger components in terms of cache size and speed, interconnect latency and bandwidth, clock rate etc., but also higher energy consumption. Such dedication to the whole process is only possible because of the high industrial interest in the results of HPC applications.
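To illustrate how such efficiency figures are quantified, a simple back-of-the-envelope peak-performance calculation for a purely hypothetical cluster node (two sockets, eight cores per socket, 2.5 GHz clock, 8 double-precision FLOP per core and cycle) would look as follows:

    peak performance = sockets × cores per socket × clock rate × FLOP per cycle
                     = 2 × 8 × 2.5 GHz × 8
                     = 320 GFLOPS per node (double precision)

Sustained application performance usually reaches only a fraction of this theoretical peak, which is precisely why so much effort goes into tuning the system and software architecture.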
Accordingly, the development of efficient, scalable applications is not comparable to general-purpose development. HPC application development typically requires dedicated experts with extensive mathematical knowledge of the underlying algorithm, as well as of the hardware they are developing for, in order to make maximum use of the system. This comes at the cost of portability, and often enough coders will develop an application they can only really test on the HPC machine itself. Similarly, the operating system running the machine is typically adapted specifically to the respective hardware setup, so as to avoid overhead from general purpose or concurrency management features and to employ dedicated hardware drivers.
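To give at least a rough impression of what such expert development involves, the following minimal sketch uses MPI – the de-facto standard programming model on distributed-memory clusters – to combine partial results from all processes; the dummy per-rank value stands in for the actual numerical work, which in real HPC codes is further tuned to the specific machine:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double local, global;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* id of this process  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes */

        /* each process computes a partial result on its own share of the
           problem; here simply its rank, as a stand-in for real work */
        local = (double)rank;

        /* combine the partial results on rank 0 – communication steps like
           this are where interconnect latency and bandwidth become critical */
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d ranks: %f\n", size, global);

        MPI_Finalize();
        return 0;
    }

Such a program is typically launched with a fixed process count (e.g. mpirun -np 512 ./app) and, unlike desktop software, is written and tuned against the exact core count and interconnect of the target machine.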
As cost is of secondary concern in this environment, an application will generally reserve as many of the system resources as possible for its exclusive use. In other words, most HPC applications run on a dedicated number of cores without competing with other processes for the available resources. These processes also show hardly any user interactivity beyond the provisioning of data sets for computation – typically the application runs completely isolated from the user environment for a pre-defined period of time, and the user only re-connects to collect the computation results.
Summary: dedicated hardware, more efficient equipment, exclusive execution, no interactivity, higher development effort, expert coders, adapted operating system, limited portability
Specifics of large scale desktop machines
Even though the general hardware architecture principles are effectively identical between current cluster and desktop machines, it must be noted that desktop PCs are (and always will be) developed for the consumer mass market. Consequently, cost (ranging from manufacturing through code development to energy consumption) is an essential factor in this area.
The major driver for desktop PCs, however, is the general purpose applicability of the machine: as opposed to the HPC case, the user does not want a specialised machine for each individual use case, and these use cases span a broad range of usage areas. As in HPC, the different application areas would imply different requirements towards the underlying operating system and hardware if they were to be executed at maximum performance – for example, games require high graphics power, office applications a large cache and fast hard-drive connections, multimedia applications fast stream processors but little cache, etc. The effective system architecture is thereby dictated by the trade-off between cost, generality and performance.
In general, efficiency comes second to generality in these systems, and usability is a main concern for end-users as well as developers in such environments. The corresponding applications aim strongly at interactivity rather than computational performance, even though demanding application types (such as video editors) are gaining in popularity. As not only the cost of the system but also of the applications – and hence of their development – needs to be comparatively low, the user scope does not allow for dedicated, complex coding effort. Applications must therefore target a broad user base and keep development cost to a minimum, which in turn implies that the software must be easily portable across different hardware architectures.
This applies in particular to the operating system, which needs to cope with a wide range of different infrastructures whilst maintaining maximum usability and reliability. It may be noted that a major distinguishing factor between Linux and Microsoft Windows lies in their usability and general applicability, which explains many of the associated performance and reliability issues. What is more, the operating system must allow multiple applications competing for the same resources to execute in (seemingly) parallel – in a common desktop environment, a user will run 4-8 different applications at the same time, sometimes multiple instances of them, plus around 30-50 supporting background services.
Summary: high concurrency, generality and large usage scope, high interactivity, fast and easy development, general purpose operating system, maximum portability, efficiency is secondary
So what about convergence?
Obviously, desktop machines and high performance computers have to address some essentially different challenges when it comes to the scale of the underlying system. Whilst HPC machines can expect dedicated applications and environments that cater for isolated processes, common PCs particularly struggle with the heterogeneity of systems and software, and with the strong concurrency during execution. However, the degree of heterogeneity and portability will need to increase in HPC environments in order to meet the widening scope of development and usage requirements (in particular in the area of coupled applications). Also, the average performance and capabilities of the individual compute unit (core) will become more and more similar between HPC and at least high-end desktop machines – this already applies to GPGPUs and a wide range of general purpose processors (GPPs).
A major challenge, and hence obstacle, in coping with this development lies in the scalability of the operating system and thus of the effective execution environment. Whilst HPC systems could so far count on sufficient resources to host an operating system per compute unit, the limitations of the individual cores in a many- / multi-core processor no longer allow for such an approach – no matter whether in an HPC or a desktop PC environment. And even though the major distinguishing criteria will remain the degree of concurrency during execution, as well as the degree of scalability exposed by the running applications, a stronger convergence can be noted than might initially be expected – this, however, will mostly be decided by the architecture of the operating system and whether it is able to address a potentially unlimited scale and a heterogeneous resource scope. Concurrency management, scheduling and synchronisation can then be regarded as special use cases depending on the application domain.
It is obvious that with the emergence of “home HPC” and large-scale systems the era of the powerful monolithic operating system draws to a close; future execution environments will in particular have to be lightweight and adaptable in order to enable effective execution across parallel infrastructures.