Technology - Distributed Symmetric Multiprocessing

Symmetric Multi-Processing (SMP) supercomputers with a large shared memory and a single system image are well suited to Big Data and high-performance computing applications with large in-memory working sets. The Symmetric Computing Departmental Supercomputers are large shared-memory, many-core SMP computers. To programmers, they appear as a single Linux machine with a large memory and hundreds or thousands of cores. Programmers can use standard threading packages, such as POSIX threads (Pthreads), to access all the cores and all the memory. With our supercomputers, programmers can avoid Message Passing Interface (MPI) programming, which makes it simpler to port applications to a supercomputer. There is also no need to build complex file-access program components: programmers can simply read a large dataset into memory and access it directly.
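
As an illustration of the shared-memory style described above, here is a minimal parallel sum in C using only standard POSIX threads. The dataset is an ordinary in-memory array, each thread reads its own slice of it directly, and the partial results are combined after pthread_join, with no message passing. This is a generic SMP sketch, not DSMP code; the names parallel_sum and slice_t and the 64-thread cap are assumptions made for the example.

```c
#include <pthread.h>
#include <stddef.h>

/* One thread's share of the work: a half-open slice [begin, end) of the array. */
typedef struct {
    const double *data;   /* shared input array, read directly by every thread */
    size_t begin, end;
    double partial;       /* written only by the owning thread, so no lock is needed */
} slice_t;

static void *sum_slice(void *arg) {
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles using up to 64 POSIX threads (cap chosen for this sketch). */
double parallel_sum(const double *data, size_t n, int nthreads) {
    enum { MAX_THREADS = 64 };
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    int spawned[MAX_THREADS];

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;

    size_t chunk = n / (size_t)nthreads, rem = n % (size_t)nthreads, start = 0;
    for (int t = 0; t < nthreads; t++) {
        size_t len = chunk + ((size_t)t < rem ? 1 : 0);   /* spread the remainder */
        slice[t] = (slice_t){ data, start, start + len, 0.0 };
        start += len;
        spawned[t] = pthread_create(&tid[t], NULL, sum_slice, &slice[t]) == 0;
        if (!spawned[t])
            sum_slice(&slice[t]);   /* thread creation failed: do this slice inline */
    }

    double total = 0.0;
    for (int t = 0; t < nthreads; t++) {
        if (spawned[t])
            pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    return total;
}
```

Because each thread writes only its own partial field, no mutex is needed; the calling thread adds the partials once every worker has been joined.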

Our Departmental Supercomputers are built from a cluster of state-of-the-art, industry-standard servers connected by InfiniBand. Departmental Supercomputers come with a standard Linux OS distribution that includes our patented Distributed Symmetric Multi-Processing (DSMP) extensions to the Linux kernel. The DSMP Linux kernel turns the cluster into a single-system-image, shared-memory, many-core supercomputer. It does so without expensive proprietary hardware and without a slower, much less efficient hypervisor-based implementation. In a DSMP configuration, one node is designated the head node and provides the user interface; the other nodes are worker nodes. A process launched on the head node can use the memory and cores of the worker nodes through the SMP programming model.
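
Programs written for a single SMP Linux machine commonly size their thread pool from the number of online processors, and the helper below does that with the standard sysconf(_SC_NPROCESSORS_ONLN) call. Whether a DSMP head node reports the aggregate core count of its worker nodes through this interface is an assumption, not something stated here; if it does not, an explicit thread count can be passed to the threading code instead. The name pick_thread_count is hypothetical.

```c
#include <unistd.h>

/* Number of worker threads to start: the online core count, optionally capped.
   cap <= 0 means "no cap". Assumes (unverified for DSMP) that sysconf reports
   every core the process may use. */
long pick_thread_count(long cap) {
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        cores = 1;                  /* sysconf returns -1 if the value is unavailable */
    if (cap > 0 && cores > cap)
        cores = cap;
    return cores;
}
```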

Essentially, DSMP turns Linux into a cluster operating system. It does this through two key extensions to the Linux kernel: transactional distributed shared memory (DSM) and a new kernel-based global Pthreads implementation. Transactional DSM uses a two-tiered memory organization consisting of a local memory partition on each node plus an additional global memory partition on a subset of the nodes. The global memory partitions combine to form a single global memory that every node can address. Global memory pages are swapped into a node's local memory partition as the program threads executing on that node need them. The consistency of global memory is maintained with the aid of a set of memory page locking functions that are available to application programs. A generalized view of this architecture is illustrated in the following diagram.
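
The on-demand movement of pages between the two tiers can be pictured with a small, single-node toy model in C. Everything in it is hypothetical: the page size, the number of global pages and local frames, the round-robin eviction policy, and the names global_memory, local_partition and touch_page are illustrative choices rather than part of DSMP, which performs this work inside the kernel. The model also omits the multi-node consistency that the page locking functions provide.

```c
#include <string.h>

/* Toy model only, not DSMP code: all sizes and the eviction policy are hypothetical. */
#define PAGE_SIZE    4096
#define GLOBAL_PAGES 8   /* pages in the combined global memory */
#define LOCAL_SLOTS  2   /* page frames in one node's local partition */

typedef struct {
    unsigned char data[GLOBAL_PAGES][PAGE_SIZE];
} global_memory;

typedef struct {
    int page_in_slot[LOCAL_SLOTS];               /* global page held in each frame, -1 = empty */
    unsigned char frames[LOCAL_SLOTS][PAGE_SIZE];
    int next_victim;                             /* round-robin eviction cursor */
    int faults;                                  /* number of fetches from global memory */
} local_partition;

void local_init(local_partition *lp) {
    for (int i = 0; i < LOCAL_SLOTS; i++)
        lp->page_in_slot[i] = -1;
    lp->next_victim = 0;
    lp->faults = 0;
}

/* Return the local copy of global page `page`, fetching it into a frame on a miss. */
unsigned char *touch_page(local_partition *lp, global_memory *gm, int page) {
    for (int i = 0; i < LOCAL_SLOTS; i++)
        if (lp->page_in_slot[i] == page)
            return lp->frames[i];                /* hit: page is already local */

    int slot = lp->next_victim;
    lp->next_victim = (slot + 1) % LOCAL_SLOTS;
    int old = lp->page_in_slot[slot];
    if (old >= 0)                                /* write the evicted page back */
        memcpy(gm->data[old], lp->frames[slot], PAGE_SIZE);
    memcpy(lp->frames[slot], gm->data[page], PAGE_SIZE);
    lp->page_in_slot[slot] = page;
    lp->faults++;
    return lp->frames[slot];
}
```

In this model, touching a page that is already local costs nothing extra, while a miss copies the page in from global memory and may first write an evicted page back.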

The global Pthreads implementation works as follows: when a process that uses global Pthreads is launched on the head node, a corresponding process clone is created on every worker node. The process clones contain page tables that duplicate those of the original process. This guarantees that every thread of the process, including threads running on worker nodes, shares the same global memory address space. In this manner, a true SMP program execution model is implemented across a cluster. The following diagram illustrates a process P1 whose threads T1 to T8 execute on multiple nodes, all sharing the same address space.
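
From the programmer's side, this model is ordinary Pthreads code. The sketch below is standard POSIX C, not DSMP-specific: it starts eight threads (mirroring T1 to T8 in the example) that each write into one shared array and record the address at which they see that array. On any SMP system the recorded addresses are identical; the page states that DSMP preserves this when the threads run on different nodes, but this code does not itself place threads on nodes or call any DSMP interface. The names run_p1, shared_results and seen_address are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

#define NTHREADS 8   /* T1..T8 in the P1 example */

static int shared_results[NTHREADS];         /* one slot per thread, so no lock is needed */
static const void *seen_address[NTHREADS];   /* where each thread saw shared_results */

static void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    seen_address[id] = (const void *)shared_results;
    shared_results[id] = (id + 1) * 10;
    return NULL;
}

/* Start NTHREADS threads in this process and wait for them; returns 0 on success. */
int run_p1(void) {
    pthread_t t[NTHREADS];
    int created = 0, rc = 0;
    for (; created < NTHREADS; created++) {
        if (pthread_create(&t[created], NULL, worker, (void *)(intptr_t)created) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)   /* join only the threads that were started */
        pthread_join(t[i], NULL);
    return rc;
}
```

Each thread writes a distinct element, so no locking is needed here; threads that update the same data would need synchronization such as a pthread_mutex_t.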

The DSMP Linux kernel extensions deliver large-shared-memory, many-core SMP computing with both economy and performance. For further details on how DSMP works, please refer to our white paper.