Technology - Distributed Symmetric Multiprocessing

Symmetric Computing Departmental Supercomputers and Departmental Mainframes are large shared-memory, many-core SMP computers. To programmers, each one presents as a single large-memory Linux box with hundreds or thousands of cores. Programmers can use the standard Pthreads threading package to access all the cores and all the memory. Large datasets can be read into memory and accessed directly. Programmers can often avoid the Message Passing Interface (MPI) and complex file-access programming, which makes it simpler to port applications to a supercomputer.
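As a rough illustration of the programming model described above, the following sketch sums a large in-memory array with standard POSIX threads. It uses only the ordinary Pthreads API and nothing specific to DSMP. The names parallel_sum, sum_slice, slice_t and the thread count of 8 are assumptions chosen for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 8   /* illustrative thread count, not a DSMP setting */

/* Work description for one thread: a contiguous slice of the shared array. */
typedef struct {
    const double *data;  /* dataset held in memory, shared by all threads */
    size_t begin;        /* first index of this slice */
    size_t end;          /* one past the last index of this slice */
    double partial;      /* partial sum written by the owning thread */
} slice_t;

/* Thread body: sum one slice and store the result in the slice record. */
static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; i++)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum n doubles with NUM_THREADS threads.
 * Returns 0 on success, -1 if a thread could not be created. */
int parallel_sum(const double *data, size_t n, double *result)
{
    pthread_t tid[NUM_THREADS];
    slice_t slice[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int started = 0;
    int rc = 0;

    assert(data != NULL && result != NULL);

    for (int t = 0; t < NUM_THREADS; t++) {
        slice[t].data = data;
        slice[t].begin = (size_t)t * chunk;
        /* The last thread also takes the remainder. */
        slice[t].end = (t == NUM_THREADS - 1) ? n : (size_t)(t + 1) * chunk;
        slice[t].partial = 0.0;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Join every thread that was started and combine the partial sums. */
    double total = 0.0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }

    if (rc == 0)
        *result = total;
    return rc;
}
```

Every thread reads the same array through an ordinary pointer, so no message passing or file staging is involved. With GCC or Clang, a program that calls parallel_sum is typically built with the -pthread option.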

Our Departmental Supercomputers consist of a cluster of state-of-the-art, industry-standard server hardware connected by InfiniBand. These supercomputing clusters come with a standard Linux OS distribution that includes our patented Distributed Symmetric Multi-Processing (DSMP) extensions to the Linux kernel. The DSMP Linux kernel transforms the cluster into a single-system-image, shared-memory, many-core supercomputer. This is done without expensive proprietary hardware and without a slower, much less efficient hypervisor implementation. In the DSMP implementation, one node is configured as the head node and provides the user interface. The other nodes are worker nodes. A process launched on the head node can use the memory and cores of the worker nodes through the SMP programming model.

DSMP adds two key extensions to the Linux kernel: transactional distributed shared memory (DSM) and a new kernel-based global Pthreads implementation. Transactional DSM uses a two-tiered memory organization consisting of a local memory partition on each node and an additional global memory partition on a subset of nodes. The global memory partitions combine to form a single global memory that is addressable by every node. Global memory pages are swapped into the local memory partitions of nodes as needed by executing program threads, and modified pages are copied back to global memory. The consistency of global memory is maintained with the aid of a set of memory page locking functions that are available to application programs. A generalized view of this architecture is illustrated in the following diagram.
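The two-tier organization described above can be pictured with a small user-space model. This is only a toy sketch under stated assumptions, not the DSMP kernel mechanism: global memory is an ordinary array, each node has a handful of local page slots with round-robin replacement, and node_flush and node_invalidate are hypothetical stand-ins for the write-back and refresh that the page locking functions would coordinate. All names and sizes here are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_SIZE    4096
#define GLOBAL_PAGES 16   /* toy size of the global memory */
#define LOCAL_SLOTS  4    /* toy size of one node's local partition */

/* Hypothetical stand-in for the combined global memory partitions. */
static unsigned char global_mem[GLOBAL_PAGES][PAGE_SIZE];

typedef struct {
    int page;                        /* global page held here, -1 if empty */
    bool dirty;                      /* modified since it was swapped in */
    unsigned char bytes[PAGE_SIZE];  /* local copy of the page */
} local_slot_t;

typedef struct {
    local_slot_t slot[LOCAL_SLOTS];
    int next_victim;                 /* round-robin replacement cursor */
} node_t;

void node_init(node_t *n)
{
    for (int i = 0; i < LOCAL_SLOTS; i++) {
        n->slot[i].page = -1;
        n->slot[i].dirty = false;
    }
    n->next_victim = 0;
}

/* Copy a modified local page back to global memory. */
static void write_back(local_slot_t *s)
{
    if (s->page >= 0 && s->dirty) {
        memcpy(global_mem[s->page], s->bytes, PAGE_SIZE);
        s->dirty = false;
    }
}

/* Return the local copy of a global page, swapping it in if needed. */
static local_slot_t *page_in(node_t *n, int page)
{
    assert(page >= 0 && page < GLOBAL_PAGES);
    for (int i = 0; i < LOCAL_SLOTS; i++)
        if (n->slot[i].page == page)
            return &n->slot[i];

    local_slot_t *s = &n->slot[n->next_victim];
    n->next_victim = (n->next_victim + 1) % LOCAL_SLOTS;
    write_back(s);                   /* evicted page goes back to global */
    memcpy(s->bytes, global_mem[page], PAGE_SIZE);
    s->page = page;
    s->dirty = false;
    return s;
}

unsigned char node_read(node_t *n, int page, int offset)
{
    return page_in(n, page)->bytes[offset];
}

void node_write(node_t *n, int page, int offset, unsigned char value)
{
    local_slot_t *s = page_in(n, page);
    s->bytes[offset] = value;
    s->dirty = true;
}

/* Write every modified local page back to global memory. */
void node_flush(node_t *n)
{
    for (int i = 0; i < LOCAL_SLOTS; i++)
        write_back(&n->slot[i]);
}

/* Drop the local copy of one page so the next access refetches it. */
void node_invalidate(node_t *n, int page)
{
    for (int i = 0; i < LOCAL_SLOTS; i++) {
        if (n->slot[i].page == page) {
            write_back(&n->slot[i]);
            n->slot[i].page = -1;
        }
    }
}
```

In this model, a value written on one node becomes visible on another only after the writer flushes the page and the reader drops its stale copy. That ordering is the kind of thing a page lock would have to enforce, which is why the model exposes the two steps separately.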

The global Pthreads implementation works by creating a corresponding process clone on every worker node when a process that uses global Pthreads is launched on the head node. The process clones contain page tables that are duplicates of those of the original process. This guarantees that any process thread running on a worker node shares the same global memory address space. In this manner a true SMP program execution model is implemented across a cluster. The following diagram illustrates a process P1 running with multiple threads T1 to T8 executing on multiple nodes, all sharing the same address space.
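From the application's point of view, the shared address space means that a pointer has the same meaning in every thread. The sketch below shows this with standard Pthreads on an ordinary Linux system: eight threads, mirroring T1 to T8, each record the address at which they see a shared counter and then increment it under a mutex. The names shared_t, worker and run_shared_counter are assumptions for this example, and nothing in it is specific to DSMP.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_THREADS 8   /* mirrors threads T1 to T8 in the description */

/* State shared by every thread of the process. */
typedef struct {
    pthread_mutex_t lock;
    long counter;
    uintptr_t seen_addr[NUM_THREADS];  /* address of counter as seen by each thread */
} shared_t;

typedef struct {
    shared_t *shared;
    int id;
} thread_arg_t;

static void *worker(void *arg)
{
    thread_arg_t *a = arg;

    /* Each thread writes only its own element, so no lock is needed here. */
    a->shared->seen_addr[a->id] = (uintptr_t)&a->shared->counter;

    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&a->shared->lock);
        a->shared->counter++;
        pthread_mutex_unlock(&a->shared->lock);
    }
    return NULL;
}

/* Run NUM_THREADS workers against one shared structure and
 * return the final counter value. */
long run_shared_counter(shared_t *s)
{
    pthread_t tid[NUM_THREADS];
    thread_arg_t arg[NUM_THREADS];

    pthread_mutex_init(&s->lock, NULL);
    s->counter = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        arg[t].shared = s;
        arg[t].id = t;
        int rc = pthread_create(&tid[t], NULL, worker, &arg[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    pthread_mutex_destroy(&s->lock);
    return s->counter;
}
```

After the call, every entry of seen_addr holds the same address and the counter equals 8 × 1000, because all threads operated on one object in one address space.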


The DSMP Linux kernel extensions provide large shared-memory, many-core SMP computing with both economy and performance.
Download the White Paper describing our DSMP Linux Kernel Extension.