Gigabit-per-second networking, which used to be associated with high-performance computers and supercomputer centers, is coming into the mainstream. Today, system area networks connect commercial servers to form clusters that increase availability and fault tolerance. The characteristic features of a system area network are high bandwidth, low latency, and low error rates. Unlocking the true potential of these high-performance system area networks requires a completely new approach to networking.

The transition from supercomputer backplanes to system area networks was originally motivated by the desire to increase the high-performance computing potential of networks of workstations. Linking desktop computers with system area networks resulted in a low-cost platform for high-performance computing. This trend is accelerating as the performance of today's inexpensive desktop computers often rivals that of expensive workstations. Computational scientists are now connecting large collections of mass-marketed PCs with 100 megabit-per-second (Mbps) switched Fast Ethernet and are beginning to incorporate 1000 Mbps speeds using Gigabit Ethernet switches. They run parallel programs based on message passing on machines running Linux or Windows NT. For selected applications, they find substantial performance-per-cost gains by using equipment sold in large volume.

The conventional networking protocols used on most machines today work well with 10 Mbps Ethernet, but they severely limit the bandwidth of faster networking hardware. Receiving a message requires copying the data into buffers in the operating system kernel and then copying it again into the address space of the application. During this process, the processor is consumed by networking overhead rather than the intended computation. The high-performance cluster computing community solved these problems with a completely new approach to networking.
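The two-copy receive path described above can be made concrete with a small simulation. This is an illustrative sketch only, not an actual protocol stack: the `CopyCounter` and `conventional_receive` names are invented here to show why each received byte crosses main memory twice.

```python
# Illustrative sketch (not a real protocol stack): a kernel-mediated
# receive path copies each incoming message twice, once into a kernel
# socket buffer and once into the application's address space.

class CopyCounter:
    """Tallies every byte the simulated CPU moves through memory."""
    def __init__(self):
        self.bytes_copied = 0

    def copy(self, src: bytes) -> bytes:
        self.bytes_copied += len(src)
        return bytes(src)          # materialize a new, separate buffer

def conventional_receive(nic_frame: bytes, counter: CopyCounter) -> bytes:
    # Copy 1: the interrupt handler moves the frame into a kernel buffer.
    kernel_buffer = counter.copy(nic_frame)
    # Copy 2: the read()/recv() system call moves the data into the
    # address space of the application.
    app_buffer = counter.copy(kernel_buffer)
    return app_buffer

counter = CopyCounter()
message = b"x" * 1500              # one Ethernet-sized frame
data = conventional_receive(message, counter)
assert data == message
assert counter.bytes_copied == 2 * len(message)
```

At gigabit speeds this doubled memory traffic, plus the per-message context switch into the kernel, is exactly the overhead the article's remaining techniques are designed to remove.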
It involved changes to both the hardware and software layers, using techniques developed for supercomputers. This article discusses two of the most important techniques: message coprocessors and user-level networking.

Since 1995, under the High Performance Commodity Computing (HPCC) project, MITRE has been experimenting with the Myrinet system area network from Myricom. This gigabit networking technology grew out of past high-bandwidth, scalable networking initiatives at DARPA that produced machines such as the Intel Paragon, a massively parallel multicomputer. A Paragon supercomputer and a Myrinet-linked cluster share a common approach to high-bandwidth, low-latency networking. Packets contain headers that simple, extremely fast switches use to perform routing, and the switching hardware also provides flow control. Each computation node has a processor dedicated to handling messages, with direct access to main memory.

The Myrinet network interface card, which plugs into a PCI bus slot of an ordinary PC, contains a processor, memory, and direct memory access (DMA) hardware. The processor on the interface card shields the main processor from interruptions generated by the Myrinet, and the DMA hardware can transfer data directly into physical memory. If the operating system has set up virtual memory so that this physical memory maps into the address space of the application receiving a message, then there is no need to copy the message within main memory. Message passing without a context switch into the kernel is the essence of user-level networking.

Together, message coprocessors and user-level networking allow high-bandwidth, low-latency networking between applications. These techniques will become commonplace because of recent standardization efforts led by Intel, Microsoft, and Compaq.
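The user-level receive path can be sketched the same way. In the simulation below, a stand-in for the coprocessor on the interface card delivers a payload directly into a buffer the application registered in advance, so the data is written into main memory exactly once. The `SimulatedNIC` class and its method names are assumptions made for illustration, not the Myrinet API; kernel involvement and page pinning are only modeled by comments.

```python
# Illustrative sketch of user-level networking: the application registers
# a buffer, and a simulated NIC coprocessor "DMAs" the payload straight
# into it. No kernel copy and no context switch appear in this path.

class SimulatedNIC:
    """Stand-in for the message coprocessor on the interface card."""
    def __init__(self):
        self.registered = {}       # handle -> writable view of an app buffer

    def register_buffer(self, handle: str, buf: bytearray) -> None:
        # Real hardware would pin these pages into physical memory;
        # here we simply keep a writable view of the application buffer.
        self.registered[handle] = memoryview(buf)

    def dma_deliver(self, handle: str, payload: bytes) -> int:
        view = self.registered[handle]
        view[:len(payload)] = payload   # the single write into main memory
        return len(payload)

nic = SimulatedNIC()
app_buffer = bytearray(1500)       # lives in the application's address space
nic.register_buffer("rx0", app_buffer)

n = nic.dma_deliver("rx0", b"hello, cluster")
assert app_buffer[:n] == b"hello, cluster"
```

Contrast with the two-copy path: the application's buffer is the destination of the transfer itself, which is what makes gigabit bandwidth reachable by an ordinary PC.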
Motivated by the desire to cluster inexpensive symmetric multiprocessors into larger, more capable enterprise server systems, these companies have recently announced the Virtual Interface Architecture (VI Arch) initiative. In this application domain, both high bandwidth and low latency are critical for acceptable performance. The Virtual Interface Architecture Specification describes a low-level, hardware-independent software messaging layer tuned for system area networks. VI Arch messaging is designed to operate asynchronously with the main computation, allowing message processing to be off-loaded onto a coprocessor. Furthermore, VI Arch messaging is connection-oriented, and it requires that the destination buffer for a message be allocated in advance and locked into physical memory. Network interface cards may then copy data directly into the address space of the application receiving the message, knowing that the target address range is mapped into physical memory.

The clustered server market will drive shipments of system area networks that conform to the VI Arch Specification. Competition will drive prices so low that it will be common to link desktop machines by system area networks. This in turn will have a dramatic impact on the use of commodity clusters for high-performance computing.

For more information, please contact Richard Games using the employee directory.
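The posting discipline that makes this work can be sketched as a pair of queues. The simulation below captures only the discipline described above: receive buffers are posted before a message arrives, and completions are picked up asynchronously by polling rather than by a blocking system call. The class and method names (`VirtualInterface`, `post_recv`, `poll`) are illustrative assumptions, not the actual VI Architecture API, and the refusal of an unposted delivery is a simplification of real NIC behavior.

```python
# Hedged sketch of VI-style message delivery: pre-posted receive
# descriptors plus asynchronous completion polling. Not the real VI API.

from collections import deque

class VirtualInterface:
    def __init__(self):
        self.recv_queue = deque()       # posted, pre-allocated receive buffers
        self.completions = deque()      # finished deliveries awaiting the app

    def post_recv(self, buf: bytearray) -> None:
        # In real hardware this buffer would already be registered and
        # locked into physical memory; posting hands it to the NIC.
        self.recv_queue.append(buf)

    def deliver(self, payload: bytes) -> bool:
        # The coprocessor consumes the next posted descriptor; with no
        # descriptor available there is nowhere to put the message.
        if not self.recv_queue:
            return False
        buf = self.recv_queue.popleft()
        buf[:len(payload)] = payload
        self.completions.append((buf, len(payload)))
        return True

    def poll(self):
        # Asynchronous pickup: the application checks for completions
        # without a context switch into the kernel.
        return self.completions.popleft() if self.completions else None

vi = VirtualInterface()
assert vi.deliver(b"early") is False    # nothing posted yet: message refused
vi.post_recv(bytearray(64))
assert vi.deliver(b"ping") is True
buf, n = vi.poll()
assert buf[:n] == b"ping"
```

Requiring the buffer before the message is what lets the interface card write into application memory safely, tying the specification back to the user-level networking technique above.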