Reinventing the Network Stack for Compute-Intensive Applications
October 1, 2019 | DARPA
Computing performance has steadily increased along the trajectory set by Moore’s Law, and networking performance has accelerated at a similar rate. Despite these connected advances in network and server technology, however, the network stack has not kept pace, starting with the network interface card (NIC), the hardware that bridges the network/server boundary.
Image Caption: The chart represents data rates on a vertical log scale, with an optical fiber on the left and a server on the right. Movement from left to right traces the path data must take through the components from a fiber to a server. Network stacks are limited both by network interface cards and system software to 10-100 gigabits per second. This bottleneck is especially important for distributed computation that requires significant communication between the computation nodes. FastNICs seeks to speed up applications, such as the distributed training of machine learning classifiers, by 100x through the development, implementation, integration, and validation of novel, clean-slate network subsystems.
Today, network interface hardware is hampering data ingest from the network to processing hardware. Additional factors, such as limitations in server memory technologies, memory copying, poor application design, and competition for shared resources, have resulted in network subsystems that create a bottleneck within the network stack and throttle application throughput.
“The true bottleneck for processor throughput is the network interface used to connect a machine to an external network, such as an Ethernet, therefore severely limiting a processor’s data ingest capability,” said Dr. Jonathan Smith, a program manager in DARPA’s Information Innovation Office (I2O). “Today, network throughput on state-of-the-art technology is about 10^14 bits per second (bps) and data is processed in aggregate at about 10^14 bps. Current stacks deliver only about 10^10 to 10^11 bps application throughputs.”
Addressing the bottleneck between multiprocessor servers and the network links that interconnect them is increasingly critical for distributed computing. This class of computing requires significant communication between computation nodes. It is also increasingly relied on for advanced applications such as deep neural network training and image classification.
To accelerate distributed applications and close the yawning performance gap, DARPA initiated the Fast Network Interface Cards (FastNICs) program. FastNICs seeks to improve network stack performance by a factor of 100 through the creation of clean-slate networking approaches. Enabling this significant performance gain will require a rework of the entire network stack—from the application layer through the system software layer, down to the hardware.
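The size of the gap, and what a factor-of-100 improvement would mean for it, can be worked out from the figures Smith cites (about 10^14 bps of network throughput against 10^10 to 10^11 bps of application throughput). The short Python sketch below is purely illustrative and is not part of the program; the constant and function names are hypothetical.

```python
# Back-of-the-envelope arithmetic using the throughput figures quoted
# in the article. All names here are illustrative assumptions.

NETWORK_BPS = 10**14      # state-of-the-art network throughput (bps)
STACK_LOW_BPS = 10**10    # application throughput, low end (bps)
STACK_HIGH_BPS = 10**11   # application throughput, high end (bps)
TARGET_SPEEDUP = 100      # improvement factor FastNICs is aiming for


def gap(network_bps, app_bps):
    """How many times faster the network is than the application path."""
    return network_bps / app_bps


def after_speedup(app_bps, speedup=TARGET_SPEEDUP):
    """Application throughput if the stack were `speedup` times faster."""
    return app_bps * speedup


if __name__ == "__main__":
    for label, app_bps in (("low", STACK_LOW_BPS), ("high", STACK_HIGH_BPS)):
        improved = after_speedup(app_bps)
        print(
            f"{label} end: gap today {gap(NETWORK_BPS, app_bps):,.0f}x, "
            f"after 100x {improved:.0e} bps, "
            f"remaining gap {gap(NETWORK_BPS, improved):,.0f}x"
        )
```

On these figures, today's stacks trail the network by a factor of 1,000 to 10,000. A 100x improvement would lift application throughput to roughly 10^12 to 10^13 bps, narrowing the gap to between 10x and 100x rather than closing it entirely.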
“There is a lot of expense and complexity involved in building a network stack—from maximizing connections across hardware and software to reworking the application interfaces. Strong commercial incentives focused on cautious incremental technology advances across multiple, independent market silos have dissuaded anyone from addressing the stack as a whole,” said Smith.
To help justify the need for this significant overhaul, the FastNICs program will select a challenge application and provide it with the hardware support, operating system software, and application interfaces needed to achieve the overall system acceleration that faster NICs make possible. Under the program, researchers will work to develop, implement, integrate, and validate novel, clean-slate network subsystems.
Part of FastNICs will focus on developing hardware systems to significantly improve aggregate raw server datapath speed. Within this research area, researchers will design, implement, and demonstrate 10 Tbps network interface hardware using existing or road-mapped hardware interfaces. The hardware solutions must attach to servers via one or more industry-standard interface points, such as I/O buses, multiprocessor interconnection networks, and memory slots, to support the rapid transition of FastNICs technology. “It starts with the hardware; if you cannot get that right, you are stuck. Software can’t make things faster than the physical layer will allow so we have to first change the physical layer,” said Smith.
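To give a sense of scale for the 10 Tbps hardware target, the following Python sketch compares idealized transfer times at today's 10 to 100 Gbps stack limits and at 10 Tbps. It is an illustration only: the 1 TB payload is a hypothetical value chosen for round numbers, the function name is made up, and the calculation assumes the full line rate is available with no protocol or software overhead.

```python
# Idealized transfer-time comparison. The payload size is a hypothetical
# example; the rates are the ones mentioned in the article.

PAYLOAD_BYTES = 10**12  # assumed 1 TB payload, for illustration only

RATES_BPS = {
    "10 Gbps": 10 * 10**9,
    "100 Gbps": 100 * 10**9,
    "10 Tbps": 10 * 10**12,
}


def transfer_seconds(payload_bytes, link_bps):
    """Seconds to move payload_bytes at link_bps, ignoring all overhead."""
    return payload_bytes * 8 / link_bps


if __name__ == "__main__":
    for name, bps in RATES_BPS.items():
        seconds = transfer_seconds(PAYLOAD_BYTES, bps)
        print(f"{name:>8}: {seconds:g} s")
```

Under these assumptions, the payload takes 800 seconds at 10 Gbps and 80 seconds at 100 Gbps, but less than a second at 10 Tbps. Real transfers would be slower, since the sketch leaves out exactly the memory and software costs the program is meant to address.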
A second research area will focus on developing system software required to manage the FastNICs hardware resources. To realize 100x throughput gains at the application level, system software must enable efficient and parallel transfer of data between the network hardware and other elements of the system. FastNICs researchers will work to generate software libraries—all of which will be open source, and compatible with at least one open source OS—that are usable by various applications.
FastNICs will also explore applications that could be enabled by the multiple orders of magnitude of performance increase provided by the program-generated hardware. Researchers will aim to design and implement at least one application that demonstrates a 100x speedup when executed on the novel hardware/software stack, providing a validator for the program’s primary objective. Two application areas are of particular interest: distributed machine learning and sensors. Machine learning requires harnessing clusters (large numbers of machines) so that all cores are employed for a single purpose, such as analyzing imagery to help self-driving cars identify an obstacle in the road. “Recent research has shown that by speeding up the network support, the entire distributed machine learning system can operate more quickly. With machine learning, the methods typically used involve moving data around, which creates delays. However, if you can move data more quickly between machines with a successful FastNICs result then you should be able to shrink the performance gap,” said Smith.
FastNICs will also explore sensor data from systems like UAVs and overhead imagers. An example application would be change detection, where tagged images are used to train a deep learning system to recognize anomalies in a time series of image captures, such as the presence of a strange structure or a sudden spurt in activity at facilities in an inexplicable location. Change detection requires quick access to current sensor data as well as rapid access to archived data. FastNICs will provide a way of accelerating the acquisition of actionable intelligence from a mountain of data.