Promising more-efficient data centers, DPUs add another element to the heterogeneous processing mix. DPUs are important to data-center disaggregation, allowing server processors to perform only compute tasks while the DPU handles data movement between networked compute and storage.
Several vendors now offer processors positioned as DPUs. After examining the product field, The Linley Group defines a DPU as a programmable network SoC that integrates all major functions from the network ports to the PCI Express (PCIe) interface. A high-bandwidth PCIe interface separates DPUs from programmable Ethernet switch chips as well as legacy embedded processors. Combined with an integrated data plane for high-rate packet processing, the PCIe interface suits DPUs to network-traffic termination in smart NICs and to SSD connectivity in storage-controller cards.
Portable software requires developers to use high-level APIs that avoid dependencies on the underlying hardware. To date, however, adopting DPUs has required custom low-level code, creating a barrier for application developers.
With DOCA, Nvidia aims to remove this obstacle by providing a higher level of abstraction for DPU programming. The framework supplies runtime binaries and high-level APIs, allowing developers to focus on application code rather than learning DPU-hardware intricacies.
AI development faces a similar tension between running code on an x86 server processor and accelerating it on optimized hardware such as a GPU. Despite increasing competition, Nvidia remains the leader in AI acceleration due in part to the maturity and breadth of its CUDA software. Open-source neural-network frameworks essentially use CUDA as the default solution for acceleration.
This paper provides an in-depth discussion of the software-related issues and solutions.