Renesas today released a new 64-bit MPU, the RZ/V2H, that brings significant upgrades to the Renesas RZ/V family for edge AI. Edge AI refers to AI processing that takes place at the edge of the connected network. The edge comprises embedded and local processing systems, such as security devices, home robots, and appliances; cloud AI processing, by contrast, runs in centralized server farms.
The new RZ/V2H for edge AI applications
Renesas’ RZ/V family of microprocessors all come with multiple cores and optimization for local machine vision processing. The new RZ/V2H considerably ups the ante with additional processor cores, faster processing, and lower power consumption, delivering up to 80 tera operations per second (TOPS) on ResNet-50, a 50-layer deep convolutional neural network architecture.
Renesas 64-bit RZ/V vision AI MPU product line
It also runs AI classification models at efficiencies of up to 10 TOPS per watt. For comparison, the prior MPUs in the family top out at 1 TOPS and 28 fps with ResNet-50.
Up to 10 Processing Cores
The RZ/V2H comes with a quad-core Cortex-A55, a dual-core Cortex-R8, a Cortex-M33, a Renesas DRP, a Renesas DRP-AI, and, optionally, a hardware image signal processor (ISP). The A55 typically runs Linux for overall system control and processing. Its four cores allow the MPU to support four simultaneous 4K-resolution cameras at up to 830 fps. Real-time operations such as motor control and communications can be offloaded to the R8. The 200-MHz M33 handles background tasks such as power management and wake-up.
RZ/V2H application diagram.
The hardware ISP is the first stop for image data from the cameras. Because some cameras have their own ISP, Renesas offers the RZ/V2H with or without the HW ISP section. Finally, the DRP and DRP-AI provide OpenCV compatibility and perform much of the AI processing. This makes for an incredibly capable processor option for powering smart home and office appliances, industrial safety products, and infrastructure devices. It lets a device perform AI operations such as image recognition, identification, and decision making without having to call back to the server for anything but the most complex AI operations. More information can be found in the RZ/V2H datasheet.
Higher Computing Performance, Lower Electrical Power
When small packages dictate low heat and power budgets, it’s not enough to add cores or crank up clock speeds to increase performance with brute force. The RZ/V2H does not disappoint in this regard. It has the cores and clock speed, but it also uses careful optimization to add capability while reducing power.
Renesas’ proprietary dynamically reprogrammable processor (DRP) is a reconfigurable processing engine that can be adapted for different tasks. It’s flexible enough to change with every clock cycle if need be. In the RZ/V2H, the DRP system offloads pre- and post-processing. The DRP handles OpenCV compatibility while the DRP-AI, optimized for AI math, performs the heaviest AI computations. Renesas claims that DRP-AI V3 is 10x more power efficient than DRP-AI V2.
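To make the division of labor concrete, here is a minimal sketch of the kind of vision pre-processing that would be offloaded to the DRP rather than run on the Cortex-A55 cores. The function names, 224x224 input size, and ImageNet normalization constants are illustrative assumptions, not taken from Renesas documentation.

```python
import numpy as np

def preprocess(frame: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Typical CNN pre-processing: nearest-neighbor resize, scale to [0, 1],
    normalize, and reorder HWC -> CHW. On the RZ/V2H, work of this kind is
    offloaded to the DRP instead of the application cores."""
    h, w, _ = frame.shape
    ys = np.arange(size[0]) * h // size[0]   # source rows to sample
    xs = np.arange(size[1]) * w // size[1]   # source columns to sample
    resized = frame[ys][:, xs].astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)  # ImageNet stats (assumed)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return ((resized - mean) / std).transpose(2, 0, 1)  # CHW tensor for the model

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one 1080p camera frame
tensor = preprocess(frame)                          # shape (3, 224, 224)
```

The same pipeline structure applies to post-processing (e.g., decoding detection boxes), which the DRP can also handle while the DRP-AI runs the network itself.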
The matrix calculations now use INT8 data quantization instead of the FP16 used in DRP-AI V2. Dropping from floating point to integer may seem like a big compromise, but in this application, it is not. The MPU uses a ResNet-50 model, which maintains high accuracy with INT8 data. Moving to INT8 delivers a 14x increase in math performance. At the same time, it halves the data size, resulting in 2x better power efficiency than FP16.
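The idea behind INT8 quantization can be sketched in a few lines. This is a generic symmetric per-tensor scheme, not Renesas' specific implementation: each FP weight is mapped to an 8-bit integer plus one shared scale factor, so storage drops to half of FP16 while the reconstruction error stays within half a quantization step.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: map FP values into [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP weights from the integers and the shared scale."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(8, 8)).astype(np.float32)  # toy FP weights
q, s = quantize_int8(w)
max_err = np.max(np.abs(dequantize(q, s) - w))  # bounded by half a step (s / 2)
```

Networks like ResNet-50 tolerate this rounding well, which is why the accuracy loss Renesas cites is negligible.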
Unstructured Pruning in the DRP
A process called unstructured pruning brings further performance increases and power reductions. An AI model looks at weighted values, with larger values having more significance and a greater likelihood of leading to a match. Pruning takes all the weights that are near zero and sets them to zero.
FP16 with no pruning in earlier MPU versions (left) and INT8 with sparse pruning in the RZ/V2H (right).
Since a multiplication by a zero weight contributes nothing to the final result, the DRP-AI engine skips any operation with a weight equaling zero. Leaving the zero weights in is called dense pruning, while skipping them is referred to as sparse pruning. This makes the AI model smaller, faster, and more efficient, for about a 5x improvement in power efficiency on top of the 2x gained with the switch from FP16 to INT8.
According to Renesas benchmarks, the combination of INT8 and sparse pruning significantly increases performance over common competitive offerings.
Competitive benchmarks illustrating performance advantages of RZ/V2H
Dense pruning delivers twice the performance, while sparse pruning takes it further to about a 3x increase in AI performance. In a typical application where a device must process video, identify objects, and make decisions based on the observations, the AI computation speed in the processor translates directly to the speed of the device operations. Robots can move faster, alarms can be triggered faster, and any necessary responses can start sooner.
No Fans Needed
The DRP and DRP-AI increase performance and significantly reduce power consumption compared to competitive GPU-based embedded AI. In a robotic vacuum use case, the RZ/V2H delivers 13 fps for AI simultaneous localization and mapping (SLAM) at less than 4 W, negating the need for cooling fans in most applications. That means quieter operation and longer battery life. The RZ/V2H is available as a component and development board for purchase now from standard Renesas distribution channels.
All images used courtesy of Renesas