New AI accelerator developed for lightweight AI models and embedded processor technology

27-02-2024 | Renesas | Semiconductors

Renesas Electronics Corporation has announced the development of embedded processor technology that enables higher speeds and lower power consumption in MPUs that realise advanced vision AI. The newly developed technologies are a DRP-based AI accelerator that efficiently processes lightweight AI models and a heterogeneous architecture technology that enables real-time processing by cooperatively operating processor IPs such as the CPU. The company produced a prototype of an embedded AI-MPU with these technologies and confirmed its high-speed, low-power operation. It achieved up to 16 times faster processing (130 TOPS) than before these new technologies were introduced, along with world-class power efficiency (up to 23.9 TOPS/W at a 0.8V supply).

Amid the recent growth of robots in factories, logistics, medical services, and stores, there is an expanding demand for systems that can autonomously run in real-time by detecting surroundings using advanced vision AI. Since there are severe restrictions on heat generation, particularly for embedded devices, higher performance and lower power consumption are needed in AI chips. The company developed new technologies to meet these needs.

The first technology Renesas developed is an AI accelerator that efficiently processes lightweight AI models. Pruning, a typical technique for improving AI processing efficiency, omits calculations that do not significantly affect recognition accuracy. However, such prunable calculations are usually scattered randomly throughout an AI model. This creates a mismatch between the parallelism of the hardware and the randomness of pruning, making processing inefficient.

The company optimised its unique DRP-based AI accelerator (DRP-AI) for pruning to solve this issue. By analysing how pruning pattern characteristics and pruning methods relate to recognition accuracy in typical image recognition AI models (CNN models), it identified a hardware structure that can achieve both high recognition accuracy and an efficient pruning rate, and applied it to the DRP-AI design. Renesas also developed software that lightens AI models optimised for this DRP-AI. This software converts the randomly pruned model configuration into a highly efficient parallel computation, resulting in higher-speed AI processing. In particular, the company's highly flexible pruning support technology (flexible N:M pruning), which can dynamically alter the number of processing cycles in response to changes in the local pruning rate within an AI model, permits fine control of the pruning rate according to the power consumption, operating speed, and recognition accuracy required by users.
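The general idea behind N:M structured pruning can be sketched in a few lines: in each group of M consecutive weights, only the N largest-magnitude weights are kept, so the hardware always knows how many multiply-accumulates each group costs. This is a generic illustration of the technique, not Renesas' actual DRP-AI implementation; the function name and the 2:4 ratio below are purely illustrative.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int, m: int) -> np.ndarray:
    """Keep only the n largest-magnitude weights in each group of m.

    Generic N:M structured-pruning sketch (illustrative, not the
    DRP-AI's real mechanism). Assumes len(weights) is divisible by m.
    """
    w = weights.reshape(-1, m).copy()
    # Indices of the (m - n) smallest-magnitude weights per group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    # Zero out the dropped positions in every group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, 0.7, 0.2, -0.8, 0.03, 0.4])
print(nm_prune(w, n=2, m=4))  # two weights survive in each group of four
```

Because every group ends up with exactly N nonzero weights, the sparsity is regular enough for parallel hardware to exploit, unlike unstructured (random) pruning.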

This technology reduces the number of AI model processing cycles to as little as one-sixteenth that of models without pruning support, and cuts power consumption to less than one-eighth.
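Under an idealised model in which the accelerator skips zeroed weights entirely, the cycle count scales with the number of weights kept. The sketch below shows how a uniform 1:16 keep ratio yields the one-sixteenth figure, and how flexible N:M pruning lets the kept count (and hence the cycle cost) vary per group; this is a back-of-envelope model, not Renesas' actual cycle accounting.

```python
def relative_cycles(kept_per_group, m):
    """Cycle count relative to the unpruned (dense) model, assuming
    one MAC cycle per kept weight: kept weights / total weights.
    Idealised accounting for illustration only."""
    return sum(kept_per_group) / (len(kept_per_group) * m)

# Uniform 1:16 pruning keeps one weight per group of sixteen
print(relative_cycles([1, 1, 1, 1], m=16))  # 0.0625, i.e. 1/16

# Flexible N:M: denser local regions simply cost more cycles
print(relative_cycles([1, 2, 1, 4], m=16))  # 0.125
```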

The second technology is a heterogeneous architecture that enables real-time processing for robot control. Robot applications need advanced vision AI processing to recognise the surrounding environment. Meanwhile, judging and controlling robot motion requires detailed condition programming in response to changes in that environment, so CPU-based software processing is more suitable than AI-based processing. The challenge has been that the CPUs in current embedded processors cannot control robots in real-time. Renesas therefore introduced a DRP, which handles complex processing, alongside the CPU and the AI accelerator (DRP-AI). This led to the development of heterogeneous architecture technology that achieves higher speeds and lower power consumption in AI-MPUs by appropriately distributing and parallelising processes.
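The division of labour described above can be sketched as a task-to-IP mapping with the three workloads running concurrently. The task names and assignments below are hypothetical illustrations of the heterogeneous idea; the real RZ/V work distribution is determined by Renesas' hardware and drivers, not by application code like this.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping of robot workloads to processor IPs
ASSIGNMENT = {
    "vision_inference": "DRP-AI",   # lightweight CNN recognition
    "position_tracking": "DRP",     # complex, reconfigurable pipeline
    "motion_control": "CPU",        # condition-heavy control logic
}

def dispatch(task: str) -> str:
    # Route each task to its assigned IP and report the placement
    return f"{task} -> {ASSIGNMENT[task]}"

# Run the three workloads concurrently, as the heterogeneous MPU would
with ThreadPoolExecutor(max_workers=3) as pool:
    for placement in pool.map(dispatch, ASSIGNMENT):
        print(placement)
```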

A DRP runs an application while dynamically changing the circuit connections between the arithmetic units inside the chip on each operating clock, according to the processing details. Because only the required arithmetic circuits operate, even during complex processing, lower power consumption and higher speeds are possible. For example, SLAM, a typical robot application, has a complex configuration that requires multiple programmed processes for robot position recognition to run in parallel with environment recognition by vision AI processing. Renesas demonstrated this SLAM workload through instantaneous program switching on the DRP and parallel operation of the AI accelerator and CPU, achieving about 17 times faster operation and about 12 times higher operating power efficiency than an embedded CPU alone.
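The two reported ratios together imply a relative power figure. Since power efficiency is throughput divided by power, the article's approximate numbers can be combined in a quick back-of-envelope calculation (a rough estimate only, since both figures are stated as "about"):

```python
speedup = 17.0           # SLAM speed vs embedded CPU alone (reported, approximate)
efficiency_gain = 12.0   # power efficiency vs CPU alone (reported, approximate)

# efficiency = speed / power, so relative power = speedup / efficiency gain
relative_power = speedup / efficiency_gain
print(f"about {relative_power:.2f}x the CPU-only power for 17x the speed")
```

In other words, the heterogeneous configuration draws only modestly more power than the CPU alone while delivering an order-of-magnitude speedup.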

The company created a prototype test chip with these technologies and confirmed that it attained a world-class power efficiency of 23.9 TOPS per watt at a nominal supply voltage of 0.8V for the AI accelerator, and an operating power efficiency of 10 TOPS per watt on major AI models. It also demonstrated that AI processing is possible without a fan or heat sink.
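As a rough sanity check on the fanless claim, the headline figures imply a single-digit-watt power draw for the accelerator. Note the caveat baked into this arithmetic: peak throughput and peak efficiency are rarely achieved at the same operating point, so this is an illustrative bound rather than a measured figure.

```python
peak_tops = 130.0           # reported peak throughput
peak_tops_per_watt = 23.9   # reported peak power efficiency at 0.8V

# If both peaks held at the same operating point, implied accelerator
# power would be throughput / efficiency -- a rough bound only
implied_watts = peak_tops / peak_tops_per_watt
print(f"~{implied_watts:.1f} W")
```

A power budget in this range is consistent with operating without a fan or heat sink, as the article reports.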

These results help address the heat generation caused by increased power consumption, which has been one of the challenges of implementing AI chips in embedded devices such as service robots and automated guided vehicles. Greatly reducing heat generation will contribute to the spread of automation across industries such as the robotics and smart technology markets. These technologies will be applied to the company's RZ/V series of MPUs for vision AI applications.


By Seb Springall

Seb Springall is a seasoned editor at Electropages, specialising in the product news sections. With a keen eye for the latest advancements in the tech industry, Seb curates and oversees content that highlights cutting-edge technologies and market trends.