27-11-2020 | By Robin Mitchell
Recently, Cerebras announced its latest wafer-scale chip, which the company says has the power of 10,000 GPUs. What is wafer-scale integration, what advantages and disadvantages does it have, and what exactly has Cerebras announced?
Terms such as LSI (large-scale integration) and VLSI (very-large-scale integration) describe the scale of circuit integration on silicon chips. As process geometries shrink, the number of devices that can fit onto a single chip increases, and this leads to more powerful electronics. However, an emerging technology called WSI is beginning to gain traction in the semiconductor community as a potential alternative to current silicon devices.
WSI stands for wafer-scale integration, and it means that instead of a wafer holding thousands of identical chips which are separated and packaged, the entire wafer is a single device. The technology has been explored since the 1970s, but repeated failures have led to many projects being scrapped or forgotten. Even Clive Sinclair, the inventor of the ZX computer line (such as the ZX Spectrum), explored wafer-scale integration as an alternative to individual dies.
The first and most obvious advantage of WSI is the sheer size of the resulting circuit. If a single die measuring just 20mm x 20mm can hold billions of transistors, then a 200mm wafer using 20nm technology has roughly 80 times the area and could contain hundreds of billions of transistors, or trillions at denser process nodes. The processing power of such a device alone would trump any mainframe system, and all of it would be packaged in a 200mm disc.
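The scaling argument above can be checked with a rough back-of-envelope calculation. The sketch below assumes a hypothetical 5 billion transistors on the 20mm x 20mm die (the article only says "billions") and ignores edge losses and wiring overhead, so the result is an order-of-magnitude estimate that scales linearly with the assumed density.

```python
import math

def estimate_wafer_transistors(wafer_diameter_mm, die_side_mm, transistors_per_die):
    """Scale a per-die transistor count up to the full area of a round wafer."""
    wafer_area_mm2 = math.pi * (wafer_diameter_mm / 2) ** 2
    die_area_mm2 = die_side_mm ** 2
    density_per_mm2 = transistors_per_die / die_area_mm2
    return wafer_area_mm2 * density_per_mm2

# Assumption (not from the article): 5 billion transistors per 20mm x 20mm die.
ASSUMED_TRANSISTORS_PER_DIE = 5e9

total = estimate_wafer_transistors(200, 20, ASSUMED_TRANSISTORS_PER_DIE)
print(f"Estimated transistors on a 200mm wafer: {total:.2e}")
```

With these assumed inputs the estimate comes out at roughly 0.4 trillion transistors; a denser process raises the figure proportionally.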
The second major advantage of such a design is that not only can an entire system be integrated (i.e. memory, controllers, buses, and processors), but the resultant design can be made more compact. Because signals travel over short on-wafer connections instead of between separate packages, this allows for greater data transfer speeds and could see a computational system operate without the need for caches or other layers that result in a slowdown.
However, WSI is no easy feat, because the failure of a single transistor on the wafer could result in the whole wafer going to waste. Defects are also common when manufacturing conventional chips, but because a wafer normally holds many thousands of separate chips, a defect only ruins the chip it lands on. Faulty chips can be marked as “No Good” while the rest of the wafer remains usable. Considering that a single point defect, or an improperly imaged MOSFET gate, can cause a failure, it is very unlikely that all 1 trillion devices on a wafer will be manufactured correctly.
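To see how quickly the odds fall with area, the sketch below uses the simple Poisson yield model, in which the probability of a defect-free area is exp(-D x A) for defect density D and area A. The defect density of 0.1 defects per cm² is a hypothetical value chosen only for illustration; real fabs use their own figures and more refined models.

```python
import math

def poisson_yield(defects_per_cm2, area_cm2):
    """Probability that an area contains zero defects under a Poisson model."""
    return math.exp(-defects_per_cm2 * area_cm2)

# Assumption (illustrative only): 0.1 random defects per square centimetre.
DEFECTS_PER_CM2 = 0.1

die_area_cm2 = 2.0 * 2.0                 # a 20mm x 20mm die
wafer_area_cm2 = math.pi * 10.0 ** 2     # a full 200mm wafer

print(f"Chance a single die is defect-free:  {poisson_yield(DEFECTS_PER_CM2, die_area_cm2):.3f}")
print(f"Chance the whole wafer is defect-free: {poisson_yield(DEFECTS_PER_CM2, wafer_area_cm2):.2e}")
```

Under these assumptions about two thirds of individual dies would be good, while a fully defect-free wafer would be vanishingly rare, which is why a monolithic wafer-scale device needs some form of fault tolerance.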
Of course, this issue can be worked around using intelligent block-based designs whereby individual areas on a WSI device can be tested and disabled if faulty. From there, such a WSI device can be priced depending on how many failures are present, and the price would reflect the overall number of functional processors, memory size, and speed. Another workaround is to use larger feature sizes (for example, 100nm instead of 10nm), which are less sensitive to small defects, but larger transistors can increase power consumption and decrease overall speed.
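The benefit of a block-based design can be illustrated with the same kind of simple Poisson defect model: if the wafer is divided into many small blocks that can be disabled individually, each block is very likely to be good, and only a small fraction is lost. The block count, defect density, and 99% threshold below are hypothetical values for illustration, not figures from any real device.

```python
import math

def block_yield(defects_per_cm2, wafer_area_cm2, n_blocks):
    """Poisson yield of one block when the wafer is split into equal blocks."""
    block_area_cm2 = wafer_area_cm2 / n_blocks
    return math.exp(-defects_per_cm2 * block_area_cm2)

def prob_at_least(n_blocks, k_required, p_good):
    """Binomial probability that at least k_required of n_blocks are good."""
    if p_good >= 1.0:
        return 1.0
    if p_good <= 0.0:
        return 1.0 if k_required <= 0 else 0.0
    total = 0.0
    for k in range(max(k_required, 0), n_blocks + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_term = (math.lgamma(n_blocks + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_blocks - k + 1)
                    + k * math.log(p_good)
                    + (n_blocks - k) * math.log1p(-p_good))
        total += math.exp(log_term)
    return min(total, 1.0)

# Assumptions (illustrative only).
WAFER_AREA_CM2 = 314.16   # roughly a 200mm wafer
DEFECTS_PER_CM2 = 0.1
N_BLOCKS = 10_000

p = block_yield(DEFECTS_PER_CM2, WAFER_AREA_CM2, N_BLOCKS)
print(f"Per-block yield: {p:.4f}")
print(f"Expected working blocks: {p * N_BLOCKS:.0f} of {N_BLOCKS}")
print(f"P(at least 99% of blocks work): {prob_at_least(N_BLOCKS, 9_900, p):.4f}")
```

The count of working blocks is also the natural quantity on which to grade and price such a device.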
Last year, Cerebras announced its CS-1 system, built around a WSI device with a total of 1.2 trillion transistors that the company says has the computational power of 10,000 GPUs. To put this device into perspective, the NVIDIA Tesla V100 has 21.1 billion transistors on a die of 815mm², whereas the wafer-scale die in the CS-1 measures 46,225mm². The die’s main purpose is to accelerate AI systems that utilise deep learning, which relies on many GPU-like processors, called AI accelerators, all working in parallel. The same can be achieved with a supercomputer containing many physical GPUs, but such a system slows down when tasks must be handed to each GPU via wires and other connectors.
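A quick calculation with the figures quoted above (and Cerebras's published die area of 46,225mm²) shows where the advantage comes from. The sketch below is plain arithmetic, with variable names chosen for illustration: it compares area, transistor count, and transistor density for the two devices.

```python
def density_per_mm2(transistors, area_mm2):
    """Average transistor density of a die."""
    return transistors / area_mm2

# Figures for the two devices compared in the article.
V100 = {"transistors": 21.1e9, "area_mm2": 815}
WSE = {"transistors": 1.2e12, "area_mm2": 46_225}

area_ratio = WSE["area_mm2"] / V100["area_mm2"]
transistor_ratio = WSE["transistors"] / V100["transistors"]
density_ratio = (density_per_mm2(**WSE) / density_per_mm2(**V100))

print(f"Area ratio:       {area_ratio:.1f}x")
print(f"Transistor ratio: {transistor_ratio:.1f}x")
print(f"Density ratio:    {density_ratio:.3f}x")
```

The area and transistor ratios are both close to 57, and the density ratio is close to 1, so the wafer-scale die is not packing transistors any more tightly than the GPU; it is simply far larger and keeps everything on one piece of silicon.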
Cerebras and the US Department of Energy have now announced that their WSI system is not only faster than 10,000 GPUs, but also 200 times faster than the Joule supercomputer, which ranks as the 82nd fastest system in the world. The system built by Cerebras is only 26 inches tall, takes up just one third of a standard rack, and is powered by the industry’s only wafer-scale processing engine. This major step in WSI technology demonstrates the significant reductions in size and cost that are possible for a computer system. The Cerebras CS-1 also demonstrates a major reduction in power consumption; despite being 200 times faster than the Joule supercomputer, it consumes only 20kW of power, whereas Joule consumes 450kW.
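Taken together, the speed and power figures imply a large gain in performance per watt. The sketch below simply combines the numbers quoted above; it assumes the 200x speedup and both power figures apply to the same workload, which is a simplification, so the result should be read as indicative only.

```python
def perf_per_watt_gain(speedup, reference_power_kw, new_power_kw):
    """Relative performance-per-watt of a new system versus a reference system."""
    return speedup * reference_power_kw / new_power_kw

# Figures quoted in the article: 200x faster, 450kW (Joule) versus 20kW (CS-1).
gain = perf_per_watt_gain(speedup=200, reference_power_kw=450, new_power_kw=20)
print(f"Implied performance-per-watt gain: {gain:.0f}x")
```

On those figures the implied gain is 4,500 times, since the CS-1 draws 22.5 times less power while running 200 times faster.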