11-10-2016 | By Ross Bannatyne
Ross Bannatyne explains common ways semiconductor devices are made to resist extreme temperatures and radiation, and introduces Vorago Technologies’ proprietary hi-rel semiconductor process.
Microcontrollers have become a mainstay in almost every electronic product on the market. There are now literally thousands of MCUs to choose from, with almost every combination of integrated peripherals, on-chip memory options and package type that you can dream up. For operation in environments of extreme temperature or radiation, however, the choices available to designers are significantly more limited.
‘Extreme’ environments for electronics exist both on Earth and in space, and temperature extremes and high levels of radiation can be found in both. Examples include downhole drilling, jet engine controls, satellites and even medical equipment (think of machines that use radiation for imaging).
Engineers who have to design systems that operate in these tough environments don’t have an easy time of it. If a processor must operate at 200°C or withstand a Total Ionizing Dose (TID) of 300 krad(Si), there is an extremely limited pool of options. The starting point is usually to check which components can meet the extreme specification and then choose the closest fit. This may well be an expensive FPGA that is overkill for the job but meets the temperature or radiation requirements.
Because so few vendors develop products for extreme environments, an industry has grown up around up-screening commercial off-the-shelf (COTS) products. Up-screening is the task of finding process outliers that appear to operate outside the specification they were designed to meet. Up-screened products are often also re-assembled in more robust packages that can handle the extreme environment more readily than the original package. Up-screened products have played a valuable and meaningful role in creating many of the extreme environment systems in use today, but there is also a downside. Up-screened parts are not recommended or guaranteed by the original manufacturer, and they are not designed for purpose. The manufacturing process that created the outlier is not repeatable, and 'walking wounded' parts can be created during the up-screening process. Up-screened devices are also expensive to obtain, as it takes a lot of effort to separate the wheat from the chaff.
The biggest problem that CMOS semiconductors face in extreme environments is ‘latch-up’. At high temperature or under radiation, parasitic transistors in the silicon can be switched on by thermally generated carriers or by an ionizing radiation strike. All bulk CMOS wafers contain millions of parasitic structures, spread across the wafer, that look and behave like thyristors. This byproduct of the CMOS wafer architecture is usually not a problem if the device is operated within a limited specification, but at high temperature or when radiation is present, latch-up occurs when the parasitic structure is triggered. Figure 1 shows a textbook-style cross-section of a CMOS structure with the parasitic bipolar structure superimposed on the diagram.
Latch-up occurs when the parasitic bipolar transistors become forward biased and switch on. The transistors drive each other into saturation and create a short circuit from Vdd to Vss. When latch-up occurs, a high current flows through the short circuit, which can result in permanent damage. To get out of a latch-up condition, the device must be reset, typically by cycling its power.
Immunising against latch-up is not always easy. It can be addressed at the system level, for example by adding cooling systems such as forced air or liquid cooling to protect against high temperature latch-up. Where radiation-induced latch-up is the concern, shielding such as lead shields can be added. Unfortunately, both of these techniques add a lot of overhead in size, weight and cost, and can increase power consumption (which creates more heat). Another way of addressing latch-up is to use a specialised semiconductor manufacturing process such as Silicon-on-Insulator (SOI). This approach is effective but expensive, as it is not compatible with all of the existing industry-standard CMOS infrastructure. A further approach is to modify standard CMOS by adding a ‘Buried Guard Ring’ to the existing CMOS substrate. This approach is shown in Figure 2.
Vorago Technologies’ HARDSIL process is a simple tweak to standard CMOS designs. It adds a vertical and a horizontal implant that immunise against latch-up by creating a highly conductive layer underneath the CMOS devices and wells, combined with a high conductivity connection to the well contacts. This enables high temperature and radiation tolerant operation in two ways: it reduces the parasitic resistance so that the parasitic NPN cannot turn on, and it reduces the gain of the parasitic transistors so that the bipolars cannot sustain latch-up. HARDSIL has been implemented on space grade semiconductors and has proved to be effective for latch-up immunisation.
Additional hardware features can be added to a microcontroller to further protect against ionising radiation strikes to the silicon die. Figure 3 shows a block diagram of a radiation hardened ARM Cortex-M0 based microcontroller that has HARDSIL treatment along with an Error Detection and Correction (EDAC) subsystem and Triple Modular Redundancy (TMR) implemented on its registers.
Particle strikes on memory cells cause bits to ‘flip’, which disrupts the normal expected operation of the microcontroller. For this reason, EDACs are often used to ensure that memory contents can be considered reliable. The EDAC shown operates at the byte level. Every time the CPU fetches a word from memory, the subsystem can detect two errors and correct one error per byte. More than one flipped bit per byte is extremely unlikely, because logically adjacent bits are deliberately spaced far apart physically. This spacing also reduces the risk of a low angle-of-incidence ionising strike disrupting several bits of the same byte. The memory protection system also includes a ‘Scrub Engine’. This operates autonomously in the background of regular CPU operation and steps through memory sequentially, correcting any errors it finds before they can accumulate into uncorrectable errors. This mechanism ensures that the probability of accumulating two flipped bits in a single byte is statistically negligible.
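The byte-level detect-two, correct-one behaviour and the background scrub described above can be sketched in software. The example below is a minimal illustration, not the device's implementation: the source does not say which code the EDAC uses, so the extended Hamming code, the 13-bit codeword layout and the names encode, decode and scrub are all assumptions.

```python
# Hypothetical byte-level SECDED model: single-error-correct, double-error-detect.
# Assumed layout: bit 0 = overall parity, bits 1..12 = Hamming(12,8) codeword.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # positions that are not powers of two
PARITY_POS = [1, 2, 4, 8]                # Hamming check-bit positions


def encode(byte):
    """Return the 13-bit codeword protecting one data byte."""
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (byte >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POS:
        parity = 0
        for pos in range(1, 13):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            word |= 1 << p               # make each check group even
    if bin(word).count("1") & 1:
        word |= 1                        # make the whole codeword even
    return word


def decode(word):
    """Return (byte, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 13):
        if (word >> pos) & 1:
            syndrome ^= pos              # XOR of set positions locates a single flip
    overall_odd = bin(word).count("1") & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd and syndrome <= 12:
        word ^= 1 << syndrome            # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        return None, "uncorrectable"     # e.g. two flipped bits in the same codeword
    byte = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            byte |= 1 << i
    return byte, status


def scrub(memory):
    """One sequential scrub pass: rewrite correctable words, return the number fixed."""
    fixed = 0
    for addr, word in enumerate(memory):
        byte, status = decode(word)
        if status == "corrected":
            memory[addr] = encode(byte)
            fixed += 1
    return fixed
```

In this model a single flipped bit is repaired on the next read or scrub pass, while two flipped bits in one codeword are only detected. This is why a scrub pass has to revisit each location before a second upset is likely to land in the same byte.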
Most engineers are more concerned about disruption to logic circuits than to memory cells, because EDACs have been proven to work well against flipped bits. While HARDSIL has been proven to work well in preventing latch-up, there is a danger that the same radiation strike that could trigger latch-up could also upset logic signals. If an erroneous logic level is latched, it can be propagated through the circuits. To prevent this problem, Triple Modular Redundancy can be implemented by replacing standard latches with ‘DICE’ (Dual Interlocked Cell) latches that have three voting cells.
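The voting idea behind TMR can be illustrated with a short behavioural model. This is a sketch under assumed names (majority and TmrRegister are hypothetical); it shows only the two-of-three vote and does not represent the DICE latch circuit or the actual device hardware.

```python
def majority(a, b, c):
    """Bitwise two-of-three vote: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Hypothetical register that keeps three copies and votes on every read."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def write(self, value):
        self.copies = [value, value, value]

    def read(self):
        voted = majority(*self.copies)
        # Refresh all copies so that a single upset does not linger
        # and combine with a later upset in another copy.
        self.copies = [voted, voted, voted]
        return voted
```

If one copy is upset, for example by flipping a bit in `reg.copies[1]` after `reg = TmrRegister(0b1011)`, then `reg.read()` still returns `0b1011` because the other two copies outvote it. The vote fails only if the same bit is upset in two copies before the next read.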
One of the most interesting growth applications where microcontrollers need to tough it out in extreme environments is small satellites. Constellations of ‘small sats’ are being deployed in orbit in increasing numbers for telecommunications, to collect images, to measure other types of data with sensors (like heat spots on Earth) and for scientific exploration. As mission lifetime expectations increase and the orbits of these spacecraft are extended, the electronic components used on small sats are being exposed to increasingly large amounts of radiation. Because the cost of a small sat is typically orders of magnitude less than that of a ‘traditional’ large satellite, there is growing demand for more cost-effective radiation-hardened components. This in turn is driving innovations such as HARDSIL, which give small sats rock-solid components that work reliably in extreme conditions. Figure 4 shows a small satellite in the ‘CubeSat’ form factor, which is built from units measuring 10 x 10 x 11.35 cm.
In conclusion, there is a definite paradigm shift going on. Not so many years ago, it was accepted that microcontrollers and other programmable devices were limited in their ability to operate in extreme environments. Designers are relentlessly driving cost out of systems and constantly increasing the performance of circuits, even in the most extreme environments. It is now commonly recognised that enhancing the specifications of components can deliver significant system benefits by reducing size, weight, power and cost (SWaP-C). This is driving a new generation of components that can operate reliably under even the most extreme conditions.