25-01-2021 | By Sam Brown
Recently, Tesla has announced a recall on all Model S and Model X cars made before 2018. What problems are these cars facing, how is FLASH memory at fault, and what does this teach engineers of the future when dealing with NAND FLASH memory?
Tesla, the electric car company run by Elon Musk, recently announced that all Model S and Model X cars manufactured before 2018 are being recalled for repairs. According to Tesla, some of these vehicles show failures in the camera and HVAC systems, making them unsafe to drive.
The reversing camera on the Tesla vehicles does not operate under some circumstances, and anti-fogging systems in the HVAC do not work. Furthermore, as the issue resides in the media control unit with access to the vehicle’s internal bus, the autopilot feature can also be corrupted and fail to operate correctly.
The recall of 150,000 vehicles is a very large undertaking for a company that has only started to show itself to be a dominant force in the electric car industry. Fortunately for Tesla, they have already found the cause of the issue, and it lies in the NAND FLASH memory used.
Inside the media control unit (MCU) is a small 8GB NAND FLASH memory module that stores a variety of information, including data logs. The module is rated for 3,000 program/erase cycles, and this figure most likely applies per cell rather than to the chip as a whole.
Any engineer worth their salt can already see how the system has failed…
The vast amount of logging information written to such a small memory chip quickly fills the NAND FLASH, so the same cells are erased and rewritten over and over until they reach their program/erase limit. In fact, the available information suggests that vehicles will operate for 3 to 4 years before the chip fails, and every single vehicle will eventually fail as a result of the FLASH memory.
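As a rough illustration of why a fixed lifetime follows from these figures, the sketch below estimates how long a FLASH part lasts under a given write load. The 8GB capacity, the 3,000 program/erase cycles, and the 3 to 4 year failure window come from the figures above; the daily log volume and the write-amplification factor are assumptions chosen purely for illustration, not measured Tesla values, and the function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def total_write_budget_gb(capacity_gb, pe_cycles):
    """Total data the chip can absorb, assuming wear is spread perfectly evenly."""
    return capacity_gb * pe_cycles


def lifetime_years(capacity_gb, pe_cycles, host_gb_per_day, write_amplification=1.0):
    """Years until the write budget is exhausted.

    write_amplification is the ratio of data physically written to the
    FLASH to data the software asked to write (always >= 1).
    """
    if host_gb_per_day <= 0:
        raise ValueError("host_gb_per_day must be positive")
    if write_amplification < 1.0:
        raise ValueError("write_amplification cannot be below 1")
    physical_gb_per_day = host_gb_per_day * write_amplification
    days = total_write_budget_gb(capacity_gb, pe_cycles) / physical_gb_per_day
    return days / DAYS_PER_YEAR


def implied_physical_gb_per_day(capacity_gb, pe_cycles, observed_years):
    """Work backwards: what daily write load would wear the chip out in observed_years?"""
    return total_write_budget_gb(capacity_gb, pe_cycles) / (observed_years * DAYS_PER_YEAR)


if __name__ == "__main__":
    # Figures from the article: 8GB module, 3,000 program/erase cycles.
    print("Write budget (GB):", total_write_budget_gb(8, 3000))

    # ASSUMPTION: 5 GB/day of logs and a write amplification of 4 (illustrative only).
    print("Estimated life (years):", round(lifetime_years(8, 3000, 5, 4.0), 2))

    # Daily physical writes implied by a failure after 3.5 years.
    print("Implied GB/day:", round(implied_physical_gb_per_day(8, 3000, 3.5), 1))
```

Under ideal wear levelling the module can absorb about 24,000GB in total, so a failure after roughly 3.5 years would correspond to something in the region of 19GB of physical writes per day. Uneven wear would shorten the life further.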
When the memory fails, other services that rely on the media control unit begin to fail due to memory errors. The touchscreen needed for use with various controls freezes, HVAC cannot operate correctly, and some drivers have reported that chimes and other sounds also do not play.
According to Tesla, the older vehicles used NVIDIA Tegra 3 processors, while newer models moved to Intel-based hardware that does not suffer from the problem. Furthermore, newer vehicles come with 64GB of storage, which should last far beyond the vehicle's expected life.
The first lesson from the NAND FLASH memory failure is how startup companies can lack experience, overcomplicate issues, and run before they can walk. Most engineers working on data logging systems would understand that NAND FLASH is best suited to long-term storage of system files and parameters rather than constant rewriting of data. If FLASH is to be used in such a scenario, then wear levelling across the memory cells is required, along with a memory large enough that data gathered over time does not fill the chip. Furthermore, an engineer would make a basic calculation of how much memory a data logger needs over the vehicle's life cycle.
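The basic calculation mentioned above might look something like the following sketch. It assumes wear levelling spreads writes evenly across the chip, so that the total write budget is simply capacity multiplied by program/erase cycles. The 15-year vehicle life, the daily log volume, the write-amplification factor, and the safety margin are all illustrative assumptions rather than figures from Tesla, and the function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def required_capacity_gb(host_gb_per_day, vehicle_life_years, pe_cycles,
                         write_amplification=1.0, safety_margin=2.0):
    """Minimum FLASH capacity (GB) needed to survive the vehicle's life.

    Assumes ideal wear levelling, so every cell sees the same number of
    program/erase cycles.
    """
    total_physical_gb = (host_gb_per_day * write_amplification
                         * vehicle_life_years * DAYS_PER_YEAR)
    return total_physical_gb * safety_margin / pe_cycles


def round_up_to_power_of_two(capacity_gb):
    """Memory parts are usually sold in power-of-two sizes, so round up."""
    size = 1
    while size < capacity_gb:
        size *= 2
    return size


if __name__ == "__main__":
    # ASSUMPTIONS: 5 GB/day of logs, write amplification of 4,
    # a 15-year vehicle life, and a 2x safety margin.
    needed = required_capacity_gb(
        host_gb_per_day=5,
        vehicle_life_years=15,
        pe_cycles=3000,          # endurance figure quoted in the article
        write_amplification=4.0,
        safety_margin=2.0,
    )
    print("Minimum capacity (GB):", round(needed, 1))
    print("Next standard size (GB):", round_up_to_power_of_two(needed))
```

The point of the exercise is less the specific numbers than the habit: plugging in even rough estimates of the write load shows immediately whether a candidate part can outlive the product.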
Of course, this error could have cropped up as a result of too much isolation between departments. The hardware engineer responsible for the MCU may not have been involved with programming, and a software engineer who did not understand the hardware may have decided to write a data logging routine because it was a neat feature.
This blunder also teaches how important component choice is. Simply choosing a memory chip for its physical dimensions, memory capacity, and cost is a recipe for failure. As previously stated, FLASH memory is often used for backup, read-only information, and data transfer between computers, all of which involve few write cycles.
Another question that is raised is why the vehicle is logging data in the first place. Surely such vehicles are isolated from the internet, and such data would only benefit garages that need to repair damage? If the logging feature was programmed in by a department because it thought it was a neat feature, then parts of the vehicle were designed independently with no arbitration process. If the decision to implement such a feature came from authority, then the question is why.