30-03-2023 | By Robin Mitchell
Researchers recently demonstrated how hackers can use the Near-Ultrasound Inaudible Trojan (NUIT) attack to send inaudible near-ultrasound commands to voice-controlled systems such as Alexa and Siri, raising concerns over privacy, surveillance, and breaches of personal data.
This hardware design issue highlights the need for engineers to develop defences against such attacks, whether by improving devices' frequency response so that inaudible commands are filtered out or through software fixes that recognise suspicious activity. What challenges do voice-activated systems present, what did the researchers demonstrate, and how can such attacks be mitigated?
Over the past decade, voice-activated systems have significantly improved thanks to the introduction of AI and machine learning tools. In the past, speech-to-text required extensive training and operated by identifying specific patterns, and while this did work in some cases, it was sketchy at best. This all changed when AI made its debut, as algorithms could instead be trained against millions of voices from around the world rather than a single individual. The result is that modern speech-to-text not only turns spoken words into text accurately but does so across many different languages and accents.
The ability of voice-activated systems to make any device hands-free has seen them quickly find their way into everyday devices, including smartphones, smartwatches, and hubs. One particular use case of voice control is in smartwatches, where the small screen makes typing out messages far too difficult. Another extremely useful application for voice control is in scenarios where some activity, such as construction or cooking, cannot easily be left, but nearby devices need to be controlled.
However, for all of the advantages that voice control brings, it faces numerous challenges. Arguably, voice control’s biggest challenge is the significant risk it places on user privacy. In order for voice-controlled systems to be accurate, they need to be trained, and the best way to do this is to record human interactions with the device, determine if the command was understood correctly, and then use that data in an AI training model.
However, this quickly leads to millions of conversations being stored, some of which may contain private information such as bank details, security answers, and personal information. While it is unlikely that a large corporation will abuse this data, any data breach could see these conversations leaked to the general public or sold to a third party who could use them maliciously.
Another challenge faced by voice-controlled systems is that the ability of a device to listen in to conversations at all times could allow them to become surveillance platforms. If poor security practices are used, hackers and other malicious actors could hijack a device to actively listen to conversations and report back whatever has been recorded.
While rare, there are times when software assistants, such as Siri and Alexa, will randomly respond to a command that was never said. In fact, this has personally happened to me on several occasions, where my Apple Watch will start listening to my conversations because it believes it has been activated. This could be due to the ambient noise level or because a word said by someone nearby sounded like “Hey Siri”. Either way, it is perfectly possible for devices to accidentally hear commands and process data as a result.
However, a team of researchers from the University of Texas at San Antonio and the University of Colorado at Colorado Springs has discovered a new attack against such devices, called the Near-Ultrasound Inaudible Trojan (NUIT). Simply put, the researchers found that many microphones in modern devices are sensitive to high-frequency tones that humans cannot hear, and these tones can be modulated with speech to fool devices into executing commands.
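The modulation trick can be sketched numerically. The Python snippet below is illustrative only: the 19 kHz carrier, the 500 Hz stand-in "speech" tone, and the squaring nonlinearity are my assumptions, not figures from the NUIT paper. It shows how a signal whose radiated energy sits entirely in the near-ultrasound band can still produce an audible-band component once a microphone's nonlinearity demodulates it:

```python
import math

FS = 96_000          # sample rate high enough to represent near-ultrasound
CARRIER = 19_000.0   # near-ultrasound carrier (Hz), inaudible to most adults
SPEECH = 500.0       # stand-in tone for a speech component (Hz)
N = FS // 10         # 100 ms of audio

# Amplitude-modulate the "speech" onto the carrier: all radiated energy sits
# around 19 kHz, so a human hears nothing.
signal = [(1.0 + 0.8 * math.sin(2 * math.pi * SPEECH * n / FS))
          * math.sin(2 * math.pi * CARRIER * n / FS)
          for n in range(N)]

def tone_mag(x, freq, fs):
    """Normalised single-bin DFT magnitude at `freq`."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)

# A microphone front end is not perfectly linear; squaring is a crude model of
# that nonlinearity, and it demodulates the envelope into the audible band.
demodulated = [v * v for v in signal]

print(f"500 Hz level in the air:     {tone_mag(signal, SPEECH, FS):.4f}")
print(f"500 Hz level inside the mic: {tone_mag(demodulated, SPEECH, FS):.4f}")
```

The transmitted signal contains essentially no 500 Hz energy, yet after the nonlinearity a clear 500 Hz component appears, which is what the speech recogniser ultimately hears.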
The researchers posted a series of videos demonstrating how the trojan works, including turning down the volume of a smartphone to 1 and unlocking a door. From there, the researchers then demonstrated how a simple webpage can be used to play these commands in a browser and affect devices in the room.
Unfortunately, there is no easy way of getting around this attack, as it has nothing to do with software vulnerabilities. Instead, the attack results from hardware design, specifically the microphone, so the only reliable defence is to change the frequency response of microphones to reject near-ultrasound frequencies.
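As a sketch of that defence, a low-pass filter ahead of the speech recogniser would pass the voice band while rejecting near-ultrasound. The snippet below is a minimal illustration, assuming a 96 kHz sample rate and an 8 kHz cutoff; a real product would use a properly designed analogue or DSP filter rather than this cascaded first-order stage:

```python
import math

FS = 96_000  # assumed sample rate

def lowpass(x, cutoff, fs):
    """First-order IIR low-pass (simple RC model)."""
    rc = 1.0 / (2 * math.pi * cutoff)
    alpha = (1.0 / fs) / (rc + 1.0 / fs)
    y, prev = [], 0.0
    for v in x:
        prev = prev + alpha * (v - prev)
        y.append(prev)
    return y

def reject_ultrasound(x, cutoff, fs, stages=4):
    """Cascade several first-order stages for a steeper roll-off."""
    for _ in range(stages):
        x = lowpass(x, cutoff, fs)
    return x

def amplitude(x):
    # Peak of the second half of the signal, i.e. after transients settle
    return max(abs(v) for v in x[len(x) // 2:])

N = FS // 10
speech = [math.sin(2 * math.pi * 500 * n / FS) for n in range(N)]      # voice band
ultra = [math.sin(2 * math.pi * 19_000 * n / FS) for n in range(N)]    # NUIT carrier

print("500 Hz after filter: ", amplitude(reject_ultrasound(speech, 8_000, FS)))
print("19 kHz after filter: ", amplitude(reject_ultrasound(ultra, 8_000, FS)))
```

The 500 Hz tone passes almost unchanged while the 19 kHz carrier is strongly attenuated, so a modulated command never reaches the nonlinear stage at a useful level.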
However, it is also possible that voice command systems could look at the commands being asked and determine if a potential attack is in progress. For example, if a volume-lowering command is immediately followed by an open-door command, this could result in either the volume command being ignored or the door-open command being rejected. At the same time, notifications could be pushed to user devices to let them know that inaudible commands may be present in their surrounding environment. A software fix that inspects the frequency content of incoming sound could also be used, but this may be difficult to implement.
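Such a rule could be as simple as flagging a sensitive command that arrives directly after a volume-lowering one. The sketch below is purely hypothetical: the command names and the rule itself are illustrative, not drawn from any shipping assistant:

```python
# Hypothetical command names; a real assistant's command taxonomy will differ.
SENSITIVE = {"unlock_door", "open_garage", "disable_alarm"}
VOLUME_LOWERING = {"volume_down", "mute"}

def flag_suspicious(history):
    """Return True if a sensitive command directly follows a volume-lowering
    one, a sequence consistent with an inaudible-command attack."""
    return any(prev in VOLUME_LOWERING and cur in SENSITIVE
               for prev, cur in zip(history, history[1:]))

print(flag_suspicious(["volume_down", "unlock_door"]))  # attack-like pattern
print(flag_suspicious(["play_music", "set_timer"]))     # benign sequence
```

A real implementation would need to balance false positives (a user who genuinely mutes the speaker before unlocking the door) against missed attacks, which is why such heuristics would likely trigger a confirmation prompt rather than a hard rejection.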
This attack poses a genuine threat, as digital assistants such as Siri and Alexa control many home security systems. As such, engineers should actively investigate such attacks and develop hardware solutions to prevent them in future devices.