22-03-2016 | By Paul Whytock
XMOS, a specialist in voice and music connectivity, has developed and launched the xCORE Array Microphone and claims it is the most flexible microphone aggregation solution for voice user interfaces (VUI). A VUI enables man-machine interaction via human speech. The idea emerged way back in the 1870s, but it was not until 1952 that Bell Labs developed the first effective speech recognition system. In the 1990s IVR (Interactive Voice Response), an automated telephony system, became standard equipment for call centres, and by the 2000s voice recognition had become more widely used. The arrival of Apple’s Siri in 2010 further advanced the application of voice recognition in smart consumer technology.
VUI has therefore made enormous progress from those early innovations, but human language does not have a very strict structure, especially when spoken, and is affected by local habits, accents and other factors which can confuse recognition systems. One amusing example of this is when a journalistic pal of mine decided he would try to "teach" his new computer some voice activation words by speaking to it after a long evening in the local pub with friends. Needless to say, the next morning when he tried to voice activate his computer it didn’t recognise him.
Key Element
But getting back to the xCORE-200 and the advantages it offers VUI, a key element is its ability to aggregate up to 32 MEMS microphones and to provide USB and I2S backhaul. Applications include smart TVs, sound bars, virtual digital assistants and smart home automation. Based on the xCORE-200 device family, the solution is delivered as an evaluation board with supporting software libraries. But why is the aggregation of microphones so important? The reason, according to XMOS, is that accurate, far-field voice capture underpins VUI technology and is best achieved using multiple microphones. Many of today’s solutions are limited to a smaller number of microphones, but because the XMOS solution can be deployed with up to 32, the company says it provides higher signal-to-noise performance and enhanced control of both sensitivity and directivity.
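To give a feel for why more microphones can mean better signal-to-noise performance, here is a minimal Python sketch. It is not XMOS code and does not use the XMOS libraries. It assumes an idealised case: the speech arrives perfectly time-aligned at every microphone, each microphone adds independent noise of equal power, and the channels are simply averaged (a basic delay-and-sum approach). The function names are hypothetical, chosen for illustration only. Under those assumptions the noise power falls by a factor of N while the speech is unchanged, so the gain is 10·log10(N) dB, which is roughly 15 dB for 32 microphones.

```python
import math
import random


def ideal_array_gain_db(n_mics: int) -> float:
    """Theoretical SNR gain from averaging n_mics aligned channels.

    Assumes (hypothetically) identical speech at every microphone and
    independent, equal-power noise on each channel.
    """
    if n_mics < 1:
        raise ValueError("n_mics must be at least 1")
    return 10.0 * math.log10(n_mics)


def simulated_array_gain_db(n_mics: int, n_samples: int = 5000, seed: int = 0) -> float:
    """Estimate the same gain by averaging simulated unit-variance noise.

    The speech component is identical on all channels, so averaging leaves
    it untouched; only the noise power changes. A single microphone has a
    noise power of 1.0 here, so the gain is 1.0 / (averaged noise power).
    """
    rng = random.Random(seed)
    total_power = 0.0
    for _ in range(n_samples):
        # Average one noise sample from each microphone channel.
        averaged = sum(rng.gauss(0.0, 1.0) for _ in range(n_mics)) / n_mics
        total_power += averaged * averaged
    averaged_noise_power = total_power / n_samples
    return 10.0 * math.log10(1.0 / averaged_noise_power)


if __name__ == "__main__":
    for n in (2, 4, 8, 32):
        print(n, round(ideal_array_gain_db(n), 2), round(simulated_array_gain_db(n), 2))
```

In a real room the figures will be lower, because noise is partly correlated between microphones and the alignment is never perfect, but the trend is the reason a larger array helps far-field capture.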