One of the things that’s always fascinated me about the HomePod is its ability to hear you in all sorts of situations – from across the room, while loud music is playing, while people are talking, or even from another room in the house. This makes it easy to talk to Siri in all kinds of real-life situations[^At this point, I feel it’s important to remember that the key to any voice assistant is to speak slowly and clearly.].
Recently, Apple published a post on their Machine Learning Journal that outlines how they optimize Siri on HomePod in far-field settings, and wow, is there some fascinating stuff in there! Here is a list of some of the technologies that go into the HomePod's sound input processing:
- Deep Neural Network (DNN)
- Multichannel Echo Cancellation (MCEC)
- Residual Echo Suppressor (RES)
- Speech Presence Probability (SPP)
- Multichannel Wiener Filter (MCWF)
- Blind Source Separation (BSS)
A lot of this stuff is well outside of my wheelhouse, but the most interesting part to me is how much work and effort Apple puts into making the best smart speaker in the world. They’re not content with just making another speaker with a microphone on it. In typical Apple fashion, they only enter a new product category when they can truly make a difference and create something leaps and bounds better than what is currently available.