Apple Siri, Google Now, Amazon Echo, and Microsoft Cortana have garnered a lot of press lately. But one thing is still missing: a voice-native user experience.
Let me illustrate that with the evolution of user experience on touch screens. When they first came out, there was a stylus, and that’s it. It was an inferior version of the mouse-keyboard-monitor trio. Then some amazing interactions were invented: double tap to zoom, multi-finger rotation, swipe to like/dislike, pull down to refresh, long-press for options, and the Swype keyboard. All of these were native to a touch-screen environment. Porting them back to the mouse-keyboard-monitor trio was of limited utility at best and useless at worst.
Entering an emoji on a smartphone is not very different from typing an English letter on an on-screen keyboard. On a desktop, it is. No wonder emoji took off with smartphones. And let’s not forget physics-based games like Angry Birds.
The current voice-based interfaces are crude: they are almost equivalent to a computer transcribing your query into a Google search and then performing the search. Try doing anything exploratory, like flight searches or restaurant searches, and the interface falls apart. But that’s analogous to editing spreadsheets on a touch screen; it’s still an inferior experience compared to doing it on a traditional PC. What we should be looking forward to is a voice-native user experience: interactions which can be performed far better with voice than with a touch screen or a PC. Even better if they can’t be backported.