Apple’s Getting Serious About Speech Input

Wow. Speech input is about to change in a big way.

It looks like Apple is finally addressing the big gap in speech input on mobile devices. For many years iPhone users have been able to ask Siri to do a growing list of useful things like make a call, take a note, search the web, or launch an app. And the speech button on the keyboard has allowed for text input.

But there was a big gap in the middle. You couldn’t edit the text you’d dictated using speech. And you couldn’t control an app once you’d launched it.

Apple has finally addressed this in iOS 13. And it looks like they’ve paid attention to some key things.

Take a look at the iOS 13 preview documentation, under Accessibility near the bottom of the list.

Speech editing commands are on the list of new features for iOS 13, including the text replacement command “X Replace With Y” that has been available on the Mac for several years now. There are many other commands, including ways to move the cursor and select text by character, word, line, and sentence.

And it looks like they’ve included on/off control for every command, and will also let users customize command wordings.

They’ve also addressed tapping on the screen using speech, and done so in a comprehensive way.

The preview details three ways to click using speech input:
1. You can invoke a grid, then say a number to tap one of the squares on the grid.
2. You can invoke numbers for all the clickable elements on the screen, then say a number. Numbers also automatically appear in menus.
3. You can also tap elements by saying “Tap” and their names. And you can invoke an overlay of element names on the screen.
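
The “Tap” and name-overlay approaches presumably work from the names apps already expose through accessibility. As a hedged sketch of how a developer might feed those names: iOS 13 adds an accessibilityUserInputLabels property for supplying alternative spoken names for a control. The view controller and button names below are made up for illustration, not taken from the preview.

```swift
import UIKit

// Sketch (not from the preview docs): giving a control spoken names that
// a "Tap <name>" command and the name overlay could pick up.
// The class name and label strings here are hypothetical.
final class PlayerViewController: UIViewController {
    private let playButton = UIButton(type: .system)

    override func viewDidLoad() {
        super.viewDidLoad()

        playButton.setTitle("▶︎", for: .normal)

        // The accessibility label is the control's primary spoken name,
        // used by screen readers and, presumably, by "Tap <name>".
        playButton.accessibilityLabel = "Play"

        // iOS 13's accessibilityUserInputLabels lets an app list
        // alternative names a user might say for voice-driven input.
        playButton.accessibilityUserInputLabels = ["Play", "Start", "Resume"]

        view.addSubview(playButton)
    }
}
```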

It also looks like Apple has addressed gesturing by speech. The preview gives several examples: swipe, pinch, zoom, and press the home button.

They’ve addressed custom commands. You can record multistep gestures for apps and invoke them by speech.

And they’ve addressed a seemingly subtle issue that looms large when you’re using speech input with other people around: speech commands are only active when you’re looking at the phone. So you can look away and have a conversation with another person, then look back and pick up with speech commands, without having to worry about what you might erroneously be doing on your phone while you talk to someone else.

The bottom line is it looks like Apple has made phones hands-free. So if you’re not able to touch your phone – if you have trouble using your hands, or your hands are busy – you’re able to do anything on your phone using speech instead.

I think this is the right way to enable any given type of input – allow people to do anything using a given input method so the user can choose what mix to use. A scientist might choose to talk to her phone while her hands are busy in the lab, but opt for silent input on the train, for instance.

The other thing about making speech input comprehensive is that in some situations speech is faster than the keyboard. This gives people the ability to tap into that speed.

It also looks like Apple is going to keep speech input in step across mobile and desktop.

Thanks, Apple, for the next move. It’s looking pretty good so far. A little over a year ago I wrote a blog post about five things that were badly needed in speech input. From the preview it looks like Apple is addressing #1 and #3.

I’ll write more about this when I actually get my hands on it. I’ll be looking for several key things:
– How speedy it is. I’ve written a lot about the frustrations of even a slight slowdown with desktop speech.
– How easy it is to customize command wordings.
– How easy it is to save and share customized command wordings.
– How easy it is to record and otherwise construct custom commands.
– How easy it is to organize, save, and share custom commands.

Do these right, and you unleash the power of users to improve speech input.

If all goes well, we’ll be talking to our phones more.