Christmas in July

As I wrote earlier, I’m excited about the iOS 13 update because it looks to be taking speech input a big step forward.

I’ve gotten a chance to test the public beta, and I’m still excited about how speech input is implemented. But I think it could be even better, and I have a few key requests.

First, kudos for the three well-implemented ways to issue touch commands by voice in iOS. All three are useful in different situations, and they play together nicely.

  • You can invoke number labels for every clickable element on the screen and say a number to click an element, e.g. “Show Numbers”, “45”
  • You can invoke name labels for every clickable element on the screen and say “Tap” followed by the label to click an element, e.g. “Show Names”, “Tap More”
  • You can invoke a numbered grid on the screen and say “Tap” followed by a grid number to click at the number’s location, e.g. “Show Grid”, “Tap 5”

The numbers, names or grid can also persist on the screen as you do your work, e.g. “Show Numbers Continuously”. When you don’t want to see them anymore, you can hide them again, e.g. “Hide Numbers”.

When the numbers remain on the screen, they can fade after a short time, and you can control when and how much they fade. You can also say “Tap” and a name label whether or not the labels are showing on screen. And using the overlays you can drag from one number to another, e.g. “Drag from <#> to <#>”.

Here’s the Clock app with numbers, and with faded numbers:

And here is the same screen with name labels, and with grid numbers:

I tend to leave the numbers showing on my screen for stretches of time, and I use numbers and sometimes labels to touch the screen by speech. It all works well, and it’s powerful – it lets you touch anywhere on your iPhone by speech. The numbers are implemented particularly well on the passcode screen – keypad numbers are labeled with random numbers that change every time, so you can speak the number labels to type your passcode without worrying about being overheard. This would work a bit better if I could say two label numbers at once.

In addition to the touch and drag commands, there are sets of commands to navigate what looks to be all aspects of the phone. There’s a repeat command that promises to be useful in many situations: “Repeat <count> Times”. There are sets of commands to dictate, navigate, select and edit text, including the particularly useful editing command “Change <phrase> to <phrase>”. And there’s a comprehensive set of gesture-by-voice commands that includes swiping, rotating, panning, incrementing, decrementing, and multi-finger taps and swipes.

There’s also a nicely organized lookup of these commands, including search that returns results in categories. And users can turn any given command on or off.

That brings me to my first key request: I think it’s important that Apple go a little further with this and give speech users a facility to adjust the default speech commands – similar to the way keyboard users can adjust keyboard shortcuts. So I’d like a custom speech input wording slot for each command.

My second request is related: I’d like to be able to instruct iOS to listen for just the custom command, because there are downsides to having synonymous commands, especially as speech input vocabulary grows.

And it’s important that users be able to save and share these adjustments; that goes for keyboard shortcuts as well as speech commands. It would let trainers efficiently pass along a set of adjustments.
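To make the idea concrete, a saved set of command adjustments could be as simple as a list pairing each default wording with the user’s preferred wording. Nothing like this exists in the beta, and the custom wordings below are made up purely for illustration:

  Show Numbers → Numbers On
  Hide Numbers → Numbers Off
  Change <phrase> to <phrase> → Swap <phrase> with <phrase>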

More kudos for the ability to add custom vocabulary. This addresses a key pain point for speech users – having to correct an unusual spelling over and over again. It will save lots of time and frustration.

My last request is similar to the earlier save-and-share request: it’s also important that users be able to save and share custom vocabulary. I’m picturing a way to import custom words from a text document and/or CSV.
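For instance, the vocabulary file could be one entry per line: a word or phrase the recognizer should learn, optionally followed by a pronunciation hint in a second column. Again, this format is just my imagining, and the entries below are only examples:

  Saoirse, ser-sha
  kireji, kee-reh-jee
  Qantas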

A couple more things of note from this initial look at speech input in iOS 13:

One reason speech input on the iPhone works so easily is that the feedback is done very well. The iPhone shows the command it heard you say on the screen as it executes (see the first screenshot above). There’s also a Hint mode (Settings/Accessibility/Voice Control/Show Hints) that’s unobtrusive and very useful: if the device hears an utterance that’s not worded quite correctly, it gives you a prompt. This is a great way to unobtrusively teach speech commands, and a good example of letting the computer do what it does well and the human do what the human does well. It’s a little puzzling that, despite this excellent insight and execution, Apple has copied Dragon’s bad habit of having a substantial number of synonymous commands.

There’s also the ability to record a series of speech commands and name the series so you can do several things at once. Having an ability like this is key to unleashing the great potential of speech input to speed everything up. Unfortunately, this is still buggy in the beta I have. I’ll circle back to test this more thoroughly when it works more smoothly. One thing that will be important is a good way to look at and organize recorded commands.

In addition to the boatload of speech input commands, Apple has improved touch input, including new gestures for moving the cursor, selecting text, copying, cutting, pasting and undoing. And Apple has improved keyboard input, including full keyboard access with many default keyboard shortcuts, the ability to adjust those shortcuts and assign shortcuts to gestures, and accessibility settings like sticky keys. These are good and welcome improvements to input on a small screen. Enabling these different types of input makes it possible for users to choose which type of input is best for any given task at any given time.

So here are my four key requests for Apple developers, in order of importance:
1. Let users adjust speech input commands
2. Let users save and share speech command adjustments
3. Let users turn off synonymous commands
4. Let users save and share custom vocabulary

That’s all for now.

Thanks for listening.