Speak to Me?

My cable TV provider keeps showing short messages telling me to talk to their system. If I change the channel, it gives me a message saying, “Next time tell your voice remote ‘CBS’.” I’m not sure why that would be easier than pushing three or four buttons to select the channel number, especially since the voice process (holding the microphone button down long enough for the TV to let me know that it’s ready, and then speaking my request) seems to be a lengthier and clumsier action. So for months I haven’t bothered to do what they tell me; they seem to be slow learners, though, because they persist with the message. If I pause the video stream and the screen saver comes on, there’s often a message on it that announces a new programming offering and says something like, “To find out more about Peacock, say ‘Peacock’ into your voice remote.” Well, yes, out of curiosity I tried that once and it worked, but I would have preferred to simply find whatever programming Peacock offered by navigating the regular channel listing or the premium services screen, the way we do with Netflix.

I also own a car that would, if I were willing to activate the feature, allow me to use my voice to tell it to perform different tasks, such as turning on the radio and tuning it to the station I wanted. I would tell you what else the car would let me do with voice commands, but I haven’t bothered to look up that information, much less try it out. Part of that, I suppose, comes from the concern that whenever I’m talking to someone while driving, we’ll all have to avoid saying certain words that would cause the car to do something stupid, like dial my wife’s telephone or shut itself down just as the light turns green. Yes, I’m quite sure that it would never suddenly lurch across the median into oncoming traffic, although recent stories about self-driving cars with “minds” of their own should make us all wonder about such misdirections. In reality, though, I’m just not interested. I did do a brief scan of the owner’s manual (yes, this particular new car surprised me with a thick printed manual, which at one time was a common item), and this cursory examination revealed that there are many electronic features of this car that I will never use. In that sense it is rather like my smartphone, only with somewhat fewer unnecessary functions and a much shorter list of apps that I can add later. I assume that it won’t be long before our cars come with as many apps as our phones do now, and one of the first things I will have to do after buying a new car will be to spend two hours uninstalling or hiding the preinstalled apps that I know I will never use.

Now that you have read the two paragraphs above, I expect you won’t be surprised when I tell you that I have a general rule against talking to inanimate objects. Well, I do. And no, it’s not part of a recognized or systematic philosophy; for example, I am not a resistentialist, much less a Luddite. I don’t subscribe to the well-known hypothesis of the innate perversity of inanimate objects. I don’t have a generalized prejudice against modern inventions or, more specifically, against devices that pretend to be able to respond intelligently when spoken to. I just don’t like to talk to objects. Okay, on occasion I have made an exception for large, bulky pieces of furniture, but only when I’m forced to try to move them. As my friends can tell you, I’m also not fond of talking to people through an intermediary object, that is, a telephone.

Some of my current reluctance to talk to devices that are termed interactive or responsive may come from years of experience. I’ve tried speech recognition and speech-to-text software several times. About ten years ago a neighbor of ours received a popular speech transcription package from his daughter, along with her suggestion that he could dictate his memoirs to his computer and perhaps eventually publish the story of what had been, in fact, a very long and eventful life. This neighbor asked me to help him learn how to use the software. I had made other such attempts in previous decades out of curiosity, to check out the hype about flawless hands-free writing, and knew that the quality had always been limited, though improving. I was hesitant but still curious, and I agreed to try. We installed the software and went through the training process, with my neighbor speaking the phrases requested by the computer and letting it become accustomed to his phrasing and accent, which fortunately was a fairly standard California white dialect. Then we tried it out. The result was, to use a familiar comparison, somewhat like a bad speller relying on autocorrect. The mis-transcription of place names like Sacramento or Stockton was to be expected, I suppose, but ordinary English words were also often twisted, creating sentences that did not make sense and documents that required extensive editing before they could be saved or sent to anyone. True, it didn’t help that the prospective memoirist would lapse into unrelated conversation and forget to turn off the mike, adding lines of irrelevant content. But throughout this process I gained a renewed respect for the complexity of the old business tradition of dictation and shorthand, not to mention the current, equally retro system of court reporter stenography, and for the people (mostly women) responsible for completing those tasks.

Computer-mediated transcription has improved further in the past decade, but even simple, brief interactions remain suspect. Alexa and Siri have provided people with relatively accurate responses, but also with fuel for viral comic videos. I have friends who use Siri regularly to ask for information, and it usually works. But experience has shown that these interactive squawk boxes do best when listening to a standard adult Midwestern white dialect; individuals with a nonstandard accent, including Black, Asian, East Indian, or Southern speakers, have reported errors as often as 50 percent of the time. Such minority-related errors shouldn’t be surprising, given that facial recognition software is similarly and notoriously error-prone when dealing with images of minority individuals. The problem is that both human speech and human physical features are incredibly varied and complex, and computers are not yet up to the tasks they are being asked to perform, especially when programmed by almost entirely white, standard-dialect-speaking designers. I am tempted to remark that computers are not up to the kinds of actions that ordinary people achieve every day, but then I remember the common phrase “all Black people look alike” and the remark attributed to George Bernard Shaw that “the United States and Great Britain are two countries separated by a common language.” It seems that we humans are asking our technology to do things that we ourselves are frequently not able to accomplish.

Would we do that? Oh, yes, we would, and I would, too. Only I personally won’t be doing it using my voice. I didn’t spend a year in typing class in high school to throw it all away now. Wait, was that just another misleading excuse? Am I simply, throughout all of the above, trying to find justification for a prejudice I have always had against using speech to communicate with anything other than a responsive human being? Very likely. I admit that I don’t think much of talking to animals, either. That I may justify by noting that when most people converse with dogs they use a form of happy talk that they otherwise use only with babies, so they don’t really talk to dogs either, and hardly anyone talks to cats. Clearly, I’m not a fan of statements such as, “Who’s a good boy, who? You are!” delivered with exaggerated emphasis in an abnormally raised tone. This form of address seems to be a recognition that the dog, or the baby, is really only capable of understanding the tone of a statement, not its content. That recognition is accurate, and it too may be a large part of the reason I don’t like to talk to animals or babies or objects.

Am I rationalizing again? Probably. The truth is that I really don’t know why I don’t like to talk to objects. What I do know is that I prefer to talk to responsive people whom I can see, people whose facial expressions and other body language show a reaction to what I am saying and give me clues as to whether I am providing useful or comprehensible information and whether the listener is accepting or rejecting the message. I want something immediately meaningful in return for my comments, and that is true whether or not the person in question actually responds verbally. I believe that such responsiveness is what I really need from a conversation, and that may be, truly, the heart of the matter.

