Kirk Klasson

The Dawn of Agency

In case you missed it, Google’s recent Duplex demonstration left the attending techno glitterati gobsmacked and agape. Seems they were astonished by the authenticity a few “oh’s” and “um’s” and “ah’s” could produce when added to computer-generated speech. I guess most of the audience wasn’t around back in the 1990s, when natural language processing (NLP) and spoken dialogue system (SDS) researchers first proposed the use of these idioms in successful human-machine conversations.

In fact, the conversations Google Duplex produced seemed so lifelike that they raised a host of suspicions, not the least of which was that if a computer-based agent were actually talking to and making a reservation at a real restaurant, you’d be able to hear the ambient clattering of plates, which wasn’t evident in the demonstration. Therefore some treachery must have been afoot.

Clever glitterati saw right through that one.

Dishes notwithstanding, the real accomplishment of the Duplex demo was not the linguistic fluency but the agency with which the machines accomplished their tasks. For these purposes, agency could be considered the capacity of an autonomous actor to accomplish a specific objective, such as consummating a transaction, within the confines of a knowable environment and a given set of circumstances. This is markedly different from existing digital assistants, whose primary focus is to find and convey information to a user who then employs that information for their own purposes.

Successful agent interaction might seem rudimentary and straightforward but even the simplest of transactions in the most narrowly defined environments can be fraught with the most daunting nuances.

Nearly every transaction, no matter how simple, involves options and obstacles and nearly every option and obstacle involves negotiation and every negotiation creates more options and obstacles.

Let’s say you want a restaurant reservation at 6 PM for 7 people. However, the restaurant you’ve selected can only accommodate a party of 7 at either 5 or 9 PM. Your agent does what? Finds another restaurant? Calls all members of your party to see if they can change their plans? Hangs up? Complexities that humans encounter and resolve on a routine basis could bring most machine-based agents to the brink of apoplexy.
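To make the problem concrete, here is a minimal sketch of how an agent might rank fallback time slots against a pre-set tolerance. The function name, the two-hour tolerance, and the whole policy are hypothetical, invented purely for illustration; a real agent would weigh far more than clock distance.

```python
# Hypothetical fallback-ranking policy for a reservation agent.
# All names and thresholds here are invented for illustration.

def rank_fallbacks(requested_hour, offered_hours, max_shift=2):
    """Score each offered time by its distance from the requested one;
    anything shifted more than max_shift hours is rejected outright."""
    viable = [(abs(h - requested_hour), h) for h in offered_hours
              if abs(h - requested_hour) <= max_shift]
    return [h for _, h in sorted(viable)]

# Requested 6 PM (18:00); the restaurant offers 5 PM (17) or 9 PM (21).
print(rank_fallbacks(18, [17, 21]))  # only 17 survives the tolerance
```

Even this toy version immediately surfaces the hard part: when the list comes back empty, the agent still has to decide whether to find another restaurant, renegotiate with the party, or hang up.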

Yanny v Laurel, you decide…

For years one of the most vexing issues in NLP, and therefore in speech-based agents, was the disambiguation of spoken words. Similar-sounding words often have different meanings: feet, for instance, can be a measurement or an appendage, while feat is an accomplishment. The most facile means of establishing usage is through recognition of context. Task completion requires a similar recognition of context, given that there are numerous unexpected circumstances that can influence successful execution.

Speech disambiguation is relatively straightforward. Examine the conceptual tuples surrounding and adjacent to the word that requires resolution and you should readily find your answer. Task-based disambiguation is slightly more challenging, as the resolution is more ontological than linguistic, something that the folks at Siri figured out early on.
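The “examine the surrounding concepts” approach can be sketched in a few lines with a toy, Lesk-style overlap test: pick the sense whose gloss shares the most words with the surrounding sentence. The glosses below are hand-written stand-ins, not a real lexical database, and production systems use far richer context than bag-of-words overlap.

```python
# Toy Lesk-style word-sense disambiguation: choose the sense whose
# gloss overlaps most with the context. Glosses are invented examples.

SENSES = {
    "feet": ["unit of length used to measure distance",
             "body part at the end of the legs"],
    "feat": ["an accomplishment requiring skill or courage"],
}

def best_sense(word, context):
    ctx = set(context.lower().split())
    scored = [(len(ctx & set(gloss.split())), gloss)
              for gloss in SENSES[word]]
    return max(scored)[1]  # gloss with the largest overlap

print(best_sense("feet", "a distance easy to measure"))
```

Here the words “distance” and “measure” tip the scales toward the measurement sense rather than the anatomical one.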

Siri’s original architecture employed a number of backend, bespoke ontological engines (see Prophets on the SILK Road – October 2012) dedicated to specific domains that could assist in resolving conceptual ambiguities. The only thing is that these engines were very complex, completely irreducible, hand-coded thingies.

In Duplex, Google has taken a more generalizable machine learning approach to the acquisition, creation and utilization of ontological concepts. According to Google’s AI Blog, the creation of ontological context was accomplished using supervised training of a recurrent neural network (RNN). This is certainly a more generalizable and extensible technique (see AI: What’s Reality but a Collective Hunch? – November 2017) but not necessarily one less expensive or more accurate than Siri’s initial approach.
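For readers who haven’t met one, the defining trick of a recurrent network is a hidden state that carries context forward from one token to the next. The bare-bones NumPy step below illustrates that architecture class in general; it is emphatically not Google’s model, and the sizes and random weights are arbitrary.

```python
# Generic single-layer RNN step in NumPy, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 4))   # input -> hidden weights
W_hh = rng.normal(size=(8, 8))   # hidden -> hidden weights (the recurrence)

def rnn_step(x, h):
    # New hidden state mixes the current input with the prior state.
    return np.tanh(W_xh @ x + W_hh @ h)

h = np.zeros(8)
for token_vec in rng.normal(size=(5, 4)):  # five tokens of "conversation"
    h = rnn_step(token_vec, h)             # context accumulates in h
print(h.shape)  # (8,)
```

It is this accumulated state, trained on real conversations, that lets such a network keep track of what a phone call is about.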

Source: Google’s AI Blog

Further, outside of a few well-bounded instances, stochastic convergence for purposes of conceptual resolution is just as likely to face-plant as complex hard-wired, pre-programmed ontologies. So opting for one technique over another might be a case where only time can tell. At this moment, having basically squandered any advantage Siri initially provided, everyone is waiting to see, Bixby’s Viv-based “dynamically evolving cognitive architecture” notwithstanding, what the next iteration of Apple’s technology might look like. And here’s hoping that Shortcuts, Siri’s recently introduced IFTTT appliance, isn’t that much-awaited iteration.

My Proxy Will See You Now

Thanks to prevailing socio-economic predilections, the Internet has become one of the least authentic phenomena of modern life. Wanna goose your Twitter ratings? No problem. There’s a bot for that. Wanna goose your search term ranking? No problem. There’s a bot for that, too. Just last week Facebook announced that it had disabled nearly 1.3 billion accounts because they were bots or spammers. Sadly, some of them were among my closest friends. And if you’re beginning to feel like the Internet has evolved into a chimera of undetectable, unadulterated crap, a buoyant pool of plastic trash floating in a vast Pacific Ocean, you’re not alone.

When the only thing authentic on the web is the people attempting to make use of it, maybe it’s time to take a different approach.

Anonymizers are almost as old as the Internet. Some of the earliest and savviest users immediately recognized the danger of exposing your true identity when consummating transactions over the web. (see Anonymity Ain’t What it Used to Be – August 2017) So over the course of time we have seen numerous forms of proxies evolve that attempt to mask the human initiators of web-based intercourse, so to speak, Second Life users excepted.

Way back when, there were private/public VPNs (see The Apple of Sauron’s Eye – May 2012), and then along came ad blockers (see Who’s Zoomin’ Who? – October 2015), and with each successive attempt to conceal human identity there have been equally rigorous efforts to uncover the person behind the proxy.

With the advent of personal digital agency we may have reached a moment where technology can make human authenticity commensurate with that of the denizens that inhabit the pool of plastic crap that is today’s web.

On average, nearly 90% of the phone calls I receive are unsolicited, bot-generated spam. The same is true for e-mails. Considering these circumstances, what might be useful would be a personal agent, not an assistant but an agent, that could autonomously interrogate these inquiries and solicitations and appropriately negotiate or dispense with them based on pre-determined preferences.
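A first cut at those “pre-determined preferences” could be nothing more than a lookup table mapping caller categories to actions, with interrogation as the default for anything unrecognized. The categories and actions below are invented for illustration; a working agent would need classification far subtler than a dictionary.

```python
# Hypothetical screening policy for a personal call-handling agent.
# Categories and actions are invented stand-ins for real preferences.

PREFERENCES = {
    "known_contact": "put_through",
    "delivery": "take_message",
    "sales": "decline",
}

def screen_call(caller_category):
    # Anything the agent can't classify gets interrogated, not answered.
    return PREFERENCES.get(caller_category, "interrogate")

print(screen_call("sales"))       # decline
print(screen_call("robocaller"))  # interrogate
```

The interesting engineering, of course, lives in the interrogation branch, which is exactly the kind of open-ended negotiation the Duplex demo hints at.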

I imagine the conversations might go a bit like this:

“Hello, I’m with Acme Singularities and I’m here to offer you a once in a lifetime opportunity.”

“On what basis can you make that claim?”

“We can successfully re-host your consciousness to a computer-based environment, sustaining your essence in perpetuity.”

“So are there any living clients that can attest to your success?”

“Well, actually, that’s not the way it works.”

“So is that what you mean by ‘once in a lifetime’?”


Graphic courtesy of Apple’s Siri logo; all other images, statistics, illustrations and citations, etc. derived and included under fair use/royalty free provisions.


