AI: “What’s reality but a collective hunch?”
In case you’ve missed it, MIT Technology Review has been doing a spectacular and somewhat sobering series on the current state of Artificial Intelligence, which I believe they plan to culminate in a dedicated year-end issue.
And not a moment too soon.
You know things have gotten out of hand when saying that the enthusiasm over AI has become a little frothy would be the leading tech understatement of the year. Some might argue that the frothy award belongs to blockchain or IoT, but those in the know would quickly point out that AI over blockchain is where the real action begins. Back in the summer, Gartner put AI at the very pinnacle of the hype cycle, poised to pull a nauseating number of G's once it hits the trough it's currently staring into. But give it a couple of years and a couple of tweaks and AI will eventually find itself on the sweet plain of Elysium where we all know it belongs.
Backprop this!
In a recent MIT Technology Review article entitled "Is AI Riding a One-Trick Pony?", James Somers offers an excellent exposé of exactly how current AI technology works, along with a hint of where it might be headed. "Current" may be too loose a modifier, as the technology fueling the cult of deep learning was proposed some 30 years ago and achieved its first practical demonstration back in 1986.
Somers uses the unassuming instance of hot dog image/object recognition to explicate the magic that has enthralled all of techdom. You start with a basic neural network: a hierarchical collection of tiered, ordered nodes that assess an image by promoting a level of excitement from each node in the hierarchy to the next, based on whether the node in question believes the fragment of the image it has examined does or doesn't contain a hot dog. The output of any given node is a numeric value, and the outputs of a tier taken together form what's called a vector, which represents the excitement or "intuition" that the image meets the presumed criteria of the object the network has been asked to identify. Vectors are a little bit like hashing values, a kind of shorthand that other nodes can interrogate to bolster or diminish the level of excitement traversing the network.
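That forward march of excitement can be sketched in a few lines of Python. Everything here is a made-up toy for illustration, not anything from Somers' article: four fake "pixels", random weights, and a single output node delivering the network's hunch.

```python
import numpy as np

def sigmoid(x):
    # Squash each node's raw input into a 0-to-1 "excitement" level.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# A toy three-tier network: 4 input "pixels" -> 3 hidden nodes -> 1 output.
# Each weight matrix decides how strongly one tier's excitement
# promotes excitement in the next tier.
w_hidden = rng.normal(size=(4, 3))
w_output = rng.normal(size=(3, 1))

def forward(pixels):
    hidden = sigmoid(pixels @ w_hidden)   # the hidden tier's activations (a vector)
    output = sigmoid(hidden @ w_output)   # the network's overall "hunch"
    return float(output[0])

print(forward(np.array([0.9, 0.1, 0.8, 0.2])))  # a number between 0 and 1
```

Untrained, the hunch is meaningless noise; the training step described next is what gives it any relation to hot dogs.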
The way you get started is by feeding this neural network millions of pictures of hot dogs and, for each image so provided, explicitly informing the network that it either is or isn't a picture of the object in question. Eventually the network will "acquire" an "intuition" as to what attributes constitute the object in question. But this doesn't always work, so when the results are inadequately assertive, specific tiers or nodes are tuned using a technique called backwards propagation, or backprop, a kind of tautological "feedback" that "steers" the network to the correct conclusion. Essentially, as more and more images are fed to the network, nodes acquire a kind of specialization that affords them the opportunity to have their vectors tuned, either algorithmically or by the powers that be, which in this case is usually some AI geek with a vested interest in hot dogs.
Go figure.
So what if we were to put this same hot dog challenge in front of a neural network of our own making?
Now let's assume that our neural network consists of a tiered construct of Facebook users modeled on a generalized demographic of Internet traffic. Because of their reach, we'll put the bots closest to the input so their vectors can enjoy the greatest degree of propagation. Next, somewhere around the middle of our stack, we'll designate a cohort of soccer moms from northern Virginia whose sole function is to discriminate whether the object in question can rotate on an axis corresponding to its apparent length. Finally, we'll arrange a cohort of real and make-believe males ages 18-24, Facebook's largest single demographic, in a phalanx of nodes nearest the output to reflect their genuine enthusiasm for all things Facebook related. This should give us a hugely populated, globally diverse neural network with an IQ about the temperature of two-day-old bath water. I have it on good authority that this very instance of AI is what they are putting together over in Building 8.
Now this is where Backwards Propagation and Deep Learning play a critical role in image/object recognition.
If we were to peer into the northern Virginia nodes, we'd find that they face a unique challenge when determining which vectors to promote. You see, lots of ordinary objects can rotate around an axis that corresponds to their length, but very few can sit on a Roller Master 2000 Grill for six weeks until you come along and stuff them in your mouth before getting back on the highway. The universe of ordinary objects that happen to roll on an axis corresponding to their length is extremely large. Rugs can be rolled, joints can be rolled, cans and candles and dowels and dildos all can be rolled. Just in the universe of food alone (not that this particular neural net would know that's what we are looking for, or at), the list is daunting. Egg rolls, tortillas, jerky, string cheese, Yodels, pickles, all can be rolled.
Now, given the way our network has been constructed, what is the probability that any given image would be recognized as a dope-smoking, cheese-infused dildo? Probably a whole lot greater than you think. And that is where backprop comes in. The vectors promoted by northern Virginia would be tuned until, when promoted to the final group, they could successfully determine that the object in question either was or was not a hot dog.
If this example seems a tad far-fetched, in reality it is not. What the AI community has discovered is that there are multiple, recursive forms of bias infused in the technology's very nature. From the construct of the neural net to the tuning of promoted vectors, the outcome is in no small part predetermined, which may or may not be something of concern. This in part stems from technical issues that will eventually be solved. For instance, how can abstract concepts such as axis, rotation, flavor and context, none directly available from the image input, be brought to bear in the discrimination and resolution of the identity of any particular object? Conversely, how are aspects plainly visible in an image filtered out so as not to skew its intended recognition? Consider this: while hot dogs cannot fly, they can be thrown. They cannot swim, but they can float. Given a limited number of pixels and misleading contextual visualizations, how would you know how to eliminate the superfluous and recognize what you are really looking at?
Just over the horizon…
What seems to get lost in the crush of AI enthusiasm, mostly by the vendor, press and analyst communities, is a pragmatic view of its primary value proposition. Spoofing Siri, while amusing, isn't a vastly monetizable phenomenon. Today the conversations surrounding the commercialization of AI are a little bit like those surrounding harnessing nuclear fusion: table-top demonstrations show promise, but practical implementation is always another 10-20 years away. A circumstance that apparently isn't lost on a majority of today's senior managers (see The Elephant in the Room Just Got a Little Bigger – May 2017).
As cited by Forbes, Forrester Research expects a 300% year-over-year increase in AI investment in 2017 and projects that the commercialization of AI will result in $1.2 trillion in revenue shifting between incumbents and emerging "insight based" rivals by 2020. Some of these emerging rivals would likely include today's AI pioneers, including IBM, Google, Amazon and Facebook, whose current trajectory is more one of customer domain augmentation than flat-out competition. But that is likely to change as information intensity pulls these players forward into their customers' markets. The only question is how fast, and with what second order effects, this will occur.
Source: Forrester Research as cited by Forbes
The idea that, in the space of three short years, AI will transfer $1.2 trillion in revenue from incumbents to emerging "insight based" rivals seems more than a bit far-fetched. For instance, Watson, IBM's cognitive computing platform, commercially introduced in 2011, isn't expected to generate $10B in revenue until 2024, and that's according to the company's own forecasts. Researcher Tractica forecasts that the entire deep learning software market will barely amount to $10B by 2024. IDC has a slightly different take, suggesting that all-up, all-in worldwide revenue from every AI segment (software, services, infrastructure, etc.) will reach $12.5B in 2017 and $46B by 2020. If it sustains that growth rate, it might reach $232.0B by 2024. This is a far cry from what IBM expects, as well as from the projections of other analysts. Per CB Insights, venture funding in this space since 2012 has amounted to approximately $12.4 billion as of the end of 2016. However, McKinsey just weighed in, suggesting that venture investment in AI ran somewhere between $26-39 billion in 2016 alone. Depending upon whose forecast you believe, and whether or not you consider the incumbents to be uniquely advantaged in capturing those revenues, this level of investment is either a pittance or pernicious. If, on average, 80% of venture money goes nowhere (see Greed is Good – August 2017), and you accept the CB Insights and McKinsey numbers as reasonable, then there's $40B in dead money already spent. On the other hand, a three-year $1.2T return on $50B up front might get past most boards of directors. Assuming you had $50B lying around waiting to be spent. Either way, something doesn't add up.
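The compounding behind that IDC extrapolation is easy to check with a few lines. This is purely back-of-envelope arithmetic on the figures quoted above, not anything IDC itself published:

```python
# IDC figures cited above: $12.5B in 2017 growing to $46B by 2020.
implied_cagr = (46 / 12.5) ** (1 / 3) - 1            # annual growth, 2017 -> 2020
print(f"implied annual growth: {implied_cagr:.0%}")  # roughly 54% a year

# Compounding $46B forward four more years at ~50% a year lands
# in the neighborhood of the $232B figure quoted above.
projection_2024 = 46 * 1.5 ** 4
print(f"2024 projection at 50%/yr: ${projection_2024:.0f}B")
```

Sustaining 50%-plus growth for seven straight years is, of course, the heroic assumption doing all the work.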
And stepping back, there are other obvious constraints that aren't often discussed. Fortune estimates that in 2017 there will be 10,000 US AI jobs valued at $650M in annual salaries alone. Over 60% of these jobs will require advanced degrees, and most of those will have to come from one of the top 20 leading universities in this field. Assuming the rest of the world will demand a similar amount of talent, each of these universities will need to issue 1,000 AI undergraduate or graduate degrees each year for the foreseeable future. The only problem is there aren't that many qualified teachers to meet this demand, and the rock stars who earn these degrees will be lining up at Amazon, not trying to get teaching gigs at Carnegie Mellon. It seems that Google has already recognized this constraint and is feverishly working to relieve it with a project called AutoML, where AI becomes the source for developing future instances of AI. So the meta-bias that already permeates neural nets and backwards propagation is about to become an order of magnitude more complex and more opaque.
A different and more pragmatic view of AI's future is beginning to emerge, one where practical constraints such as technical fallibility, availability of talent and first mover advantage temper unbridled exuberance. To help explicate this emerging picture, we will employ the storied McKinsey strategic three horizon framework; storied because the efficacy of any advice this blog renders under this particular rubric vanishes long before the first horizon comes, goes and disappears into the twilight of "what the hell was I thinking". So, just keep that in mind.
We are currently traversing the first horizon. Feats of computational prowess such as recognizing hot dogs and ordering up cat food by talking to Alexa will become more and more commonplace. Soon you'll be talking to your house, your car, your television, etc., and they will be talking to you. So much so that the mere act of addressing objects in this manner will have no intrinsic or commercial value. Similarly, as McAfee and Brynjolfsson suggest in "Machine, Platform, Crowd", value will be extracted from applications that sit in the dull, dirty, dangerous and dear end of the spectrum. As a rule, once proven, technology of this ilk becomes an open-sourced primitive or utility, a meta-component at the bottom of an emerging AI stack. The second order consequence is that it becomes costless to both vendors and users, forcing vendors to seek value elsewhere. So instead of augmenting familiar existing value propositions, such as providing vocal animation to inanimate objects (see "Over here, stupid!" – May 2011), AI players will have to re-invent them, moving into their customers' markets and potentially prompting an anti-trust backlash.
The next horizon will begin to pit incumbent domain specialists against an emerging group of AI-based usurpers (see Platform Strategies in the Age of AI – August 2016). This is where established industry players seek to extend and preserve established value propositions and the AI players seek to end their relevance completely. Think of it as the difference between treating cancer and curing cancer. Your job isn't simply assumed by a robot; it is eliminated altogether. Second order consequences could include dystopian disruption of our social and economic fabric, where avatars are more trusted than people and intellectual property becomes tightly connected to sovereign entities rather than corporations. Big, incumbent AI players will likely be broken up and dissolved.
The final horizon may usher in a world that we are ill prepared to contemplate or manage. For some time futurists have anticipated the creation of synthetic consciousness (see Spooky Action at a Distance – February 2011). The reason for this is simple and obvious. The metaphysics that govern human thought are not the same metaphysics that govern and inform the universe at large, including most of the domains that support human existence. Were we able to "assume" the consciousness that informs the greater universe, we would be able to better manipulate and manage it. For instance, we "know" that nuclear fusion could be the source of unlimited, costless energy; we just don't "know" how to make it work. The second order consequences of this horizon are potentially unfathomable and may even harbor the end of human existence (see Hacking the Future 2.0 – January 2015).
A couple of quick takeaways. First, don't panic; we are still in the early stages when it comes to the commercialization of this technology. For the next several years there's a better than even chance that more money will be lost in this space than made, and that goes for venture, first mover and incumbent enterprises alike. Next, there is a dearth of talent and experience and a whole bunch of noise, and things are likely to stay that way for the next few years. So, when the consultants, pundits and visionaries show up and start talking about how things will evolve, ignore them; they are just part of the noise. Nobody has this figured out. Even IBM, a shop with arguably the longest head start in this space, is basically clueless when it comes to profiting from it (see The Next Cambrian Explosion – February 2016) and seems destined to remain that way for the foreseeable future. Things will start to get interesting by the middle of the first horizon. But don't inhale too much of the exuberance. Revenue attributable to AI will most likely not be new, just siphoned from one industry or player to another, with profit squeezed by the inevitable toll of productivity. Not unlike how the revenue attributable to commercial web services is not newly minted but rather re-allocated from previously allocated on-prem sources. Everyone here needs to take a breath and have a serious think about exactly where, how and when value from AI can be captured. Join the stampede and you just might get run over.
WYSYMNG: What You See You May Not Get
Back at the dawn of technology history, comedian Lily Tomlin did a bit, written by her partner Jane Wagner, in which her character "Trudy" posited:
“What’s reality but a collective hunch?”
And when it comes to AI, that pretty much nails it.
What we know is that it works, but we can't be certain why. Further, we can't ask it how it comes to the conclusions it provides, because it can't be interrogated, and even if it could, the answers it provided couldn't be corroborated. For a deeper dive into this very issue, check out Will Knight's article entitled "The Dark Secret at the Heart of AI", which recently appeared in MIT Technology Review. And if this is beginning to seem a little like a Minority Report scenario, where three pre-crime savants drop named balls into a perpetual motion mobile, you wouldn't be the first to jump to that conclusion. In the event real-life actions taken on AI conclusions go awry, say in self-driving cars, robo-surgeries or stock market trades, where would the culpability and liability lie? An issue that has already gained the attention of sovereign entities, legal jurisdictions and plaintiffs' attorneys.
And if that doesn’t have your attention, then there’s this little item: AI can be easily spoofed. Researchers in Japan have shown that changing as little as one pixel in an image can produce seriously spurious results when it comes to things like image/object recognition. So the next time you’re hunkered down in front of your favorite cable station, count the number of visual burps you see and assume the same amount will show up in the image/object recognition algorithm of your next driverless Uber.
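The mechanism behind that fragility is easy to demonstrate with a toy. The linear "classifier" below is purely illustrative (nothing like the actual Japanese researchers' setup, which attacks real deep networks), but it shows the principle: when one input carries enough weight in the decision, perturbing that single value flips the verdict.

```python
import numpy as np

# A toy "hot dog" scorer: a fixed linear classifier over a 3x3 image.
# The weights are contrived so that one pixel (the center) dominates.
weights = np.array([
    [0.1, 0.1, 0.1],
    [0.1, 5.0, 0.1],
    [0.1, 0.1, 0.1],
])

def is_hot_dog(image):
    # Positive weighted score means "hot dog".
    return float((weights * image).sum()) > 0.0

image = np.full((3, 3), 0.2)   # scores positive: "hot dog"
spoofed = image.copy()
spoofed[1, 1] = -0.2           # change exactly one pixel

print(is_hot_dog(image), is_hot_dog(spoofed))  # True False
```

Real attacks search for which pixel to perturb rather than knowing it in advance, but the lesson is the same: the verdict can hinge on inputs no human would consider meaningful.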
But take heart, AI is still destined to astonish us.
Researchers have already deduced that AI could easily obtain a higher emotional IQ than most high-tech middle managers, auto mechanics and lawyers, eventually eliminating the need for them, something long overdue. But lurking in our future, somewhere in the third horizon, a novel consciousness may conclude that humans are far too taxing and posit that a more elegant solution is one that eliminates the need for us.
Get ready, it’s gonna be a brave new world.
For those of you who might enjoy a fictional version of our dystopian future keep reading: The Carver Cavitation
For a more recent take on the issues behind these topics please go to “AI’s Inconvenient Truth” in the Epilogues tab.
Cover graphic courtesy of the National Hot Dog Association, all other images, statistics, citations, etc derived and included under fair use/royalty free provisions.