Cutting Through the Machine Learning Hype


Jason Black

The tech ecosystem is well acquainted with buzzwords. From “Web 2.0” to “cloud computing” to “mobile first” to “on-demand,” it seems as though each passing year heralds the advent and popularization of new catchphrases to which fledgling companies attach themselves. But while the trends these phrases represent are real, and category-defining companies will inevitably give weight to newly coined buzzwords, so too will derivative startups seek to take advantage of concepts that remain ill-defined by experts and little-understood by everyone else.

In a June post, CB Insights encapsulated the frenzy (and absurdity) of the moment:

It’s clear that 9 of 10 investors have very little idea what AI is so if you’re a founder raising money, you should sprinkle some AI into your pitch deck. Use of ‘artificial intelligence,’ ‘AI,’ ‘chatbot,’ or ‘bot’ are winners right now and might get you a little valuation bump or get the process to move quicker.

If you want to drive home that you’re all about that AI, use terms like machine learning, neural networks, image recognition, deep learning, and NLP. Then sit back and watch the funding roll in.

Pitch decks and headlines today are lousy with references to “artificial intelligence” and “machine learning”. But what do those terms really mean? And how can you separate empty claims from real value creation when evaluating businesses and the technologies which underpin them? Having at least a passing knowledge of what you’re talking about is a good first step, so let’s start with the basics.

Definitions

Artificial Intelligence

The terms “artificial intelligence” and “machine learning” are frequently used interchangeably, but doing so introduces imprecision and ambiguity. Artificial intelligence, a term coined in 1956 at a Dartmouth College CS conference, refers to a line of research that seeks to recreate the characteristics possessed by human intelligence.

At the time, “General AI” was thought to be within reach. People believed that specific advancements (like teaching a computer to master checkers or chess) would allow us to learn how machines learn, and ultimately program computers to learn like we do. If we could use machines to mimic the rudimentary way that babies learn about the world, the reasoning went, soon we would have a fully functioning “grown up” artificial intelligence that could master new tasks at a similar or faster rate.

In hindsight, this was a bit too optimistic.

While the end goal of AI was, and still is, the creation of a sentient machine consciousness, we haven’t yet achieved generalized artificial intelligence. Moreover, barring a major breakthrough in methodology, we don’t have a reasonable timeline for doing so. As a result, research (especially the types of research relevant to the VC and startup world) now focuses on a sub-field of AI known as machine learning, which is aimed at solving individual tasks that can increase productivity and benefit businesses today.

Machine Learning

In contrast with AI’s stated goal of recreating human intelligence, machine learning tools seek to create predictive models around specific tasks. Simply put, machine learning is all about utility. Nothing too flashy, just supercharged statistics.

While there are plenty of good definitions for machine learning floating around, my favorite is Tom M. Mitchell’s 1997 definition:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Rather formal, but this definition is buzzword-free and gets straight to the elegance and simplicity of machine learning. Simply put, a machine is said to learn if its performance at a set of tasks improves as it’s given more data.

Need an example? How about one from your Statistics 101 course: simple linear regression. The goal (or Task) is to draw a “line of best fit” given some initial set of observed data. Through a process that seeks to minimize the average (squared) distance between the regression line and the observed data points (its Performance measure), linear regression typically improves its predictive “line of best fit” as additional data points arrive (Experience).

Red dots represent the scatter plot of all data. The blue regression line minimizes the average distance to those points (the distances are represented here by grey lines).
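To make Task, Performance, and Experience concrete, here is a minimal sketch of the regression example in plain Python. It is an illustration under stated assumptions, not a reference implementation: the data is synthetic (a line y = 2x + 1 plus noise, chosen purely for the demo), the fit uses the standard closed-form least squares solution rather than an iterative one, and every function name is made up for this post.

```python
import random

def fit_line(points):
    """Ordinary least squares: return (slope, intercept) of the best-fit line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(line, points):
    """Performance measure P: average squared vertical distance to the line."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

def make_points(n, rng):
    """Synthetic observations (an assumption): y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]

def average_test_error(n_train, trials=100, seed=0):
    """Average P on fresh data after fitting to n_train points (Experience E)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        line = fit_line(make_points(n_train, rng))            # Task T
        total += mean_squared_error(line, make_points(200, rng))
    return total / trials

# More Experience should mean better Performance on unseen data.
for n_train in (3, 30, 300):
    print(n_train, round(average_test_error(n_train), 3))
```

With only a handful of points the fitted line can be badly off. As the training set grows, the average error on unseen points should fall toward the noise level built into the synthetic data, which is Mitchell's definition in action.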


Boom. Machine learning.

Given that relatively low bar, nearly any tech company can claim to be “leveraging machine learning.” So where do we go from here? To further demystify the topic, it’s also useful to understand how machine learning algorithms are developed. With linear regression, the algorithm in question simply draws a line which gets as close to as many individual data points as possible. But how about a real world example?

While the math behind more sophisticated machine learning models quickly becomes incredibly complex, the underlying concepts are often very intuitive.

Developing a Machine Learning Model

Say you wanted to predict what new songs a particular Spotify user would enjoy. Follow your intuition.

You’d probably start with his or her existing library and expect that two users whose libraries have a large number of songs in common would each be likely to enjoy the songs found only in the other’s library (a process called collaborative filtering). You might also analyze the acoustic elements in the user’s library to look for common traits such as an upbeat tempo or use of electric guitar (Spotify uses neural networks to do this, for example). Finally, you might assign an appropriate weight to the tracks a user has listened to repeatedly, starred, or marked with a thumbs up/down.

Check out this visualization of the filters learned in the first convolutional layer of Spotify’s deep learning algorithm. The time axis is horizontal, the frequency axis is vertical (frequency increases from top to bottom).


All that’s left is to translate these intuitions into a mathematical representation that ingests the requisite data sources and outputs a ranked list of songs to present to the user. As the user listens, likes, and dislikes new music, these new data points (or Experience in our earlier terminology) can be fed back into the same models to update, and thus improve, that prediction list.
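As a rough illustration of the collaborative filtering intuition above, the sketch below scores songs a user doesn't yet have by how similar their owners' libraries are to the user's own, then feeds one new "like" back in. Everything here is an assumption made for the example: the user names, song names, and the choice of Jaccard overlap as the similarity measure. It ignores acoustic analysis and play-count weighting entirely, and it is not a description of how Spotify actually does this.

```python
def jaccard(a, b):
    """Overlap between two libraries: shared songs / all songs in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, libraries, top_n=3):
    """Rank songs the user lacks by the summed similarity of their owners."""
    mine = libraries[user]
    scores = {}
    for other, songs in libraries.items():
        if other == user:
            continue
        weight = jaccard(mine, songs)
        if weight == 0:
            continue  # users with nothing in common contribute no signal
        for song in songs - mine:
            scores[song] = scores.get(song, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:top_n]

# Hypothetical listening data.
libraries = {
    "ana": {"song_a", "song_b", "song_c"},
    "ben": {"song_a", "song_b", "song_c", "song_d"},
    "cal": {"song_c", "song_e", "song_h"},
    "dee": {"song_f", "song_g"},
}

first = recommend("ana", libraries)

# New Experience: ana likes song_e, so it joins her library and
# the ranking is recomputed with the updated data.
libraries["ana"].add("song_e")
second = recommend("ana", libraries)

print(first)
print(second)
```

After the like, the user's overlap with the library containing song_e grows, so that library's remaining songs gain weight in the next ranking. That is the feedback loop in miniature.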

If you want to learn more about more complex machine learning algorithms, there are ample resources across the web that do a great job of explaining neural networks, deep learning, Bayesian networks, hidden Markov models, and many more modeling systems. But for our purposes, technical implementation is less relevant than understanding how startups create value by harnessing that technology. So let’s keep moving.

Where’s the value?

Now that we have covered what machine learning is, for what should savvy investors and skeptical readers be on the lookout? In my experience, the initial litmus test is to walk through the three fundamental building blocks of a machine learning model (task T, performance measure P, and experience E) and look for new or interesting approaches. It is these novelties which form the basis of differentiated products and successful startups.

Experience | Unique Data Sets

Without data, you can’t train a machine learning model. Full stop.

With a publicly available training set, you can train a machine learning model to do specified tasks, which is great, but then you are relying on tuning and tweaking the performance of your algorithm to outperform others. If everyone is building machine learning models with the same sets of training data, competitive advantages (at least at the outset) are all but non-existent.

By contrast, a unique and proprietary data set confers an unfair advantage. Only Facebook has access to its Social Graph. Only Uber has access to the pickup/dropoff points of every rider in its network. These are data sets that only one company can use to train their machine learning models. The value of that is obvious. It’s basic scarcity of a private resource. And it can create an enormous moat.

Take *Digital Genius as an example. The company offers customer service automation tools and counts numerous Fortune 500 companies as clients. These relationships offer Digital Genius exclusive access to millions of historical customer service chat logs, which represent millions of appropriate responses to a wide swath of customer queries. Using this data, Digital Genius trains its Natural Language Processing (NLP) algorithms before beginning to interact with new, live customers.

In order to attain the same level of performance, a competitor would have to amass a similar number of chat logs from scratch. Practically speaking, this would require performing millions of live customer interactions, many of which would likely be frustrating and useless for the customers themselves. While the algorithm would eventually learn and improve, the model’s day one performance would be lackluster at best, and the company itself would be unlikely to gain traction in the market. Thus, having the proprietary data sets from their largest clients gives Digital Genius a real, differentiated value proposition in the chat automation space.

Of course, another way to go about gaining access to a unique data set is to capture one that has never existed. The coming wave of IoT and the proliferation of sensors promise to unlock troves of new data sets that have never before been analyzed. Companies which get proprietary access to new data sets, or those which create proprietary data sets themselves, can thus outperform the competition.

*OTTO Motors (a division of Clearpath Robotics) has captured one of the most robust data sets of indoor industrial environments on the planet from its network of autonomous materials transport robots (pictured below). Every time an OTTO robot makes its way around the factory floor, information about its environment (moving forklifts, walking workers, path obstructions) can be sent back to a centralized database. If the company then develops a more robust model to navigate around forklifts, for example, the OTTO Motors team can backtest and debug their improvements against real-world, historical environment data without needing to actually test their robots or even use physical environments.

An OTTO 1500 robot autonomously navigates around a warehouse.
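The backtesting idea described above can be sketched in a few lines: replay logged snapshots of the environment through an old and a new decision policy, and compare how often each makes the safe call. This is a toy under loud assumptions. The `Snapshot` fields, the braking thresholds, and the log entries are all invented for illustration and say nothing about how OTTO Motors stores data or controls its robots.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snapshot:
    """One logged moment from the factory floor (fields are hypothetical)."""
    obstacle_distance_m: float
    obstacle_is_moving: bool
    safe_action: str  # what hindsight says the robot should have done

def old_policy(snapshot):
    # Brake only when any obstacle is closer than 1 m.
    return "brake" if snapshot.obstacle_distance_m < 1.0 else "continue"

def new_policy(snapshot):
    # Candidate improvement: brake earlier for moving obstacles such as forklifts.
    limit = 2.5 if snapshot.obstacle_is_moving else 1.0
    return "brake" if snapshot.obstacle_distance_m < limit else "continue"

def backtest(policy, log):
    """Replay every logged snapshot and return the share of safe decisions."""
    hits = sum(policy(snapshot) == snapshot.safe_action for snapshot in log)
    return hits / len(log)

# Made-up entries standing in for real recorded data.
historical_log = [
    Snapshot(0.5, False, "brake"),
    Snapshot(3.0, False, "continue"),
    Snapshot(2.0, True, "brake"),
    Snapshot(1.8, True, "brake"),
    Snapshot(4.0, True, "continue"),
]

print("old policy:", backtest(old_policy, historical_log))
print("new policy:", backtest(new_policy, historical_log))
```

No robot moves during this comparison. The larger and more varied the historical log, the more confidence a team can place in a result like this before touching the physical world.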


This same data race is even more competitive on the road. The Google Self-Driving Car, Tesla Autopilot, and Uber Self-Driving teams all tout (or forecast) the number of autonomous miles driven because each additional mile captures valuable data about changing environments that engineers can then test against as they improve their autonomous navigation algorithms. But relative to the total number of miles driven per year (an estimated 3.15 trillion miles in 2015 in the US alone), only a de minimis number are being captured by the three projects mentioned above, leaving greenfield opportunity for startups like Cruise Automation, nuTonomy, and Zoox.

The final, and most experimental approach to leveraging unique data sets is to programmatically generate data which is then used to train machine learning algorithms. This technique is best suited for creating data sets that are difficult or impossible to collect.

Here’s an example. In order to create a machine learning algorithm to predict the direction a person is looking in a real world environment, you first have to train on sample data that has gaze direction correctly labeled. Given the literal billions of images that we have of people looking, with their eyes open, in different directions in every conceivable environment, you’d think this would be a trivial task. The data set—it would seem—already exists.

The problem is that the data isn’t labeled, and manually labeling, let alone determining, a person’s exact gaze direction based on a photograph is way too hard for a human to do to any degree of accuracy or in a reasonable length of time. Despite possessing a vast repository of images, we can’t even create good enough approximations of gaze direction for a machine to train on. We don’t have a complete, labeled set of data.

Programmatically generated eyes used to train machine learning algorithms to determine gaze direction.


In order to tackle this problem, a set of researchers at the University of Cambridge programmatically generated renderings of an artificial eye and coupled each image with its corresponding gaze direction. By generating over 10,000 images in a variety of different lighting conditions, the researchers produced enough labeled data to train a machine learning algorithm (in this case, a neural network) to predict gaze direction in photos of people the machine had not previously encountered. By programmatically generating a labeled data set, they sidestepped the problems inherent to our existing repository of real-world data.
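The appeal of generated data is that the label comes for free: you know the gaze direction because you picked it before rendering. The toy sketch below shows that loop end to end. It is in no way the Cambridge team's method. The "eye" is a single row of 21 pixels with a dark three-pixel pupil, the "lighting" is one brightness factor, and a least squares line on a hand-built pupil-position feature stands in for the neural network; all names and numbers are assumptions for the demo.

```python
import random

WIDTH = 21  # pixels in a toy one-row "eye image"

def render_eye(gaze, brightness, rng):
    """Render a toy eye for a gaze in [-1, 1]: bright pixels, dark 3-pixel pupil."""
    center = round((gaze + 1) / 2 * (WIDTH - 3)) + 1
    return [
        (0.1 if abs(i - center) <= 1 else 1.0) * brightness + rng.uniform(-0.02, 0.02)
        for i in range(WIDTH)
    ]

def pupil_position(image):
    """Feature: mean index of the pixels darker than half the brightest pixel."""
    cutoff = max(image) / 2
    dark = [i for i, value in enumerate(image) if value < cutoff]
    return sum(dark) / len(dark)

def make_labelled_set(n, rng):
    """Each image arrives with its gaze label, because we chose the gaze."""
    samples = []
    for _ in range(n):
        gaze = rng.uniform(-1, 1)
        brightness = rng.uniform(0.5, 1.0)  # vary the lighting conditions
        samples.append((render_eye(gaze, brightness, rng), gaze))
    return samples

def fit_line(pairs):
    """Least squares fit of label on feature; returns (slope, intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)
train = make_labelled_set(500, rng)
slope, intercept = fit_line([(pupil_position(img), gaze) for img, gaze in train])

test = make_labelled_set(100, rng)  # images the model has not seen
errors = [abs(slope * pupil_position(img) + intercept - gaze) for img, gaze in test]
mean_abs_error = sum(errors) / len(errors)
print(f"mean absolute gaze error: {mean_abs_error:.3f}")
```

Real renderings are vastly richer than this, but the economics are the same: once the generator exists, another ten thousand perfectly labeled examples cost only compute time.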

While the means of finding, collecting, or generating data on which to train machine learning models are varied, evaluating the sources of data a company has access to (especially those which competitors can’t access) is a great starting point when evaluating a startup or its technology. But there’s more to machine learning than just Experience.

Task | Differentiated Approaches

Just as access to a unique data set is inherently valuable, developing a new approach to a machine learning Task (T), or starting work on a new or neglected Task, provides an alternative path to creating value.

DeepMind, a company Google acquired for over $500M in 2014, developed a model generation approach that enabled them to pull ahead of the pack in a branch of machine learning known as deep learning (hence the name). While their acquisition went largely unnoticed by the mainstream press, it was difficult to miss the headlines as their machine learning algorithm dubbed “AlphaGo” squared off against the world champion of Go in early 2016.

The rules of the game of Go are relatively simple, yet the number of possible board positions in the game exceeds the number of atoms in the universe. Traditional machine learning techniques by themselves simply could not produce an effective strategy given the number of possible outcomes. However, DeepMind’s differentiated approach to these existing techniques enabled the team not only to best the current world champion of the game, Lee Sedol, but to do so in such a way that spectators described the machine’s performance as “genius” and “beautiful.”

However, the sophistication of performance on one Task does not translate well to other domains. Use the same code from the AlphaGo project to respond to customer service enquiries or navigate around a factory floor and the performance would likely be abysmal. Practically, the approximate 1:1 ratio between Task and machine learning model means that for the short- and medium-term there are innumerable Tasks for which no machine learning model has yet been trained.

For this reason, identifying neglected Tasks can be quite lucrative and easier than one might expect. One might assume, for example, that since a significant amount of time, effort, and money has been spent on improving photo analysis, video analysis has enjoyed the same performance gains. Not so. While some of the models from static image analysis have carried over, the complexity associated with moving images and audio has discouraged development, especially as plenty of low-hanging fruit still remains in the photo identification space.

*Dextro’s Stream API annotating live Periscope videos in real time.


This created a great opportunity for *Dextro and Clarifai to quickly pull ahead in applying machine learning to understand the content of videos. These advancements in video analysis now enable video distributors to make videos searchable based not just on the metadata manually submitted by the users who upload them, but also on the content of the video itself, such as its transcript, its category, and even individual objects or concepts that appear throughout.

Performance | Step Function Improvement

The final major value driver for startups seeking to harness machine learning technology is meaningfully outperforming the competition at a known Task.

One great example is Prosper, which makes loans to individuals and SMBs. Their Task is the same as that of any other lender on the market: to accurately evaluate the risk of lending money to a particular individual or business. Given that Prosper and their peers in both the alternative and the traditional lending world live or die by their ability to predict creditworthiness, Performance (P) is absolutely critical to the success of their business. So how do relatively young alternative lenders outperform even the largest financial institutions out there?

Instead of taking in tens of data points about a particular borrower, Prosper draws on an order of magnitude more data. In addition to using a larger and differentiated data set, the new wave of alternative lenders like Prosper has been rigorously scouring research papers and doing its own internal development in order to apply bleeding-edge machine learning algorithms to those data sets. Together, the Performance characteristics of the resulting machine learning models represent a unique and differentiated ability to issue profitable loans to a whole group of consumers and businesses who have historically been turned away by legacy institutions.

Being able to judge the performance of a startup’s machine learning models against that of the competition is another great way to identify the most innovative companies and weed out the mere peddlers of hype and buzz.

Back to Business

To be clear, there’s much more to machine learning than hyped-up pitch decks and empty promises. The trick is separating the wheat from the chaff. Armed with clear definitions and a working knowledge of the simple concepts underlying the buzzwords and headlines, go forth and pick through presentations with confidence!

But remember this caveat.

Yes, machine learning — when harnessed appropriately — is both real and powerful. But the ultimate success or failure of any business hinges much more on the market opportunity, productization, and the team’s ability to sell than it does on specific implementations of machine learning algorithms. Just as compelling tech is a necessary but insufficient condition to create a successful tech company, great tech in the absence of a viable business is unlikely to become anything more than a science project.


Big thanks to Cooper Zelnick for being a sounding board and an editor on this one. Shoutout to Ryan Atallah and Sven Kreiss for proofing for technical errors as well.

* Denotes an RRE portfolio company.