AI, the umbrella term for the field that includes deep learning, is a technology built on more than half a century of research.
Here’s a quick recap of how we got to where we are in AI.
Early days of AI
In 1955, John McCarthy, a math professor at Dartmouth, proposed a summer research project to explore the possibilities and limits of “artificial intelligence (AI),” a term he coined in the proposal itself.
His idea was to recreate human intelligence in a machine. To do this, he put together a group of four computer scientists who were already thinking about machines that could think: Marvin Minsky, Herbert Simon, Nathaniel Rochester, and Claude Shannon.
Rule-based vs neural networks
By the early 1960s, the field of AI had forked into two camps: the “rule-based” approach and the “neural networks” approach.
Researchers in the rule-based camp (also called “symbolic systems” or “expert systems”) taught computers to think by coding a series of logical rules: if X, then Y. They interviewed experts in various fields and then coded that wisdom into the program’s decision-making.
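To make the contrast concrete, here is a minimal Python sketch of the rule-based style; the diagnose function and its medical rules are hypothetical, invented purely for illustration.

    # A toy "expert system": a specialist's knowledge hand-coded as if-then rules.
    def diagnose(symptoms):
        # Each rule encodes advice elicited from a human expert: if X, then Y.
        if "fever" in symptoms and "cough" in symptoms:
            return "possible flu"
        if "sneezing" in symptoms and "itchy eyes" in symptoms:
            return "possible allergy"
        return "no rule matched"

    print(diagnose({"fever", "cough"}))  # -> possible flu

The program never looks at data; its competence is exactly the set of rules a human wrote down.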
The “neural networks” camp took a different approach. Instead of coding rules, researchers tried to mimic the brain’s underlying architecture, building layers of artificial neurons that receive and transmit information in a structure akin to our networks of biological neurons. They fed lots and lots of examples of a given phenomenon—pictures, chess games, and sounds—into these networks and let the networks themselves identify patterns within the data.
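As a rough sketch of that alternative (hypothetical sizes and numbers, using NumPy), a single layer of artificial neurons is just a weighted sum of inputs passed through a nonlinearity; learning means adjusting the weights rather than writing rules:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)          # one example with three input features
    W = rng.normal(size=(4, 3))     # connection weights for a layer of four neurons
    b = np.zeros(4)                 # biases, one per neuron

    # Each artificial neuron sums its weighted inputs and "fires" through a
    # nonlinearity, loosely mimicking how biological neurons pass signals on.
    hidden = np.tanh(W @ x + b)
    print(hidden)                   # four activations, ready to feed the next layer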
AI Winter
In 1969, researchers from the rule-based camp convinced many in the field that neural networks were unreliable and limited in their use. The neural networks approach quickly went out of fashion, and AI plunged into the first of its “winters” during the 1970s, brought on by major cuts in funding.
By the new millennium, most researchers had given up on neural networks, convinced that they were a technological dead end.
Resurrection of AI
Geoff Hinton was born in Wimbledon, England, just after the Second World War. He was the great-great-grandson of George Boole, the nineteenth-century British mathematician whose “Boolean logic” provided the mathematical foundation for computer science.
Hinton was one of the few who believed that neural networks would one day fulfill their promise.
One of the great problems with building a multilayered neural network was that it was very difficult to determine the relative importance (the “weight”) of each neuron to the calculation as a whole. With a single-layer network, this was at least doable: the system could set its own weights across its single layer of neurons. But with a multilayered network, such an approach simply didn’t work.
The relationships between the neurons were too complex. A more powerful mathematical method was needed, one in which each weight was set in conjunction with all the others. The answer, suggested by David Rumelhart, a cognitive scientist who collaborated with Hinton, was a process called “backpropagation.”
Backpropagation is essentially an algorithm, based on differential calculus, that sends a kind of mathematical feedback cascading down the hierarchy of neurons as the network analyzes more data, steadily improving its estimate of what each weight should be.
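As a rough illustration of the idea (not Rumelhart and Hinton's actual formulation; the tiny two-weight network and the numbers are invented), one backpropagation step applies the chain rule to pass the error signal from the output back to an earlier weight:

    import math

    # Tiny chain: input x -> hidden h = tanh(w1 * x) -> prediction p = w2 * h.
    x, target = 1.5, 2.0
    w1, w2 = 0.4, 0.7

    # Forward pass: compute the current prediction and its error.
    h = math.tanh(w1 * x)
    p = w2 * h
    error = 0.5 * (p - target) ** 2

    # Backward pass: the chain rule sends feedback down the hierarchy,
    # telling each weight how much it contributed to the error.
    d_p = p - target                 # derivative of the error w.r.t. the prediction
    d_w2 = d_p * h                   # gradient for the output weight
    d_h = d_p * w2                   # feedback passed down to the hidden neuron
    d_w1 = d_h * (1 - h ** 2) * x    # gradient for the earlier weight (tanh derivative)

    # Nudge each weight against its gradient; repeated over many examples,
    # the weights settle toward values that make good predictions.
    learning_rate = 0.1
    w1 -= learning_rate * d_w1
    w2 -= learning_rate * d_w2
    print(w1, w2)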
If you built a neural network and set all the weights to zero, the system could still adjust them on its own, sending changes down the many layers. But in the end, each weight would wind up at the same value as the rest.
That was simply how the math behaved. Rumelhart suggested a different starting point: What if the weights weren’t set to zero? What if they were random? He believed that if the weights all held different values at the beginning, the math would behave differently.
It would find the weights that allowed the system to recognize complex patterns, such as a photo of a dog. Over the next several weeks, Rumelhart and Hinton built a system that began with random weights, and by adjusting those weights on its own, it learned to recognize patterns in images.
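A small numerical experiment makes the point (this is only a sketch in the spirit of the story, not Rumelhart and Hinton's code; the XOR task, layer sizes, and learning rate are arbitrary choices). Starting from all-zero weights, every hidden neuron receives identical feedback, so the network never learns the pattern; starting from random weights breaks that symmetry and lets the neurons specialize:

    import numpy as np

    # Toy task: XOR, a pattern famously beyond a single-layer network.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    def train(W1, W2, steps=10000, lr=0.5):
        """Train a two-layer network by backpropagation from the given starting weights."""
        b1, b2 = np.zeros(W1.shape[1]), np.zeros(1)
        for _ in range(steps):
            h = sigmoid(X @ W1 + b1)                 # forward pass
            out = sigmoid(h @ W2 + b2)
            d_out = (out - y) * out * (1 - out)      # backward pass (chain rule)
            d_h = (d_out @ W2.T) * h * (1 - h)
            W2 = W2 - lr * (h.T @ d_out)
            b2 = b2 - lr * d_out.sum(axis=0)
            W1 = W1 - lr * (X.T @ d_h)
            b1 = b1 - lr * d_h.sum(axis=0)
        return out

    # All-zero start: by symmetry, no hidden neuron can become different from
    # its neighbours, so the weights stay identical and XOR is never learned.
    out_zero = train(np.zeros((2, 8)), np.zeros((8, 1)))
    print(out_zero.ravel())            # stuck near 0.5 for every input

    # Random start: the neurons specialize and the outputs approach [0, 1, 1, 0].
    rng = np.random.default_rng(0)
    out_rand = train(rng.normal(size=(2, 8)), rng.normal(size=(8, 1)))
    print(out_rand.round(2).ravel())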
The neural networks using this technique—now rebranded as “deep learning networks”—could outperform older models at a variety of tasks.
In 2012, a neural network built by Hinton and two of his students won an international computer vision contest. After decades spent on the margins of AI research, neural networks hit the mainstream overnight, this time in the form of deep learning.
Researchers, futurists, and tech CEOs all began buzzing about the massive potential of deep learning neural networks.
Hinton auctioned off his research in the form of a company called DNNresearch. Four companies joined the bidding: Baidu, Google, Microsoft, and a two-year-old start-up called DeepMind. Hinton sold DNNresearch to Google.
Conclusion
Deep learning networks require large amounts of two things: computing power and data.
The internet has led to an explosion of all kinds of digital data: text, images, videos, clicks, purchases, and so on. Cloud technologies, meanwhile, provide cheap computing power that deep learning networks use to parse that data at high speed.
These large data sets, called “big data,” combined with powerful computers are responsible for the rise of deep learning, which is fundamentally changing the way technology is built.