AlphaGo’s victory over South Korea’s champion Lee Sedol came as a shock to many in the computing world – but it was a natural development in the story of Artificial Intelligence as it has unfolded over the last few years. What is Deep Learning, and how can computers learn ‘skills’ and ‘intuition’?
Deep Learning & Artificial Intelligence, Part I
Written By: Ran Levi & Nate Nelson
It’s Game Two. We’re now watching the best player in the world, Lee Se-Dol, facing off against a machine, named AlphaGo, in one of humanity’s most storied and complex strategy games: ‘Go’.
Game one saw AlphaGo take an easy victory–one that shocked onlookers around the world. Se-Dol, the favorite, carries the weight of the world on his back as he fights to regain footing in the best of five series. An hour in, his prospects are looking good.
Then, AlphaGo plays the game’s 37th move. You can see the announcer’s hand shake as he copies the move to the internet broadcast’s big board. He adjusts the piece, unsure of whether he’s mistaken in its placement. Murmurs are rising from the crowd.
And just like that, it’s over.
In the time since AlphaGo proved itself the world’s best Go player, move 37 has taken on a sort of cult status in pop culture. Major news outlets around the globe covered the story with articles like “Move 37, or How AI Can Change the World”, a deep learning startup was founded and named itself Move 37, and internet forums were swept with curiosity and speculation.
For those not experienced in Go, move 37 might otherwise appear totally nondescript–vertically centered, in the rightmost third of the grid, placed next to a white piece in an otherwise empty section of the board, it hardly looks like anything out of the ordinary. Yet, to those who know the game well, it was just about revolutionary.
Fan Hui, the European Go champion who was cleanly swept by AlphaGo in a five-game series prior to its matches against Lee Sedol, eagerly observed game two. Of move 37, he told reporters, “It’s not a human move. I’ve never seen a human play this move.” Fan, knowing the game so intimately, was as shocked as anyone. “So beautiful,” he said, and repeated that word over and over again: beautiful. But how can a machine attain beauty?
To understand why move 37 was so amazing, you have to understand how the game of Go works. Invented in ancient China as a playful representation of war, Go pits two players – one with black stones, one with white – against each other in a battle to take as much territory as possible on a nineteen-by-nineteen board. Players are free to place stones on any of the board’s 361 points, and the goal is to completely surround your opponent’s pieces, thereby capturing them and collecting territory. Because the game is so nonlinear, with such an incomprehensible number of possible moves, sequences, and game arrangements, there’s really no catch-all strategy you can stick to. After all, the number of potential game possibilities in Go outnumbers the total atoms in our universe – a number something along the lines of 208168199381979… you get the picture.
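To get a feel for the scale, here is a quick back-of-the-envelope calculation in Python (an illustrative sketch, not part of the original episode): each of the 361 points can be empty, black, or white, which gives a simple upper bound on the number of board configurations. The count of *legal* positions is lower, but still around 2 × 10^170, while the observable universe holds roughly 10^80 atoms.

```python
# Upper bound on Go board configurations: each of the 361 points is
# either empty, black, or white.
upper_bound = 3 ** 361

# How big is that? Count the decimal digits.
digits = len(str(upper_bound))  # roughly 10^172 configurations
```

Even this crude upper bound has 173 digits, which is why exhaustively searching the game tree, the way chess engines do, is hopeless for Go.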
Anyway, this number is so big there’s little point in even trying to comprehend it. Therefore, because any game of Go can go any number of directions, experienced players will tell you it’s all about feel. As the subconscious brain tries to do its best to analyze such a multifaceted and chaotic board, it’s up to a player’s intuition to determine the best course of action.
This is why Go seemed such an unattainable task for machines to master: computers are supposed to follow orders, so how can that translate into something like intuition? Move 37 appeared entirely unintuitive – DeepMind’s programmers later calculated that a human player would have had about a 1 in 10,000 chance of choosing it at that stage of the game – and yet it proved so effective that Lee Se-Dol sensed its colossal genius right away, standing up and leaving the room almost immediately after realizing what had been done to him.
AlphaGo marked the first time humanity was shown that machines can beat us at abilities once thought uniquely human, like feel and intuition. Soon, perhaps, this could extend to emotion, at which point we’d all have a real existential crisis on our hands.
The Brain as A Machine
Before we fall too deep into hypotheticals, though, let’s begin with an even more fundamental question: is a computer at all capable of imitating what we might call ‘human thought’ – for example, the ability to draw conclusions, to form ideas, to intuit, and so on?
Well, the seventeenth-century French philosopher Rene Descartes was one of the first thinkers to try to discern the fundamental difference between humans and machines. Descartes argued that, in principle, every aspect of the human body can be explained in mechanical terms: the heart as a pump, the lungs as bellows, and so on. That is, the body is merely a kind of sophisticated machine.
The brain, however, cannot be explained in such mechanical terms, he said. Our thinking, speech, and ability to draw conclusions are so different from what machines are capable of performing that they cannot be explained in engineering terms, and we must use words like ‘soul’, ‘intellect’ or similarly abstract concepts to describe them.
The invention of the computer in the mid-twentieth century cast a heavy shadow on this hypothesis. The computer, remember, is basically a machine that performs a long series of mathematical operations. The programmer tells the computer what sequence it must follow to perform any task – for example, solving a mathematical equation or drawing an image on a screen – and the processor executes the commands quickly. Clearly, the computer does not “think” in the human sense of the word: it doesn’t solve the equation and doesn’t draw the picture – it is merely a machine that executes a sequence of commands. But outwardly, it certainly looks like the computer solved the equation and drew the picture. If you were to show a computer to someone who’d never seen or heard of one before, they’d almost certainly assume it was either a magical or godly item, or perhaps something controlled by a little person inside!
And why shouldn’t they? The nature of computers has caused many scientists and non-scientists alike to question the idea of a separate ‘mind’ or ‘soul’ in humans. If a relatively simple machine like a computer can seem to solve a problem, might our brain itself not be a kind of machine that performs calculations? In other words, is it possible that our brains are nothing more than very complex machines, and that all our thoughts and ideas are merely the result of calculations? If this hypothesis turns out to be correct, then the answer to the question I raised above may well be: yes, it is possible to construct a computer that will perform the required calculations and thus imitate the operation of the human brain.
Neurons and Perceptrons
To try to build such a machine, researchers’ first instinct was to turn to biology and neuroscience. Advances in the field of neurology and physiology of the brain inspired them abundantly. Through the twentieth century, neuroscientists uncovered many biological mechanisms that underpin the brain’s activity, foremost of which is the importance of neurons –the nerve cells from which the brain is made.
How do neurons work? Neurons are tiny cells with long arms that connect to each other and transmit information through electrical currents. Each neuron has multiple inputs, and it receives electrical pulses from other neurons. In response to these pulses, the neuron produces its own electrical pulse at the output.
In 1949, a researcher named Donald Hebb described one of the brain’s most fundamental principles: the way learning takes place. He proposed that if two neurons – A and B – are interconnected, and neuron A repeatedly fires electrical pulses at B, then over time the connection between them strengthens, and neuron B will respond to A’s pulses more readily. The constant firing of neuron A effectively teaches neuron B that the information coming from A is important, and should be answered with a pulse of its own.
This insight inspired Frank Rosenblatt, an American psychologist and cognitive scientist at Cornell University. In 1958, Rosenblatt conceived a new type of electrical component he called the “Perceptron” (from the word ‘perception’). Rosenblatt’s perceptron was a kind of artificial neuron: an abstract model of the biological one. It had a number of inputs that received binary values – 0 or 1 – and one output that could also produce 0 or 1, a sort of positive or negative result.
The interesting detail is that Rosenblatt found a way to simulate the brain’s learning process as described by neurologists. Imagine the perceptron as an opaque black box with several inputs and one output. Each input has a dial that can be rotated, giving that input a higher or lower weight, or importance, relative to the other inputs. Turn a dial all the way to the right, and every little signal at that input makes the perceptron set its output to ‘1’: this teaches the perceptron that the input is very important and should not be ignored. Turn the dial all the way to the left, and the perceptron will “learn” that this input is not important at all; no matter what happens, the perceptron will not respond to its signals. The weights at the perceptron’s inputs simulate the strength of the connections between biological neurons in a human brain.
Rosenblatt set up a device in his lab that contained several such perceptrons connected to each other in a kind of artificial neural network, and connected their inputs to four hundred light receptors. Rosenblatt placed letters, numbers, and geometric shapes in front of the light receptors, and by fine-tuning the weights at the inputs, he managed to ‘teach’ the network to identify the shapes and in response to produce signals that meant things like: ‘the letter A’ or ‘a square’.
How did he do that? By strengthening or weakening the connections between the perceptrons: each time the perceptrons did not correctly identify a form, he modified the weights slightly – if you will, played a little with the dials – until each perceptron learned that a certain combination of inputs yields one result, and another combination yields another. Playing with the weights is a way to correct the system’s errors, to tell it: “What you did just now was wrong. Here’s the right way to do it.”
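To make this concrete, here is a minimal sketch in Python of a single perceptron learning the logical OR function with Rosenblatt’s error-correction rule. This is an illustration with names of our own choosing, not Rosenblatt’s actual hardware, but the “turn the dial a little after every mistake” idea is the same.

```python
# A minimal Rosenblatt-style perceptron learning the logical OR function.
# The "dials" are the weights; every wrong answer nudges them toward the truth.
def train_perceptron(samples, epochs=10, lr=1.0):
    w = [0.0, 0.0]  # input weights (the dials)
    b = 0.0         # bias (a dial with a constant input)
    for _ in range(epochs):
        for (x1, x2), target in samples:
            y = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - y
            # Rosenblatt's rule: strengthen or weaken each connection
            # in proportion to the error it contributed to.
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

# Training examples: inputs and the desired output (logical OR).
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_perceptron(samples)
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
```

After a few passes over the examples the weights settle, and the perceptron answers all four cases correctly, without ever being told the “rule” of OR.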
Perceptrons and Programs
In order to understand the importance of Rosenblatt’s experiment, we need to sharpen the fundamental distinction between Rosenblatt’s perceptron machine and the way a “normal” computer program operates.
A program is a sequence of commands that a human programmer defines: if the programmer wants to make the computer recognize a square, for example, he must formulate rules for the machine that explicitly define the properties of the shape.
The perceptrons’ learning, on the other hand, did not happen by defining rules, but by presenting a variety of sample squares and strengthening or weakening the connections between the perceptrons until the right combination of weight values was found at the inputs of each one.
These are two completely different approaches to problem solving: In the first, we dictate to the computer rules for solving the problem, such as “If a shape has four sides, is equilateral, and its angles are all at 90 degrees, then it is square.”
In the second, we provide it with examples of squares and a series of simple steps that, if executed, will gradually improve the network’s ability to identify squares.
Notice that at no point does Rosenblatt tell the machine what a square is. He only “taught” the machine by playing with its dials, left and right, until it developed a system for recognizing squares.
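For contrast, here is what the first, rule-based approach might look like in code: a sketch of explicit geometric rules for recognizing a square from its four corner points (our own illustrative example, not from the episode).

```python
def is_square(p1, p2, p3, p4):
    """Rule-based check: four points form a square iff the four sides
    are equal and nonzero, and the two diagonals are equal and twice
    the squared side length (which forces the 90-degree angles)."""
    def d2(a, b):  # squared distance; avoids floating-point sqrt
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    pts = [p1, p2, p3, p4]
    # The 6 pairwise distances of a square: 4 equal sides, 2 equal diagonals.
    dists = sorted(d2(pts[i], pts[j]) for i in range(4) for j in range(i + 1, 4))
    return (dists[0] > 0
            and dists[0] == dists[3]          # four equal sides
            and dists[4] == dists[5]          # two equal diagonals
            and dists[4] == 2 * dists[0])     # diagonal^2 = 2 * side^2
```

Every property of “squareness” had to be spelled out by the programmer here; the perceptron machine, by contrast, was never told any of these rules.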
This approach, of learning from examples, is very close to the way people acquire many skills: for example, parents teach their children to identify a square by pointing to a square and saying “square” again and again, or pointing to a shape and asking ‘what is it?’. If the child is wrong, the parent corrects him or her, and if the child is right, they get rewarded.
The new approach to computer programming offered by Frank Rosenblatt does, in many ways, mirror what would happen half a century later when AlphaGo offered a paradigm shift in the field of AI.
AlphaGo vs. Deep Blue
One way to make sense of why AlphaGo, a machine that got good at a board game, means so much to the future of technology, is to compare it with its spiritual predecessor: IBM’s Deep Blue machine.
Deep Blue was sort of like the original AlphaGo – in 1997, it beat the world’s greatest chess player, Garry Kasparov, in no less dramatic fashion than Move 37: Kasparov, knowing his fate, stood up and walked off the television set with his arms out, frustrated, as if to say “What do you want me to do?”
IBM’s Deep Blue was a thoroughly programmed artificial intelligence that was given a rulebook on how to best play chess. Every move Deep Blue executed was leveraged against other possible moves it could have played at any given stage in a chess match, keeping in mind the value of each piece and referencing pre-programmed chess information as it checked 200 million positions per second on the board. The algorithm sifted through a huge pool of possibilities and chose the move that would position it best moving forward. Because there are only a limited number of pieces, ways each piece can move, and spaces on the board to move to, Deep Blue’s processors were powerful enough to make such calculations.
AlphaGo, however, was created with a different approach. Because the game of Go has such an incomprehensible number of possible combinations of moves available to players through the game–it doesn’t even have the simplicity of direction like chess does–there’s no way even our most powerful computers can sift through every potential outcome at every stage in the game. Instead, the programmers at DeepMind fed their algorithm thousands of real games played by professionals–amounting to some 30 million moves–and the program itself learned how the pros do it. The program dutifully sifted through all of the scenarios and moves, recognizing patterns in how humans play the game and the strategies they employ, in order to be able to play like the best. Once AlphaGo could play like a pro, it was then given the task to play itself. Throughout many iterations, the program would improve upon itself incrementally as it tried to find ways to beat previous versions of itself, adopting strategies that work to its advantage and scrapping those that result in losses.
In this way, whereas Deep Blue calculated all possible scenarios and opted for the most effective outcome, AlphaGo had to learn from the ground up through trial and error. Where Deep Blue draws on a database of information to mathematically draw up a decisive path to victory, AlphaGo is made strong by a wealth of experience built up over time, to where it doesn’t have to look back at every single data point it’s been fed because a process has been honed–something akin to “skill”.
But scientists in the mid-20th century couldn’t see so far ahead, and there were researchers who were not swept away by general enthusiasm for neural networks. Here is the place to take a step back and look at the broader picture of artificial intelligence research.
Over the years, several different approaches have been developed to solve the question of how to make a computer behave intelligently. Some researchers prefer logic-based methods, others prefer methods based on statistical calculations, and several other approaches have also been floated.
We’re not going to review all of these approaches in depth in this chapter, but I will note that there was no consensus among researchers as to which approach is absolutely superior to the others. Specifically for our purposes, in the 1960s and 1970s, many researchers believed that the attempt to imitate the biological mechanism of the brain would not lead us to artificial intelligence computers.
Why? For the same reason that aircraft engineers do not try to imitate the flight of birds. If there is more than one way to solve a particular problem, the biological path is not necessarily the simplest and easiest way, from an engineering point of view.
One of those skeptical scholars was Marvin Minsky. His objection stemmed from the fact that Rosenblatt’s perceptron had no solid theoretical basis – in other words, there was no mathematical theory that explained how the weights should be adjusted at the inputs to the perceptron to induce it to perform any task. The success of Rosenblatt’s system in identifying shapes did not impress him; he claimed that the forms and letters that Rosenblatt used to demonstrate the abilities of the perceptron were too simplistic and not a real challenge.
In 1969, ten years after Rosenblatt first demonstrated the prototype of the perceptron, Minsky and another talented mathematician named Seymour Papert together published a book called “Perceptrons”. The book analyzed in depth the component invented by Rosenblatt – and concluded that it would be impossible to develop artificial intelligence systems with it.
Why? The perceptron, they wrote, is an elegant component with impressive capabilities for its simplicity – but to build complex systems that can perform complex tasks, many perceptrons must be connected together.
The Problem with Many Layers
Imagine a cake made of many layers: each layer has a number of perceptrons whose inputs are connected to the perceptrons in the layer above – and their outputs to the perceptrons in the layer below.
The problem identified by Minsky and Papert lies in the inner layers of the cake. Rosenblatt’s system consisted of two layers: inputs, and outputs. When you have two layers of perceptrons it’s relatively easy to find the correct adjustment of the weights in order to reach the desired result. But if you have an ‘internal’ layer – that is, the one between the input layer and the output layer – it’s much harder to determine the proper weights.
To understand this, let’s imagine a group of children playing the game Telephone: You tell the first child a sentence, who whispers it to the next child in line, who whispers it to the next, and so on, until you reach the last child, who says aloud the sentence he heard. The first child in the row is, in our analogy, a perceptron in the input layer. The last child – the perceptron in the output layer. All the children in between are perceptrons in inner layers.
Now, suppose we have only two children in the game: you tell the first child the sentence, he tells the other child, who says it out loud. If something goes wrong and the sentence doesn’t come out right, it’s easy to find out where the problem was: either the first child passed it along wrong, or the second child misheard it. You find the problem child, explain what needs to be done – maybe hint at a possible candy waiting for a good helper – and that’s it, basically. However, if we have ten children in the game and the sentence comes out wrong, how will we know where the disruption occurred? This is much harder, and without this knowledge we cannot fix the system.
The same applies, in principle, to the perceptron machine. If we have only two layers – inputs and outputs – it is easy to find the desired weight adjustment. If we have internal layers, it is very difficult to know how to play with the dials. Without internal layers, though, Minsky and Papert determined that artificial neural networks are limited to performing simple tasks such as identifying basic shapes and can never be used to perform more demanding tasks like facial recognition.
But here and there were still some stubborn scientists who refused to abandon the artificial neurons. One of them was a doctoral student named Paul Werbos, who in 1974 found a solution to the problem of internal layers that bothered Minsky and Papert so much.
His solution was a method called ‘backpropagation’. It means, essentially, going backward through the neural network and modifying the weights of the connections between the artificial neurons. You start from the final output layer and go back one layer at a time, changing weights as you go so that the next time the inputs arrive, the final result will be correct. With each iteration of trial and error, backpropagation allows the system to trace back its errors, gradually correcting where it went wrong until it reaches optimal results. The most significant thing to understand about backpropagation is that it is based on a sound and proven mathematical theory: that is, it has the solid theoretical basis that Marvin Minsky was looking for.
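As an illustration (our own sketch, not Werbos’ derivation), here is a tiny two-layer network in plain Python that uses backpropagation to learn the XOR function – exactly the kind of task that a network without an internal layer cannot solve. The error at the output is propagated backward: each hidden neuron receives its share of the blame, and every dial is nudged accordingly.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(0)
# A 2-input, 2-hidden-neuron, 1-output network. Each weight row ends
# with a bias term.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]  # XOR

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    o = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, o

def total_loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = total_loss()
lr = 0.5
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # Error at the output layer (derivative of squared error * sigmoid).
        delta_o = (o - t) * o * (1 - o)
        # Propagate backward: each hidden neuron's share of the blame.
        delta_h = [delta_o * w_o[i] * h[i] * (1 - h[i]) for i in range(2)]
        # Adjust the output dials, then the hidden dials.
        w_o[0] -= lr * delta_o * h[0]
        w_o[1] -= lr * delta_o * h[1]
        w_o[2] -= lr * delta_o
        for i in range(2):
            w_h[i][0] -= lr * delta_h[i] * x[0]
            w_h[i][1] -= lr * delta_h[i] * x[1]
            w_h[i][2] -= lr * delta_h[i]
after = total_loss()
```

The `delta_h` line is the whole trick: it tells each inner-layer “child in the Telephone game” exactly how much of the final mistake was its fault, which is the knowledge Minsky and Papert said was missing.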
But Werbos was only a doctoral student, and his ideas got little attention from other researchers. It would take more than ten years after Paul Werbos’ initial discovery for the idea of backpropagation to be independently rediscovered by several researchers at around the same time, among them Geoffrey Hinton, David Rumelhart, and James McClelland.
Learning a Language
Rumelhart and McClelland were psychologists by training and came to the field of artificial neural networks not from computer science, but from the study of human language. They took part in a debate in the academic community about how children learn to speak. Language is a distinctly human ability that sets us apart from other animals, so we can conclude that something in the structure of the human brain is unique in the natural world in this respect. The question is – what?
Most researchers have speculated that the language and speech rules are “encoded” within the brain in some way, like a hidden software somewhere in your head. David Rumelhart and James McClelland advocated a different view. They believed that there’s no ‘rulebook’ in the brain for learning language, but that our ability to learn a language was based on how neurons interact. In other words – there are no rules, there are only connections.
To prove their claim, the two psychologists turned to artificial neural networks. In 1986, they built a computerized model of a multi-layer neural network, and using the backpropagation process taught it to conjugate English verbs into the past tense – for example, work-worked, begin-began, and so on. The network received a present-tense verb and had to guess its past-tense form. As anyone who has learned English knows, this isn’t trivial: some verbs take the ‘ed’ suffix – for example, work-worked, carry-carried – while others have unique past forms – sing-sang, begin-began, and so on.
When small children learn to speak English, there is a phenomenon that repeats itself in almost all cases. At first, the child memorizes the past form of several verbs and says them properly. But then the child discovers the ‘rule’ of adding ‘ed’ at the end, and out of enthusiasm begins to add ‘ed’ even to verbs for which doing so is incorrect: for example, singed, begined and similar errors. Only after constant correction does the child understand their mistake and learn when to add ‘ed’ and when not to.
Amazingly, the Artificial Neural Networks of Rumelhart and McClelland made the exact same mistake as children do. At the beginning of the learning process, the system correctly predicted the past form of verbs – but then, as more and more examples were entered into it, the neural network identified the rule of adding ‘ed’ to verbs – and then, just like with human children, it began to err and add ‘ed’ where it was not applicable. Only when the researchers introduced more and more examples of verb conjugation did the system learn when to add ‘ed’ and when not – and its predictive ability improved accordingly.
In other words, Rumelhart and McClelland demonstrated how a network of neurons can quite literally learn a characteristic of human language. Not only that, but the Artificial Neural Network, without any outside reason to do so, took the same exact path a human brain would to that end.
The real question now becomes: if this is also how babies learn language, do our brains perform their own version of backpropagation? Now that we’ve created machines that act like brains, maybe we need to ask less “Can computers think like humans?” and more “Do humans think like computers?”
To probe further, we’ll pick up in the next episode with the one advance that shot artificial intelligence research into the stratosphere, return to our tense series between AlphaGo and Lee Sedol and interview the CEO of a deep learning company.
Part II Coming Soon!