Ian Goodfellow came up with one of the most interesting ideas in AI: generative adversarial networks, or GANs.
GANs are redefining the frontiers of AI.
In supervised learning, a neural network is fed millions of labeled images of cats, dogs, cars, houses, and so on, usually from a dataset like ImageNet. From these, the neural network learns that the shape of a dog belongs to the label “dog.” When you then present the network with an image of something four-legged and furry, it can guess with a high probability of success whether the image shows a dog or a cat.
If the neural network makes a mistake, an error signal is sent back through the network, a process called backpropagation, which readjusts the parameters, or weights, between the network's neurons.
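The error-signal-and-readjust cycle can be sketched in a few lines. This is a deliberately tiny illustration, not from the article: a single linear neuron `y = w * x`, one training example, and the weight update written out by hand.

```python
import numpy as np

# Toy illustration of backpropagation on a single linear neuron.
# The network predicts y = w * x; the error is sent "backwards" as a
# gradient that says how to readjust the weight w.
# (Numbers are hypothetical, chosen so the target weight is 3.)

rng = np.random.default_rng(0)
x, y_true = 2.0, 6.0           # one training example
w = rng.normal()               # random initial weight

lr = 0.1                       # learning rate
for _ in range(50):
    y_pred = w * x                      # forward pass
    grad = 2 * (y_pred - y_true) * x    # backward pass: dLoss/dw
    w -= lr * grad                      # readjust the weight

print(round(w, 3))  # prints 3.0
```

Each pass shrinks the error by a constant factor, which is why fifty iterations are more than enough here; real networks repeat the same idea across millions of weights.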
But this is not how humans learn.
When we are children, someone might tell us a couple of times that a certain animal is a dog and another is a cat.
When we see other four-legged furry creatures, we learn to reason about which are dogs and which are cats. We learn by observing the world, not by someone telling us the names of every object in it. We draw up our own rules from what we observe and learn.
From this we also learn how to deduce new facts about the world. This cycle continues throughout our lives. There’s a treasure trove of data out there; the problem is how to get the neural network to understand it the way humans do. Getting machines to learn without labeled data is called unsupervised learning.
For a neural network to learn what a dog looks like, it has to be fed more and more labeled photos, which makes the process supervised rather than unsupervised learning.
Goodfellow was frustrated. He realized how difficult it was to generate images using generative models. He wrote a textbook on deep learning—both supervised and unsupervised—while continuing to brood on generative models and their shortcomings.
The birth of GANs
Suddenly, a new kind of generative AI model popped into Goodfellow’s mind.
His idea was to train a neural network so that it could generate its own imaginary images of new sorts of dogs.
How do GANs work?
GANs are based on game theory. There are two dueling networks—the discriminator (D) and the generator (G). Both of these are deep neural networks.
D is fed real-world images from a dataset like ImageNet.
Meanwhile, G starts generating images from its first layer: a latent space made up of noise, like randomly scattered dots. It’s chaotic, disorganized, and contains infinite possibilities, like our imaginations.
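G's starting point can be made concrete with a few lines of code. The sketch below is illustrative: a latent vector of random noise is pushed through a hypothetical, untrained two-layer generator to produce a tiny 8×8 "image", which at this stage is still structure-free noise.

```python
import numpy as np

# A latent vector z of random noise, mapped by a (hypothetical,
# untrained) two-layer generator to a tiny 8x8 "image".

rng = np.random.default_rng(42)
latent_dim, hidden_dim, img_pixels = 16, 32, 8 * 8

# Untrained random weights: the output is chaotic, like the latent space.
W1 = rng.normal(scale=0.1, size=(latent_dim, hidden_dim))
W2 = rng.normal(scale=0.1, size=(hidden_dim, img_pixels))

def generate(z):
    h = np.tanh(z @ W1)        # hidden layer
    return np.tanh(h @ W2)     # pixel values in (-1, 1)

z = rng.normal(size=latent_dim)     # one point in latent space
image = generate(z).reshape(8, 8)
print(image.shape)  # prints (8, 8)
```

Every different `z` drawn from the noise distribution yields a different image, which is why the latent space is described as containing infinite possibilities.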
D decides whether the images it receives from G are realistic, based on the real images it has been fed. D acts like a detective evaluating the images generated by G.
Initially, the images that G produces are completely abstract—a blur of shapes or mostly noise. So, D rejects these images.
G then tries again. G’s hidden layers begin to learn from its errors: G still starts from the latent noise, but its hidden layers gradually learn to transform that noise into structured images.
Due to its interaction with D, G begins to learn to generate images that look real. While the discriminator tries to distinguish between real and unreal images, the generator tries to create images that’ll fool the discriminator into thinking that they’re real. Eventually, G’s images begin to look like the ones in D’s training set. G is like an art forger that tries to create an image that appears real to D.
In game-theoretic terms, this training converges toward a Nash equilibrium, a state in which neither network can improve by changing its strategy alone. The competition between the two networks lessens the need for human intervention. In this way, neural networks begin to learn what the world actually looks like.
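The whole adversarial loop can be sketched end to end on toy one-dimensional data. Everything below is an illustrative assumption, not the article's setup: "real" samples come from a Gaussian, G is a linear map of noise, D is a logistic classifier, and the gradients are written out by hand.

```python
import numpy as np

# Toy GAN on 1-D data: real samples ~ N(4, 1); the generator is
# g(z) = a*z + b over noise z; the discriminator is
# D(x) = sigmoid(w*x + c). D ascends log D(real) + log(1 - D(fake));
# G ascends the non-saturating "fool D" objective, log D(fake).

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

a, b = 1.0, 0.0        # generator parameters (fakes start near 0)
w, c = 0.0, 0.0        # discriminator parameters
lr, batch = 0.03, 64

for step in range(3000):
    real = rng.normal(4.0, 1.0, batch)
    z = rng.normal(size=batch)
    fake = a * z + b

    # Discriminator update: ascend its objective.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator update: ascend log D(fake) to fool the detective.
    d_fake = sigmoid(w * fake + c)
    upstream = (1 - d_fake) * w        # d log D / d x at each fake
    a += lr * np.mean(upstream * z)
    b += lr * np.mean(upstream)

fakes = a * rng.normal(size=10_000) + b
print(f"generated mean ~ {fakes.mean():.2f}")  # drifts toward the real mean of 4
```

The generator's samples start centered at 0 and are pushed toward the real data's mean as D's feedback flows back through G, the same dynamic the article describes for images.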
How do GANs reason?
What occurs between the input layers that receive the data and the output layer where the solution emerges? What goes on in the hidden layers? Engineers are trying to see what the GAN “sees” at a given hidden layer. It’s possible to explore how a GAN works layer by layer by stepping on the brakes at every layer: you can watch the GAN render the image right then and there, after each layer.
The images at an arbitrary hidden layer of G, though blurry, resemble its target image.
The intermediate layers of G have millions of interconnected neurons containing a bit of everything the network was trained on; in this case, the ImageNet dataset, with lots of images of dogs and cats. If an image has even a hint of a dog in it, the relevant artificial neurons in that layer are stimulated to emphasize the ‘dogness.’ Then you can reverse the process repeatedly, back and forth, and see what emerges.
D tells G, in effect: whatever you see there, we need more of it. G keeps the connections between its neurons (the weights) fixed and changes the image instead.
With this process, you can see the visions of the world emerge through the eyes of a neural network. We get a glimpse into a GAN’s unconscious—into its inner life—into its dreams.
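The "keep the weights fixed, change the image" idea can be sketched as gradient ascent on the input, in the spirit of DeepDream-style activation maximization. The network below is a hypothetical untrained random one, and the 'dogness' neuron is just an arbitrary index, chosen for illustration.

```python
import numpy as np

# Pick one hidden neuron and run gradient ascent on the INPUT image
# so the neuron's activation grows, while the weights never change.

rng = np.random.default_rng(1)
n_pixels, n_hidden = 64, 16
W = rng.normal(scale=0.2, size=(n_hidden, n_pixels))  # fixed weights

def activation(img, k):
    return np.tanh(W @ img)[k]      # activation of hidden neuron k

img = rng.normal(scale=0.1, size=n_pixels)  # start from near-noise
k = 3                                       # the 'dogness' neuron, say
before = activation(img, k)

for _ in range(100):
    h_k = np.tanh(W[k] @ img)
    grad = (1 - h_k ** 2) * W[k]    # d activation / d image
    img += 0.1 * grad               # change the image, not the weights

after = activation(img, k)
print(before < after)  # prints True: the neuron now responds strongly
```

The resulting image is whatever pattern most excites that neuron, which is exactly the "glimpse into the network's dreams" the text describes.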
The human perceptual system works in a similar way, making us ‘see’ things that aren’t really there.
Generative models give AI a form of intelligence. Goodfellow says that GANs can be very useful for artists. He says that machines are already creative.
If creativity means generating something that’s new and beneficial—AI models are already at this point.