Generative Adversarial Networks (GANs): Creating Data from Noise

Generative Adversarial Networks are a relatively new technique. Introduced in 2014 by Ian Goodfellow (of OpenAI), GANs are an unsupervised learning technique that uses random noise to generate new data resembling real data. Yann LeCun, director of Facebook AI Research and deep learning researcher, has called GANs the most important idea in machine learning in the past 20 years.

What makes GANs so cool?

GANs can generate data given some input, and one of the more interesting applications of this is generating images.

There are many ways a GAN can generate an image, but one of my favorites is StackGAN, which generates images based on input text.

Images generated from text descriptions

The results are incredible. The text descriptions are very specific, and the model is still able to generate an image that fits. Surely there are examples that didn't turn out as well as these, but it's still impressive. I can imagine that down the road we will be able to generate videos or movies from text: "Alexa, generate a sequel to The Dark Knight with Heath Ledger as the Joker" or "Alexa, show me a video of a man fighting 100 duck-sized horses".

How does it work?

A GAN is set up so that the discriminator (D) is given data from both the generator (G) and the true data source. The discriminator is given training samples from both the real and generated data sources, and the generator is then trained on the discriminator's response.

To give an analogy, you can pretend that D is the FBI and G is a group of counterfeiters. G is continuously improving on their counterfeiting methods and D is continuously improving their counterfeit detection methods. To give another analogy, D is Harrison Ford trying to figure out whether he's talking to a replicant or a real human.

Voight-Kampff test, the replicant discriminator

G generates new samples by feeding random noise (as a source of entropy) through its current model. The model then produces a sample that it thinks will "fool" the discriminator.

Both G and D are modeled using some kind of neural network; they can be simple MLPs or CNNs. D is trained in a classic supervised fashion: it is given real and generated examples with labels, and it tries to maximize the probability of assigning the correct label to each. At the same time, G is trained to maximize log(D(G(z))), where z is random noise.
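Formally, the original GAN paper frames this as a minimax game between the two networks over a shared value function:

min_G max_D V(D, G) = E_x~pdata[log D(x)] + E_z~pz[log(1 - D(G(z)))]

D pushes V up by classifying correctly, while G pushes the second term down by producing samples D scores as real. In practice (and in the code below), G maximizes log(D(G(z))) instead of minimizing log(1 - D(G(z))), which gives stronger gradients early in training.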

GAN diagram

Example with code

To show an example, I will use the MNIST dataset as my real data, and the generator will learn to generate handwritten digits that are hard to tell apart from the original images.

First, we will define the function that initializes our network weights (Xavier-style initialization, scaled by the number of input units):

import tensorflow as tf

def random_init(size):
    # Xavier-style init: stddev scaled by the number of input units
    in_dim = size[0]
    stddev = 1. / tf.sqrt(in_dim / 2.)
    return tf.random_normal(shape=size, stddev=stddev)
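The training loop at the end also calls a sample_Z helper to draw the noise vectors fed into the generator. That helper isn't shown in the post; a minimal version that samples uniformly from [-1, 1] (an assumed choice of distribution) would be:

import numpy as np

def sample_Z(m, n):
    # Draw a batch of m noise vectors, each of dimension n, from [-1, 1]
    return np.random.uniform(-1., 1., size=[m, n])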

Next we will initialize variables for our discriminator MLP and generator MLP:

# Real MNIST images, flattened to 784-dimensional vectors
X = tf.placeholder(tf.float32, shape=[None, 784])

d_weights1 = tf.Variable(random_init([784, 128]))  
d_biases1 = tf.Variable(tf.zeros(shape=[128]))

d_weights2 = tf.Variable(random_init([128, 1]))  
d_biases2 = tf.Variable(tf.zeros(shape=[1]))

theta_D = [d_weights1, d_weights2, d_biases1, d_biases2]


# 100-dimensional random noise vectors fed to the generator
Z = tf.placeholder(tf.float32, shape=[None, 100])

g_weights1 = tf.Variable(random_init([100, 392]))  
g_biases1 = tf.Variable(tf.zeros(shape=[392]))

g_weights2 = tf.Variable(random_init([392, 784]))  
g_biases2 = tf.Variable(tf.zeros(shape=[784]))

theta_G = [g_weights1, g_weights2, g_biases1, g_biases2]

And then we can create and define the structures for each neural net:

def generator(z):
    # Map a noise vector to a 784-dimensional image with values in (0, 1)
    g_layer1 = tf.nn.relu(tf.matmul(z, g_weights1) + g_biases1)
    g_log_prob = tf.matmul(g_layer1, g_weights2) + g_biases2
    g_prob = tf.nn.sigmoid(g_log_prob)

    return g_prob

def discriminator(x):
    # Map a flattened image to the probability that it is real
    d_layer1 = tf.nn.relu(tf.matmul(x, d_weights1) + d_biases1)
    d_logit = tf.matmul(d_layer1, d_weights2) + d_biases2
    d_prob = tf.nn.sigmoid(d_logit)

    return d_prob, d_logit

You can see that these neural nets have simple two-layer structures. The generator takes in a vector of random noise and returns a 784-length vector, which will be reshaped into a (28, 28) image. The discriminator takes in a flattened 28x28 image and returns the probability that the image is real.

The next step is to define our loss functions:

g_sample = generator(Z)
d_real, d_logit_real = discriminator(X)
d_fake, d_logit_fake = discriminator(g_sample)

# D maximizes log(D(x)) + log(1 - D(G(z))); G maximizes log(D(G(z)))
d_loss = -tf.reduce_mean(tf.log(d_real) + tf.log(1. - d_fake))
g_loss = -tf.reduce_mean(tf.log(d_fake))

You can see in the code that we get a generated sample and a real sample, give both to the discriminator, then calculate the loss.

These losses are then given to an Adam optimizer. Remember that G is trying to maximize log(D(G(z))) and D is trying to maximize the probability of assigning the correct label. TensorFlow only supplies a minimize function for optimization, which is why the losses are defined with a negative sign in the code block above.

d_solver = tf.train.AdamOptimizer().minimize(d_loss, var_list=theta_D)  
g_solver = tf.train.AdamOptimizer().minimize(g_loss, var_list=theta_G)  
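The training loop below also assumes a TensorFlow session and the MNIST data have been set up. The post doesn't show this, but with TensorFlow 1.x's bundled MNIST helper it would look something like this (the minibatch size is an assumed value):

from tensorflow.examples.tutorials.mnist import input_data

mb_size = 128  # minibatch size (assumed value)
Z_dim = 100    # noise dimension, matching the Z placeholder

# Download (if needed) and load MNIST as flattened 784-dim images
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())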

Now all we have to do is iteratively train our models:

for it in range(100000):
    if it % 1000 == 0:
        # Periodically draw 16 samples from the generator (saving code omitted)
        samples = sess.run(g_sample, feed_dict={Z: sample_Z(16, Z_dim)})

    # Get the next minibatch of real MNIST images (labels are discarded)
    X_mb, _ = mnist.train.next_batch(mb_size)

    # Alternate one discriminator update and one generator update
    _, d_loss_curr = sess.run([d_solver, d_loss],
                              feed_dict={X: X_mb, Z: sample_Z(mb_size, Z_dim)})
    _, g_loss_curr = sess.run([g_solver, g_loss],
                              feed_dict={Z: sample_Z(mb_size, Z_dim)})

For brevity, I omitted the code that saves the generator output every 1000 iterations, but you can see the output of the generator below:
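If you want to reproduce that output, a sketch of the omitted saving step (assuming matplotlib and an existing out/ directory; the actual omitted code may differ) is:

import matplotlib.pyplot as plt

def save_samples(samples, it):
    # Plot the 16 generated digits in a 4x4 grid and save the figure to disk
    fig, axes = plt.subplots(4, 4, figsize=(4, 4))
    for ax, sample in zip(axes.flatten(), samples):
        ax.axis('off')
        ax.imshow(sample.reshape(28, 28), cmap='gray')
    fig.savefig('out/{}.png'.format(it // 1000))
    plt.close(fig)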

You can clearly see the improvement over the iterations, and by 80,000 iterations some of the images produced are hard to distinguish from the real dataset.

The full code can be found on my GitHub page.