
A concrete LSTM example in Keras

Understanding recurrent neural networks

This notebook is based on the code samples in Chapter 6, Section 2 of Deep Learning with Python, hosted at https://github.com/fchollet/deep-learning-with-python-notebooks.

Note that the original text features far more content, in particular further explanations and figures.

import keras
keras.__version__
Using TensorFlow backend.
'2.2.4'
from keras import backend as K
# List the GPUs visible to the TensorFlow backend (note: a private Keras 2.x API).
K.tensorflow_backend._get_available_gpus()
['/job:localhost/replica:0/task:0/device:GPU:0']

A first recurrent layer in Keras

from keras.layers import SimpleRNN
Like all recurrent layers in Keras, SimpleRNN can be run in two different modes:

  1. Return the full sequence of successive outputs for each timestep (a 3D tensor of shape (batch_size, timesteps, output_features)).
  2. Return only the last output for each input sequence (a 2D tensor of shape (batch_size, output_features)).

These two modes are controlled by the return_sequences constructor argument. Let's look at an example:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 32)                2080      
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
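As a sanity check, these parameter counts can be reproduced by hand: the Embedding layer stores one 32-dimensional vector per token, and SimpleRNN combines an input kernel, a recurrent kernel and a bias vector. A quick sketch of the arithmetic (it simply restates the summary above):

vocab_size = 10000
embedding_dim = 32
units = 32

embedding_params = vocab_size * embedding_dim                 # 10000 * 32 = 320,000
# SimpleRNN: input kernel (32 x 32) + recurrent kernel (32 x 32) + bias (32)
rnn_params = embedding_dim * units + units * units + units    # 2,080
print(embedding_params + rnn_params)                          # 322,080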
model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_2 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, None, 32)          2080      
=================================================================
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________

It is sometimes useful to stack several recurrent layers one after the other in order to increase the representational power of a network. In such a setup, you have to get all intermediate layers to return full sequences:

model = Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32))  # This last layer only returns the last outputs.
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_3 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_6 (SimpleRNN)     (None, 32)                2080      
=================================================================
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________

Now let's try to use such a model on the IMDB movie review classification problem. First, let's preprocess the data:

from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 10000  # number of words to consider as features
maxlen = 500  # cut texts after this number of words (among top max_features most common words)
batch_size = 32

print('Loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(len(input_train), 'train sequences')
print(len(input_test), 'test sequences')

print('Pad sequences (samples x time)')
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
print('input_train shape:', input_train.shape)
print('input_test shape:', input_test.shape)
Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
17465344/17464789 [==============================] - 1s 0us/step
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
input_train shape: (25000, 500)
input_test shape: (25000, 500)
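As a side note, the integer sequences can be mapped back to words for inspection using the dataset's word index. A minimal sketch (indices 0, 1 and 2 are reserved for padding, start-of-sequence and unknown tokens in this dataset, so word indices are offset by 3):

word_index = imdb.get_word_index()
reverse_word_index = {index: word for word, index in word_index.items()}
# Skip the reserved indices and undo the offset of 3 when looking up words.
decoded_review = ' '.join(
    reverse_word_index.get(i - 3, '?') for i in input_train[0] if i >= 3)
print(decoded_review[:200])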
from keras.layers import Dense

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(SimpleRNN(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
history = model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 36s 2ms/step - loss: 0.6427 - acc: 0.6129 - val_loss: 0.5030 - val_acc: 0.7752
Epoch 2/10
20000/20000 [==============================] - 36s 2ms/step - loss: 0.4117 - acc: 0.8223 - val_loss: 0.4154 - val_acc: 0.8248
Epoch 3/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.3038 - acc: 0.8791 - val_loss: 0.3661 - val_acc: 0.8440
Epoch 4/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.2358 - acc: 0.9081 - val_loss: 0.3610 - val_acc: 0.8504
Epoch 5/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.1733 - acc: 0.9378 - val_loss: 0.4402 - val_acc: 0.8472
Epoch 6/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.1126 - acc: 0.9611 - val_loss: 0.3822 - val_acc: 0.8674
Epoch 7/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.0744 - acc: 0.9756 - val_loss: 0.4660 - val_acc: 0.8580
Epoch 8/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.0462 - acc: 0.9860 - val_loss: 0.5900 - val_acc: 0.8092
Epoch 9/10
20000/20000 [==============================] - 36s 2ms/step - loss: 0.0289 - acc: 0.9921 - val_loss: 0.5860 - val_acc: 0.8254
Epoch 10/10
20000/20000 [==============================] - 35s 2ms/step - loss: 0.0297 - acc: 0.9918 - val_loss: 0.7748 - val_acc: 0.7616
%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.subplot(121)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(122)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
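The validation curves flatten out and then degrade while the training metrics keep improving, which is the usual sign of overfitting. To complement them with a single held-out number, the trained model can also be evaluated on the test set; a minimal sketch:

test_loss, test_acc = model.evaluate(input_test, y_test, batch_size=128)
print('Test accuracy:', test_acc)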

A concrete LSTM example in Keras

In practice, SimpleRNN is not very good at capturing long-term dependencies, largely because of the vanishing-gradient problem. The LSTM and GRU layers are explicitly designed to address this.

Below, we repeat the same model with an LSTM layer, keeping Keras' defaults for most options.

from keras.layers import LSTM

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(input_train, y_train,
                    epochs=10,
                    batch_size=128,
                    validation_split=0.2)
Train on 20000 samples, validate on 5000 samples
Epoch 1/10
20000/20000 [==============================] - 131s 7ms/step - loss: 0.5087 - acc: 0.7612 - val_loss: 0.3463 - val_acc: 0.8710
Epoch 2/10
20000/20000 [==============================] - 130s 7ms/step - loss: 0.2930 - acc: 0.8851 - val_loss: 0.3125 - val_acc: 0.8674
Epoch 3/10
20000/20000 [==============================] - 130s 6ms/step - loss: 0.2324 - acc: 0.9118 - val_loss: 0.2829 - val_acc: 0.8890
Epoch 4/10
20000/20000 [==============================] - 130s 7ms/step - loss: 0.1970 - acc: 0.9268 - val_loss: 0.4359 - val_acc: 0.8736
Epoch 5/10
20000/20000 [==============================] - 129s 6ms/step - loss: 0.1743 - acc: 0.9369 - val_loss: 0.3016 - val_acc: 0.8806
Epoch 6/10
20000/20000 [==============================] - 129s 6ms/step - loss: 0.1564 - acc: 0.9425 - val_loss: 0.4792 - val_acc: 0.8412
Epoch 7/10
20000/20000 [==============================] - 130s 6ms/step - loss: 0.1386 - acc: 0.9497 - val_loss: 0.7660 - val_acc: 0.8094
Epoch 8/10
20000/20000 [==============================] - 129s 6ms/step - loss: 0.1291 - acc: 0.9554 - val_loss: 0.3225 - val_acc: 0.8768
Epoch 9/10
20000/20000 [==============================] - 130s 6ms/step - loss: 0.1195 - acc: 0.9584 - val_loss: 0.3422 - val_acc: 0.8840
Epoch 10/10
20000/20000 [==============================] - 130s 6ms/step - loss: 0.1139 - acc: 0.9597 - val_loss: 0.4524 - val_acc: 0.8708
%matplotlib inline
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.subplot(121)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(122)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
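As mentioned above, GRU is the other gated recurrent layer in Keras and can be used as a drop-in replacement for LSTM in this model. A minimal sketch (not trained here; results would differ somewhat from the LSTM run):

from keras.layers import GRU

gru_model = Sequential()
gru_model.add(Embedding(max_features, 32))
gru_model.add(GRU(32))  # same usage as LSTM(32), with a cheaper gating mechanism
gru_model.add(Dense(1, activation='sigmoid'))

gru_model.compile(optimizer='rmsprop',
                  loss='binary_crossentropy',
                  metrics=['acc'])
# gru_model.fit(input_train, y_train, epochs=10, batch_size=128, validation_split=0.2)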
