I have a bit of self-taught knowledge of machine learning algorithms (the basic Random Forest and Linear Regression type stuff). I decided to branch out and begin learning RNNs with Keras. Looking at most of the examples, which usually involve stock predictions, I haven't been able to find any basic example of multiple features being used: it's always one column for the date and one for the output. Is there a key fundamental thing I'm missing or something?

If anyone has an example I would greatly appreciate it.

Thanks!

Ethan

1 Answer

Recurrent neural networks (RNNs) are designed to learn sequence data. As you guessed, they can definitely take multiple features as input! Keras's RNNs take 2D inputs of shape (T, F), where T is the number of timesteps and F is the number of features (I'm ignoring the batch dimension here).

However, you don't always need or want the intermediate timesteps t = 1, 2, ..., T - 1. Therefore, Keras flexibly supports both modes. To have it output all T timesteps, pass return_sequences=True to your RNN (e.g., LSTM or GRU) at construction. If you only want the last timestep, t = T, then use return_sequences=False (this is the default if you don't pass return_sequences to the constructor).
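To see the shape difference without training anything, here's a minimal sketch (a quick check I'm adding for illustration; it assumes the same inputs=/outputs= keyword names used in the examples below):

import keras.layers as L
import keras.models as M

inp = L.Input(shape=(2, 3))  # T = 2 timesteps, F = 3 features

# Mode 1: return the whole sequence.
seq_model = M.Model(inputs=inp, outputs=L.LSTM(4, return_sequences=True)(inp))
# Mode 2: return only the last timestep (the default).
last_model = M.Model(inputs=inp, outputs=L.LSTM(4, return_sequences=False)(inp))

print(seq_model.output_shape)   # (None, 2, 4) -- all T timesteps
print(last_model.output_shape)  # (None, 4)    -- last timestep only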

Below are examples of both of these modes.

Example 1: Learning the sequence

Here's a quick example of training an LSTM (a type of RNN) which keeps the entire sequence around. In this example, each input data point has 2 timesteps, each with 3 features; the output data has 2 timesteps (because return_sequences=True), each with 4 features (because that is the size I pass to LSTM).

import keras.layers as L
import keras.models as M

import numpy

# The inputs to the model.
# We will create two data points, just for the example.
data_x = numpy.array([
    # Datapoint 1
    [
        # Input features at timestep 1
        [1, 2, 3],
        # Input features at timestep 2
        [4, 5, 6]
    ],
    # Datapoint 2
    [
        # Features at timestep 1
        [7, 8, 9],
        # Features at timestep 2
        [10, 11, 12]
    ]
])

# The desired model outputs.
# We will create two data points, just for the example.
data_y = numpy.array([
    # Datapoint 1
    [
        # Target features at timestep 1
        [101, 102, 103, 104],
        # Target features at timestep 2
        [105, 106, 107, 108]
    ],
    # Datapoint 2
    [
        # Target features at timestep 1
        [201, 202, 203, 204],
        # Target features at timestep 2
        [205, 206, 207, 208]
    ]
])

# Each input data point has 2 timesteps, each with 3 features.
# So the input shape (excluding batch_size) is (2, 3), which
# matches the shape of each data point in data_x above.
model_input = L.Input(shape=(2, 3))

# This RNN will return timesteps with 4 features each.
# Because return_sequences=True, it will output 2 timesteps, each
# with 4 features. So the output shape (excluding batch size) is
# (2, 4), which matches the shape of each data point in data_y above.
model_output = L.LSTM(4, return_sequences=True)(model_input)

# Create the model.
model = M.Model(inputs=model_input, outputs=model_output)

# You need to pick appropriate loss/optimizers for your problem.
# I'm just using these to make the example compile.
model.compile('sgd', 'mean_squared_error')

# Train
model.fit(data_x, data_y)
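
To sanity-check the output, here's a quick sketch (my addition, not part of the training example): predicting on data_x should give one 4-feature vector per timestep per data point.

# Shape is (batch, timesteps, features) = (2, 2, 4), matching data_y.
print(model.predict(data_x).shape)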

Example 2: Learning the last timestep

If, on the other hand, you want to train an LSTM which only outputs the last timestep in the sequence, then you need to set return_sequences=False (or just remove it from the constructor entirely, since False is the default). And then your output data (data_y in the example above) needs to be rearranged, since you only need to supply the last timestep. So in this second example, each input data point still has 2 timesteps, each with 3 features. The output data, however, is just a single vector for each data point, because we have flattened everything down to a single timestep. Each of these output vectors still has 4 features, though (because that is the size I pass to LSTM).

import keras.layers as L
import keras.models as M

import numpy

# The inputs to the model.
# We will create two data points, just for the example.
data_x = numpy.array([
    # Datapoint 1
    [
        # Input features at timestep 1
        [1, 2, 3],
        # Input features at timestep 2
        [4, 5, 6]
    ],
    # Datapoint 2
    [
        # Features at timestep 1
        [7, 8, 9],
        # Features at timestep 2
        [10, 11, 12]
    ]
])

# The desired model outputs.
# We will create two data points, just for the example.
data_y = numpy.array([
    # Datapoint 1
    # Target features at timestep 2
    [105, 106, 107, 108],
    # Datapoint 2
    # Target features at timestep 2
    [205, 206, 207, 208]
])

# Each input data point has 2 timesteps, each with 3 features.
# So the input shape (excluding batch_size) is (2, 3), which
# matches the shape of each data point in data_x above.
model_input = L.Input(shape=(2, 3))

# This RNN will return timesteps with 4 features each.
# Because return_sequences=False, it will output only the last
# timestep, with 4 features. So the output shape (excluding batch
# size) is (4,), which matches the shape of each data point in
# data_y above.
model_output = L.LSTM(4, return_sequences=False)(model_input)

# Create the model.
model = M.Model(inputs=model_input, outputs=model_output)

# You need to pick appropriate loss/optimizers for your problem.
# I'm just using these to make the example compile.
model.compile('sgd', 'mean_squared_error')

# Train
model.fit(data_x, data_y)
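
Again, a quick sanity check (my addition): with return_sequences=False the time axis is gone, so each prediction is a single 4-feature vector.

# Shape is (batch, features) = (2, 4), matching data_y.
print(model.predict(data_x).shape)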