
A few years ago, I understood the classical MLP neural network much better when I wrote an implementation from scratch (using only Python + NumPy, without TensorFlow). Now I'd like to do the same for recurrent neural networks.

For a standard MLP NN with dense layers, forward propagation can be summarized by:

def predict(x0):
    x = x0
    for i in range(numlayers-1):
        y = dot(W[i], x) + B[i]     # W[i] is a weight matrix, B[i] the biases 
        x = activation[i](y)
    return x

For a given single layer, the idea is just:

output_vector = activation(W[i] * input_vector + B[i])
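For concreteness, here is a runnable toy version of that forward pass, with layer sizes and random weights chosen purely for illustration:

import numpy as np

sizes = [32, 64, 100]                              # example layer sizes
W = [np.random.randn(n, m) * 0.01 for m, n in zip(sizes[:-1], sizes[1:])]
B = [np.zeros(n) for n in sizes[1:]]
activation = [np.tanh, np.tanh]

def predict(x0):
    x = x0
    for i in range(len(sizes) - 1):
        y = np.dot(W[i], x) + B[i]
        x = activation[i](y)
    return x

print(predict(np.random.randn(32)).shape)          # (100,)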

What's the equivalent for a simple RNN layer, e.g. SimpleRNN?


More precisely, let's take an RNN layer like this:
Input shape: (None, 250, 32)
Output shape: (None, 100)
Given an input x of shape (250, 32), what pseudo-code generates the output y of shape (100,), using the layer's weights, etc.?
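For reference, these shapes correspond to a Keras layer defined roughly like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN

model = Sequential([SimpleRNN(100, input_shape=(250, 32))])
model.summary()   # output shape: (None, 100)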

Basj

2 Answers


Simple RNN Cells follow this pattern:

Given the following data:
    input data:         X
    weights:            wx
    recursive weights:  wRec

Initialize the hidden state to 0

For each timestep, one by one:
    Update the hidden state as: (input data * input weights) + (previous hidden state * recursive weights)

In Python code:

import numpy as np

def compute_states(X, wx, wRec):
    """
    Unfold the network and compute all state activations
    given the input X, input weights (wx), and recursive weights
    (wRec). Return the state activations in a matrix; the last
    column S[:,-1] contains the final activations.
    """
    # Initialise a matrix that holds all states for all input sequences.
    # The initial state s_0 is set to 0; each of the others depends on the previous one.
    S = np.zeros((X.shape[0], X.shape[1]+1))

    # Compute each next state S[:,k+1] from the previous state S[:,k] and the
    # current input X[:,k], using the input weights (wx) and recursive weights (wRec).
    for k in range(0, X.shape[1]):
        S[:,k+1] = (X[:,k] * wx) + (S[:,k] * wRec)

    return S
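Note that in this simplified model wx and wRec are scalar weights and there is no activation function. A quick toy usage (numbers chosen arbitrarily):

import numpy as np

X = np.array([[1., 0., 1., 1.]])   # one sequence of length 4
wx, wRec = 0.5, 1.0                # scalar weights, just for illustration

S = compute_states(X, wx, wRec)
print(S)           # shape (1, 5): the initial state 0 followed by one state per timestep
print(S[:, -1])    # final state of each sequence -> [1.5]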

This is a slightly clearer version of the code I found here.

Is this helpful for you?

Leevo

In other words, the question is: what does the forward pass of an RNN look like? The idea is to combine the current input with the value coming from the previous step (here it is prev_s). First initialise the weights, then perform the forward pass. I highlighted the line you were looking for.

import numpy as np

def sigmoid(z):
    # logistic activation
    return 1 / (1 + np.exp(-z))

# X, Y, T (the sequence length), hidden_dim and output_dim are assumed
# to be defined elsewhere.
U = np.random.uniform(0, 1, (hidden_dim, T))            # input weights
W = np.random.uniform(0, 1, (hidden_dim, hidden_dim))   # recurrent weights
V = np.random.uniform(0, 1, (output_dim, hidden_dim))   # output weights


for i in range(Y.shape[0]):
    x, y = X[i], Y[i]

    layers = []
    prev_s = np.zeros((hidden_dim, 1))   # initial hidden state
    dU = np.zeros(U.shape)               # gradient accumulators (only used in the backward pass)
    dV = np.zeros(V.shape)
    dW = np.zeros(W.shape)

    dU_t = np.zeros(U.shape)
    dV_t = np.zeros(V.shape)
    dW_t = np.zeros(W.shape)

    dU_i = np.zeros(U.shape)
    dW_i = np.zeros(W.shape)

    # forward pass
    for t in range(T):
        new_input = np.zeros(x.shape)
        new_input[t] = x[t]              # keep only the input at timestep t
        mulu = np.dot(U, new_input)      # input contribution
        mulw = np.dot(W, prev_s)         # contribution of the previous hidden state
        add = mulw + mulu
        s = sigmoid(add)                 # new hidden state
        mulv = np.dot(V, s)              # <-- the highlighted line: output projection
        layers.append({'s': s, 'prev_s': prev_s})
        prev_s = s

So the highlighted line, mulv = np.dot(V, s), is the output weights multiplied with the current hidden state (same idea as before, with s playing the role of input_vector). The difference is that s itself is computed from the previous hidden state and the current input, i.e.

mulu = np.dot(U, new_input)
mulw = np.dot(W, prev_s)
add = mulw + mulu
s = sigmoid(add)

That's why we have three weight matrices (U, W and V) in the first place.
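Putting this together for the shapes in the question: below is a minimal sketch of a SimpleRNN-style forward pass for one sequence of shape (250, 32) producing a (100,) output. The weight names, shapes and the tanh activation are my assumptions; the layer's output is simply the final hidden state, so no output matrix V is needed here.

import numpy as np

def simple_rnn_layer(x, Wx, Wh, b):
    """
    x:  one input sequence,  shape (250, 32)  (timesteps, features)
    Wx: input weights,       shape (100, 32)
    Wh: recurrent weights,   shape (100, 100)
    b:  biases,              shape (100,)
    Returns the final hidden state, shape (100,).
    """
    h = np.zeros(100)                         # initial hidden state
    for t in range(x.shape[0]):               # loop over the 250 timesteps
        h = np.tanh(np.dot(Wx, x[t]) + np.dot(Wh, h) + b)
    return h

# toy usage with random weights
rng = np.random.default_rng(0)
x  = rng.standard_normal((250, 32))
Wx = 0.01 * rng.standard_normal((100, 32))
Wh = 0.01 * rng.standard_normal((100, 100))
b  = np.zeros(100)
print(simple_rnn_layer(x, Wx, Wh, b).shape)   # (100,)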

Noah Weber