
I am using TF Eager to train a stateful RNN (GRU).

I have several variable-length time sequences, each about 1 minute long, which I split into windows of length 1 s.

In TF Eager, like in Keras, if stateful=True, "the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." (source)

Thus, how should I design my batches? I obviously can't sample random windows from random sequences. I also can't split a sequence into windows and place adjacent windows in the same batch (e.g. batch 1 = [[seq1 0-1s], [seq1 1-2s], [seq1 2-3s], ...]), as the state from the previous window won't get passed to the next window, which is the whole point of a stateful RNN.

I was thinking of mixing sequences within the same batch, as in:

batch 1 = [[seq1 0-1s], [seq2 0-1s], [seq3 0-1s], ...]
batch 2 = [[seq1 1-2s], [seq2 1-2s], [seq3 1-2s], ...]
...
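
Roughly, I picture building the batches along these lines (just a sketch; the window size and helper name are made up):

    import numpy as np

    # Sketch only (hypothetical sizes): each sequence is an array of shape
    # (time_steps, n_features), and `window` is the number of samples in a
    # 1-second window.
    window = 100

    def make_batch(sequences, t):
        """Take window number t from every sequence, one sequence per batch row."""
        return np.stack([seq[t * window:(t + 1) * window] for seq in sequences])

    # batch 1 = make_batch(seqs, 0), batch 2 = make_batch(seqs, 1), ...
    # np.stack fails once the shortest sequence has no complete window t left.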

However, the issue there is that the sequences have different lengths, so some will finish before others.

So what is the best way to implement this?

(FYI, I couldn't find anything in the academic literature or blogosphere that discusses this, so references would be great.)

Thanks!

DankMasterDan

1 Answer


Your specific case

After [seq1 0-1s] (the 1st second of sequence seq1) at index 0 of batch b, there is [seq1 1-2s] (the 2nd second of the same sequence seq1) at index 0 of batch b+1; this is exactly what is required when we set stateful=True.

Note that the samples inside each batch must have the same length; as long as that holds, differences in sequence length between (not inside) batches cause no problem. That is, once all samples from batch b are processed, the next batch b+1 is processed, and so on.
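
As a rough illustration (a minimal sketch with tf.keras; the layer size, window length and feature count are made up), this is how a stateful GRU is set up so that the state at batch index i is carried over to the same index in the next batch:

    import tensorflow as tf

    batch_size = 32     # hypothetical: sequences processed in parallel
    window_steps = 100  # hypothetical: time steps per 1-second window
    n_features = 8      # hypothetical: features per time step

    model = tf.keras.Sequential([
        # stateful layers need a fixed batch size, hence batch_input_shape
        tf.keras.layers.GRU(64, stateful=True, return_sequences=True,
                            batch_input_shape=(batch_size, window_steps, n_features)),
        tf.keras.layers.Dense(1),
    ])

    # Feeding batch b and then batch b+1: the final GRU state of row i in
    # batch b becomes the initial state of row i in batch b+1.
    # model.reset_states() clears the carried state, e.g. when a new group
    # of sequences starts occupying those batch rows.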

A general example

As a general example, for stateful=True and batch_size=2, a data set like

seq1: s11, s12, s13, s14
seq2: s21, s22, s23
seq3: s31, s32, s33, s34, s35
seq4: s41, s42, s43, s44, s45, s46

where sij denotes the j-th time step of sequence i, must be structured like

    batch 1         batch 2         batch 3         batch 4  

0   s21, s22        s23, <pad>      s31, s32, s33   s34, s35, <pad>   ...
1   s11, s12        s13, s14        s41, s42, s43   s44, s45, s46

or like (with overlap)

    batch 1         batch 2         batch 3         

0   s21, s22        s22, s23        s23, <pad>    ...
1   s11, s12        s12, s13        s13, s14   

where, for example, the 3-step sequence s21, s22, s23 is broken down into two sub-sequences, s21, s22 and s23, <pad>. Also, as you can see, it is possible to have batches with different sequence lengths (by using a custom batch generator, sketched below).
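
A minimal sketch of such a generator (illustrative names and shapes; it assumes the sequences are NumPy arrays of shape (time_steps, n_features) and that their number is a multiple of batch_size):

    import numpy as np

    def stateful_batches(sequences, batch_size, window, pad_value=0.0):
        """Yield batches so that row i of batch b+1 continues row i of batch b."""
        n_features = sequences[0].shape[1]
        # process the sequences in groups of batch_size
        for start in range(0, len(sequences), batch_size):
            group = sequences[start:start + batch_size]
            longest = max(len(seq) for seq in group)
            # walk through the group window by window
            for t in range(0, longest, window):
                batch = np.full((batch_size, window, n_features), pad_value)
                for i, seq in enumerate(group):
                    chunk = seq[t:t + window]
                    batch[i, :len(chunk)] = chunk   # shorter rows keep pad_value
                yield batch
            # the caller should reset the RNN states here, before the
            # next group of sequences takes over these batch rows

In use, model.reset_states() would be called after each group of batch_size sequences is exhausted, so the next group does not inherit stale state.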

Note that <pad> (padded values) should be masked to prevent the RNN from treating them as actual values (more info in this post). We can also avoid padding altogether by opting for batch_size=1, which might be too restrictive (more info in this post).
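
For instance (a hedged sketch; mask_value must match whatever pad value the batch generator uses), a Masking layer in front of the stateful GRU makes it skip the padded steps:

    import tensorflow as tf

    batch_size, window, n_features = 2, 3, 4   # illustrative sizes

    model = tf.keras.Sequential([
        # time steps whose features all equal mask_value are skipped,
        # so <pad> steps do not update the GRU state
        tf.keras.layers.Masking(mask_value=0.0,
                                batch_input_shape=(batch_size, window, n_features)),
        tf.keras.layers.GRU(8, stateful=True, return_sequences=True),
        tf.keras.layers.Dense(1),
    ])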

Here are two examples of a sequence with 5 time steps:

            s11   s12   s13   s14   s15

example 1   23,   25,   27,   24,    28     # 5 temperature readings for t to t+4

example 2   I,    like, LSTM, very,  much   # 5 128-dim word embeddings

Some resources

  1. You may find this article on stateful vs stateless LSTM helpful. Some quotes from the article:

    The stateless LSTM with the same configuration may perform better on this problem than the stateful version.

    and

    When a large batch size is used, a stateful LSTM can be simulated with a stateless LSTM.

  2. This blog on Stateful LSTM in Keras by Philippe Remy

  3. Some opinions on the Keras GitHub, like this one.

Esmailian