
I am using Keras for time series forecasting, and I am trying to understand the time series forecasting tutorial on the official Keras site, which you can find here (https://keras.io/examples/timeseries/timeseries_weather_forecasting/).

They use a Keras function called keras.preprocessing.timeseries_dataset_from_array, which takes the following parameters (documentation: https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/timeseries_dataset_from_array):

    dataset_train = keras.preprocessing.timeseries_dataset_from_array(
        x_train,
        y_train,
        sequence_length=sequence_length,
        sampling_rate=step,
        batch_size=batch_size,
    )

So my question is: what is the difference between the sequence length and the batch size? I think the sequence length is the size of the sliding window (the x-features and one target y-value). But what is the batch size? Unfortunately, I can't inspect the output of this method, because

    print(dataset_train)

or

    print(dataset_train.head())

does not show me the data, and I don't know of any other function that would let me look at the output of the method.

Has anyone had experience with this method, or with sequences and batches in general? I'd appreciate any comments.

Valentin Calomme
PeterBe

1 Answer


Let's take the time series data = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
and call the function with these parameters:
sequence_length=5, sampling_rate=1, sequence_stride=1, shuffle=False, batch_size=2

shuffle and batch_size play no role in how the windows are created. They only take effect when you iterate over the returned Dataset.

In this case, we will have the following data points,
[ 1, 2, 3, 4, 5 ]
[ 2, 3, 4, 5, 6 ]
[ 3, 4, 5, 6, 7 ]
[ 4, 5, 6, 7, 8 ]
[ 5, 6, 7, 8, 9 ]
[ 6, 7, 8, 9, 10 ]
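
The windows listed above can be reproduced with a few lines of NumPy. This is only a sketch of the windowing idea, not the Keras implementation: `sliding_windows` is a hypothetical helper written for this illustration, and its handling of edge cases may differ from the library's.

```python
import numpy as np


def sliding_windows(data, sequence_length, sequence_stride=1, sampling_rate=1):
    """Hypothetical helper: return every window as one row of a 2-D array."""
    data = np.asarray(data)
    # Number of consecutive timesteps one window covers in the original series.
    span = (sequence_length - 1) * sampling_rate + 1
    # Start index of each window; consecutive windows are sequence_stride apart.
    starts = range(0, len(data) - span + 1, sequence_stride)
    # Inside a window, keep every sampling_rate-th timestep.
    return np.stack([data[s : s + span : sampling_rate] for s in starts])


data = np.arange(1, 11)  # the series 1, 2, ..., 10
windows = sliding_windows(data, sequence_length=5)
print(windows)  # one row per window
```

Note that batch_size does not appear here at all: it only groups these rows afterwards.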

batch_size
When you iterate over this dataset, you receive 2 records (windows) in each iteration.
If shuffle=True, the records are shuffled before batching.

    for batch in dataset:
        inputs, targets = batch

In the above snippet, inputs is a batch of records, not a single record. You can set batch_size=1 if you need one record per iteration.
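
To actually look at what the dataset yields, you can iterate over it and convert each batch to a NumPy array. The sketch below assumes TensorFlow 2.x is installed, uses the series 1 to 10 with no targets, and calls the function through `tf.keras.utils`, which recent versions expose alongside `keras.preprocessing`.

```python
import numpy as np
import tensorflow as tf

data = np.arange(1, 11)  # the series 1, 2, ..., 10

dataset = tf.keras.utils.timeseries_dataset_from_array(
    data,
    targets=None,        # no targets, so the dataset yields inputs only
    sequence_length=5,
    sequence_stride=1,
    sampling_rate=1,
    batch_size=2,
    shuffle=False,
)

# Each element of the dataset is one batch: a tensor of shape
# (batch_size, sequence_length), i.e. batch_size windows stacked together.
batches = [batch.numpy() for batch in dataset]
for i, batch in enumerate(batches):
    print(f"batch {i}: shape={batch.shape}")
    print(batch)
```

For a large dataset, `dataset.take(1)` limits the loop to the first batch, which is usually enough for a sanity check.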

targets

Targets corresponding to timesteps in data. It should have same length as data. targets[i] should be the target corresponding to the window that starts at index i (see example 2 below). Pass None if you don't have target data (in this case the dataset will only yield the input data)

This is a general-purpose function.
It does not derive the targets through any built-in logic (for example, an autoregressive approach). It expects you to provide the targets; otherwise, it just returns the predictors.
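
As an illustration of providing the targets yourself, here is a minimal sketch for one-step-ahead forecasting, assuming TensorFlow 2.x. The series 1 to 15 and the names `series`, `past`, `inputs`, and `pairs` are made up for this example; the idea is to shift the targets by the window length so that targets[i] is the value right after the window starting at index i.

```python
import numpy as np
import tensorflow as tf

series = np.arange(1, 16)  # illustrative series 1, 2, ..., 15
past = 5                   # window length (sequence_length)

inputs = series[:-past]    # 1..10, the values the windows are cut from
targets = series[past:]    # 6..15, targets[i] follows the window starting at i

dataset = tf.keras.utils.timeseries_dataset_from_array(
    inputs,
    targets,
    sequence_length=past,
    batch_size=2,
    shuffle=False,
)

# With targets given, each element is an (inputs, targets) pair of batches.
pairs = [(x.numpy(), y.numpy()) for x, y in dataset]
for x, y in pairs:
    print(x, "->", y)
```

Other choices of target (a different horizon, another column, a class label) only change how the `targets` array is built before the call.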

10xAI