I have the following code for time series prediction with RNNs, and I would like to know whether, on the test set, it predicts one day in advance:

# -*- coding: utf-8 -*-
"""
Time Series Prediction with RNN
"""
import pandas as pd
import numpy as np
from tensorflow import keras

#%% Configure parameters

epochs = 5
batch_size = 50

# 1 day * 4 samples per hour * 24 hours = 96 time steps
steps_backwards = int(1 * 4 * 24)
steps_forward = int(1 * 4 * 24)

split_fraction_trainingData = 0.70
split_fraction_validatinData = 0.90

#%% "Reading the data"

dataset = pd.read_csv('C:/User1/Desktop/TestValues.csv', sep=';', header=0,
                      low_memory=False, infer_datetime_format=True,
                      parse_dates={'datetime': [0]}, index_col=['datetime'])

df = dataset
data = df.values
indexWithYLabelsInData = 0
data_X = data[:, 0:2]
data_Y = data[:, indexWithYLabelsInData].reshape(-1, 1)

#%% Prepare the input data for the RNN

series_reshaped_X = np.array([data_X[i:i + (steps_backwards + steps_forward)].copy()
                              for i in range(len(data) - (steps_backwards + steps_forward))])
series_reshaped_Y = np.array([data_Y[i:i + (steps_backwards + steps_forward)].copy()
                              for i in range(len(data) - (steps_backwards + steps_forward))])

timeslot_x_train_end = int(len(series_reshaped_X) * split_fraction_trainingData)
timeslot_x_valid_end = int(len(series_reshaped_X) * split_fraction_validatinData)

X_train = series_reshaped_X[:timeslot_x_train_end, :steps_backwards]
X_valid = series_reshaped_X[timeslot_x_train_end:timeslot_x_valid_end, :steps_backwards]
X_test = series_reshaped_X[timeslot_x_valid_end:, :steps_backwards]

indexWithYLabelsInSeriesReshapedY = 0
lengthOfTheYData = len(data_Y) - steps_backwards - steps_forward
Y = np.empty((lengthOfTheYData, steps_backwards, steps_forward))
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = series_reshaped_Y[..., step_ahead:step_ahead + steps_backwards, indexWithYLabelsInSeriesReshapedY]

Y_train = Y[:timeslot_x_train_end]
Y_valid = Y[timeslot_x_train_end:timeslot_x_valid_end]
Y_test = Y[timeslot_x_valid_end:]

#%% Build the model and train it

model = keras.models.Sequential([
    keras.layers.SimpleRNN(90, return_sequences=True, input_shape=[None, 2]),
    keras.layers.SimpleRNN(60, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(steps_forward))
    #keras.layers.Dense(steps_forward)
])

model.compile(loss="mean_squared_error", optimizer="adam",
              metrics=['mean_absolute_percentage_error'])
history = model.fit(X_train, Y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_valid, Y_valid))

#%% Predict the test data
Y_pred = model.predict(X_test)

prediction_lastValues_list = []

for i in range(0, len(Y_pred)):
    prediction_lastValues_list.append(Y_pred[i][0][steps_forward - 1])

#%% Create the dataframe for the whole data

wholeDataFrameWithPrediciton = pd.DataFrame(X_test[:, 0])
wholeDataFrameWithPrediciton.rename(columns={indexWithYLabelsInData: 'actual'}, inplace=True)
wholeDataFrameWithPrediciton.rename(columns={1: 'Feature 1'}, inplace=True)
wholeDataFrameWithPrediciton['predictions'] = prediction_lastValues_list
wholeDataFrameWithPrediciton['difference'] = (wholeDataFrameWithPrediciton['predictions'] - wholeDataFrameWithPrediciton['actual']).abs()
wholeDataFrameWithPrediciton['difference_percentage'] = (wholeDataFrameWithPrediciton['difference'] / wholeDataFrameWithPrediciton['actual']) * 100

So I define steps_forward = int(1 * 4 * 24), which is one full day at 15-minute resolution (1 * 4 * 24 = 96 time stamps). I predict the test data with Y_pred = model.predict(X_test), and I build a list of the predicted values with for i in range(0, len(Y_pred)): prediction_lastValues_list.append(Y_pred[i][0][steps_forward-1])
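A small check may help to see which time steps this indexing touches. The sketch below repeats the target construction from the code above on a toy series whose values equal their own time index. The sizes (4 instead of 96) and the series itself are assumptions chosen only to keep the arrays readable.

```python
import numpy as np

# Toy sizes (assumption: 4 instead of 96 so the arrays stay readable)
steps_backwards = 4
steps_forward = 4

# Toy series whose value equals its time index, so each target reveals its position
series = np.arange(20, dtype=float).reshape(-1, 1)

n_windows = len(series) - (steps_backwards + steps_forward)
windows = np.array([series[i:i + steps_backwards + steps_forward]
                    for i in range(n_windows)])

# Same target construction as in the question
Y = np.empty((n_windows, steps_backwards, steps_forward))
for step_ahead in range(1, steps_forward + 1):
    Y[..., step_ahead - 1] = windows[..., step_ahead:step_ahead + steps_backwards, 0]

i = 0  # window index; its inputs cover time indices 0..3
first_step_last_horizon = Y[i, 0, steps_forward - 1]
last_step_all_horizons = Y[i, -1, :]
print(first_step_last_horizon)  # 4.0
print(last_step_all_horizons)   # [4. 5. 6. 7.]
```

If this reading of the target tensor is right, Y_pred[i][0][steps_forward-1] is the output at the first input step for the largest horizon: a single value for the first time step after the input window, produced when the RNN has seen only one input step. The slice Y_pred[i, -1, :] would instead hold the forecast for all steps_forward steps that follow the complete input window.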

Because I find the input and output data of RNNs quite confusing, I am not sure whether I predict one day in advance for the test dataset, meaning 96 time steps into the future. What I actually want is to read historic data and then predict the next 96 time steps based on the previous 96 time steps. Can anyone tell me whether this code does that?
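For comparison, the stated goal (96 past steps in, the next 96 steps out) can also be framed as plain input/target pairs. The following is a minimal sketch, not the code from this post: make_windows is a hypothetical helper, and the toy series and window sizes are assumptions for illustration.

```python
import numpy as np

def make_windows(series, n_in, n_out):
    """Hypothetical helper: pair each n_in-step history with the next n_out steps."""
    n_samples = len(series) - n_in - n_out + 1
    X = np.array([series[i:i + n_in] for i in range(n_samples)])
    Y = np.array([series[i + n_in:i + n_in + n_out] for i in range(n_samples)])
    return X, Y

# Toy series: values equal their time index
series = np.arange(10, dtype=float)
X, Y = make_windows(series, n_in=3, n_out=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
print(X[0], Y[0])        # [0. 1. 2.] [3. 4.]
```

With n_in = n_out = 96, each row of Y would be the day that follows the corresponding row of X. A model trained on such pairs would need a final layer that emits n_out values per sample, which is what the commented-out keras.layers.Dense(steps_forward) line appears to hint at.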

Here is a link to some test data that I generated randomly. Ignore the actual values and focus only on the structure of the prediction: Download Test Data

Reminder: My bounty is expiring soon and I have not received an answer to my basic question so far. I have uploaded a minimal reproducible example and some test data, so I would be grateful if you could answer my basic question: am I forecasting 96 steps in advance with the given code? If you need further information, please let me know.

PeterBe

2 Answers

With neural networks, LSTM layers are commonly used for sequential data such as time series. Time steps can be a little confusing in TensorFlow/Keras. However, there is a good tutorial that uses the Jena weather data, which may help: https://blogs.rstudio.com/ai/posts/2017-12-20-time-series-forecasting-with-recurrent-neural-networks/

Peter

Your code implements the time series split from scratch, which can introduce subtle bugs. Another option is to use an established package, for example scikit-learn's TimeSeriesSplit or Keras' TimeseriesGenerator.
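As a rough illustration of the first option, the sketch below runs scikit-learn's TimeSeriesSplit on a toy array. The array and the choice of n_splits=3 are assumptions for the example. Each split trains on an initial stretch of the data and tests on the block that follows it, so no test index ever precedes a training index.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Toy data (assumption): 10 samples with one feature, in time order
X = np.arange(10).reshape(-1, 1)

splitter = TimeSeriesSplit(n_splits=3)
splits = list(splitter.split(X))

for train_idx, test_idx in splits:
    # Training indices always come before the test indices
    print("train:", train_idx, "test:", test_idx)
```

Note that this only splits sample indices. Building the 96-step input and output windows is still a separate step.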

Brian Spiering