I have been trying to understand how to represent and shape data to make a multidimensional and multivariate time series forecast using Keras (or TensorFlow), but I am still very unclear after reading many blog posts/tutorials/documentation pages about how to present the data in the correct shape (most examples being of slightly less complex problems).

My Dataset:

  • several cities
  • for which I have information about, say, temperature, car traffic and humidity
  • for, say, the last 2 years (one record per day)

What I want to do: I'd like to forecast, for each city, the temperatures to expect over the next year, using possibly lagged versions of temperature, car traffic and humidity (of course there would be several more features, but this is just an example for thought).

What I am confused about: if I have 2 cities, for which I recorded 3 features for 365 days, how should I shape my input so that the model can output a forecast of 365 days for these two cities (i.e. 2 time series of temperatures, each 365 days long)?

Intuitively, the tensor shape would be (?, 365, 3) for 365 days and 3 features. But I'm not sure what to put in the first dimension and, most importantly, I would be surprised if it had to be the number of cities. At the same time, I have no idea how to tell the model that it should interpret the dimensions properly.

Any pointers will be helpful. I'm pretty familiar with the rest of the problem (i.e. how to build a network in Keras, etc., since I have done this for other neural networks); what I'm specifically unsure about is how best to encode the sequence for the desired input.

Oh, and also: I guess I could train and predict for each city independently, but I'm sure everyone will agree there are probably things to be learned that are not particular to any one city, and that can only be seen by considering several of them together, which is why I think it is important to encode this in the model.

Bastien

1 Answer

The input shape for an LSTM must be (num_samples, num_time_steps, num_features). In your example, combining both cities as input, num_features will be 2 × 3 = 6.

If you lump all 365 time steps into one sample, then the first dimension will be 1: one single sample! You can also do a sanity check using the total number of data points: you have 2 cities, each with 365 time steps and 3 features, i.e. 2 × 365 × 3 = 2190 values. This is obviously the same as 1 × 365 × 6 (as above), so it would be a possibility (Keras will run), but the model obviously won't learn to generalise at all when given only one sample.
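To make that arithmetic concrete, here is a minimal sketch (random numbers standing in for real measurements, and variable names of my own) of reshaping the two cities' data into that single combined sample:

```python
import numpy as np

num_cities, num_days, num_features = 2, 365, 3

# Raw data: one block per city -> shape (2, 365, 3), 2190 values in total
raw = np.random.rand(num_cities, num_days, num_features)

# Put both cities' features side by side along the feature axis,
# yielding one single sample -> shape (1, 365, 6)
single_sample = raw.transpose(1, 0, 2).reshape(1, num_days, num_cities * num_features)

print(single_sample.shape)  # (1, 365, 6)
```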

Have a look at this relevant question, which I recently answered. There I speak a little about using a rolling window (check the comments of the answer for more info). That will buy you more samples if you need them.
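Since the linked answer isn't reproduced here, the sketch below shows one way such a rolling window could look; the window length (30) and forecast horizon (1) are arbitrary illustrative choices, and `make_windows` is a hypothetical helper, not a library function:

```python
import numpy as np

def make_windows(series, window=30, horizon=1):
    """series: array of shape (time_steps, features).
    Returns X with shape (num_windows, window, features) and
    y with the values `horizon` steps after each window."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

series = np.random.rand(365, 6)  # one year of both cities' features
X, y = make_windows(series)
print(X.shape, y.shape)          # (335, 30, 6) (335, 6)
```

Each window becomes one sample, so a single year of data already yields hundreds of samples instead of one.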

If you want to train a single model with data for both cities as input, then making predictions for both cities at each time-step is as simple as defining a final Dense layer, which outputs 2 units. Your validation/test data must then of course contain a tuple of (city1, city2).
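A minimal sketch of such a model, assuming tf.keras, a 30-step input window and illustrative (untuned) layer sizes:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(32, input_shape=(30, 6)),  # (num_time_steps, num_features)
    Dense(2),                       # one temperature output per city
])
model.compile(optimizer="adam", loss="mse")
```

The targets would then be the two temperature columns (one per city) at the step being forecast.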

A perhaps more sophisticated way to approach this would be to create datasets on a single-city basis, train a sub-model on each city individually (say, 5 layers each), then Merge/Concatenate them and put several further layers on top. This means you are combining the learnt features of each city, which are in turn combined at a higher level of abstraction, as in the sketch below.
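A minimal sketch of that multi-branch idea, assuming one input per city of shape (window, 3), with arbitrary layer widths and the per-branch depth trimmed for brevity:

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Concatenate
from tensorflow.keras.models import Model

city_inputs, city_branches = [], []
for name in ("city1", "city2"):
    inp = Input(shape=(30, 3), name=name)
    x = LSTM(16)(inp)                 # per-city sub-model learns local features
    city_inputs.append(inp)
    city_branches.append(x)

merged = Concatenate()(city_branches)     # combine the learnt features
x = Dense(16, activation="relu")(merged)  # further shared layers on top
out = Dense(2)(x)                         # temperature forecast per city

model = Model(inputs=city_inputs, outputs=out)
model.compile(optimizer="adam", loss="mse")
```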

n1k31t4