
I am new to the ML field, so please correct me if I am wrong somewhere. I am currently training a model on time series data, and my problem is specific to bike sharing. I have a daily count of shared bikes for each area and each bike type (with gears, without gears, ...).

Example data:

Date    BikeType    Area    BikeCount
1/1/19  Gear        1        10
1/1/19  WithoutGear 1        15
1/1/19  Gear        2        8
1/1/19  WithoutGear 2        12
2/1/19  Gear        1        11
2/1/19  WithoutGear 1        17
2/1/19  Gear        2        9
2/1/19  WithoutGear 2        16

So I will have a trend for each bike type in each area. How can I apply time series forecasting to this data? I have to predict the number of bikes required for each type and each area. For example, for the data above I have to predict the number of geared and non-geared bikes required in areas 1 and 2 on 3/1/19. (Assume for this example that two dates of data are enough for prediction; I actually have 2 years of data for each area and type, and they show a good trend.)

My second question: currently I have only two dimensions, bike type and area, but more may be added later (such as color and condition). How should I handle this? Any contribution will be helpful.

I got a hint from this question: https://stackoverflow.com/questions/55545501/how-to-perform-time-series-analysis-that-contains-multiple-groups-in-python-usin

But is this the best, or the only, way?

Thank you. (I also need a suggestion for the question title.)

Edit:

I found similar problems in the following references:

Multi-dimentional and multivariate Time-Series forecast (RNN/LSTM) Keras

Multivariate and multi-series LSTM

Now I have two more questions:

  1. Is LSTM the only way?

  2. Are my data columns (data types and area) dimensions or features?

2 Answers


I think you should look into multivariate regression. You could use these variables (type, area, etc.) along with other factors such as seasonality. Create dummy variables such as day of the week, week number, month, etc. to capture seasonality. For example, a rainy month may have lower bike demand. These features can be derived from the data itself.
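A minimal sketch of that feature construction, assuming pandas is available and that the dates in the question are day/month/year. The derived column names (day_of_week, week, month) are chosen here for illustration only.

```python
import pandas as pd

# Toy rows copied from the question; dates are assumed to be day/month/year.
rows = [
    ("1/1/19", "Gear", 1, 10),
    ("1/1/19", "WithoutGear", 1, 15),
    ("1/1/19", "Gear", 2, 8),
    ("1/1/19", "WithoutGear", 2, 12),
    ("2/1/19", "Gear", 1, 11),
    ("2/1/19", "WithoutGear", 1, 17),
    ("2/1/19", "Gear", 2, 9),
    ("2/1/19", "WithoutGear", 2, 16),
]
df = pd.DataFrame(rows, columns=["Date", "BikeType", "Area", "BikeCount"])
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Calendar features derived from the date (hypothetical column names).
df["day_of_week"] = df["Date"].dt.dayofweek          # Monday = 0
df["week"] = df["Date"].dt.isocalendar().week.astype(int)
df["month"] = df["Date"].dt.month

# Dummy (one-hot) variables for the categorical columns;
# "week" is left as a plain numeric column.
features = pd.get_dummies(
    df[["BikeType", "Area", "day_of_week", "week", "month"]],
    columns=["BikeType", "Area", "day_of_week", "month"],
)
target = df["BikeCount"]

print(features.columns.tolist())
```

The resulting features and target could then be passed to any regression model. With two years of data, the day-of-week and month dummies would cover all their categories rather than the one or two seen in this toy sample.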

Anurag

I will go through your questions one by one:

How to use time series for this data

You can train a multivariate RNN regressor by feeding it the time series of your variables. Your first layer would be recurrent (LSTM or GRU) and would receive input of the following shape:

( batch size , number of time steps , number of variables )

In Keras, the input_shape argument itself omits the batch size.
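As a rough illustration of how a table of daily counts can be arranged into that three-dimensional layout, here is a sketch using NumPy only. The helper make_windows, the window length of 3, and the toy array are assumptions for illustration, not part of any library.

```python
import numpy as np

def make_windows(series, n_steps):
    """Slice a (time, variables) array into overlapping windows.

    Returns X with shape (samples, n_steps, variables) and y with shape
    (samples, variables), where y holds the row that follows each window.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_steps
    X = np.stack([series[i:i + n_steps] for i in range(n_samples)])
    y = series[n_steps:]
    return X, y

# Hypothetical data: 10 days, 4 variables (one per BikeType/Area pair).
daily_counts = np.arange(40).reshape(10, 4)
X, y = make_windows(daily_counts, n_steps=3)
print(X.shape, y.shape)
```

Each sample in X is a short history of all variables, and the matching row in y is the next day's values, which is what the network would be trained to predict.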


I have to only two dimension bike types and area, they may increase later(like their color and their condition) how to handle this

If you need to add a new variable that didn't exist before, I'm afraid you'll have to re-train your model. A different input size means a different architecture, and therefore a different set of weights to train.


Is the LSTM is the only way?

No, you can also use recurrent layers with GRU cells. GRUs differ from LSTMs: they have fewer parameters (so they are somewhat less expressive in principle) but are faster to train. There is no right or wrong choice; I think it's worth testing both architectures to see which performs better on the current task.
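To make the parameter difference concrete, here is a small sketch that counts weights for the textbook formulations of a single LSTM and GRU layer. The function names are made up for this example, and some library implementations use extra bias terms, so the counts they report may differ slightly.

```python
def lstm_params(n_inputs, n_units):
    # Four blocks (input, forget, output gates and the candidate state),
    # each with input weights, recurrent weights and a bias vector.
    return 4 * (n_units * (n_inputs + n_units) + n_units)

def gru_params(n_inputs, n_units):
    # Three blocks (update gate, reset gate and the candidate state)
    # with the same structure as above.
    return 3 * (n_units * (n_inputs + n_units) + n_units)

# Example: 3 input variables, 64 recurrent units.
print(lstm_params(3, 64), gru_params(3, 64))
```

Under these formulas a GRU layer has three quarters of the parameters of an LSTM layer of the same size.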


Are my data columns (data types and area) dimensions or features

Please rephrase this question; it's not clear what you mean. What do you mean by "are [data types and area] dimensions or features"?

All the variables displayed above can be used as input variables if you pre-process them accordingly. For example, datetime information could be turned into numerical variables that capture seasonal patterns (this is just an example). Bike type and area could be one-hot encoded, or other related information could be joined based on their values. There's plenty of choice.
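One possible way to do this, sketched with the standard library only: encode the date cyclically with sine and cosine, so that the last and first day of a cycle end up close together, and one-hot encode the categorical columns. The helper names and the category lists are assumptions for illustration.

```python
import math
from datetime import date

def cyclical(value, period):
    # Map a position in a cycle (e.g. weekday 0..6) onto the unit circle.
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def date_features(d):
    dow_sin, dow_cos = cyclical(d.weekday(), 7)
    month_sin, month_cos = cyclical(d.month - 1, 12)
    return [dow_sin, dow_cos, month_sin, month_cos]

def one_hot(value, categories):
    return [1.0 if value == c else 0.0 for c in categories]

# Assumed category lists, taken from the sample data in the question.
BIKE_TYPES = ["Gear", "WithoutGear"]
AREAS = [1, 2]

# Feature vector for a geared bike in area 2 on 3 January 2019.
row = date_features(date(2019, 1, 3)) + one_hot("Gear", BIKE_TYPES) + one_hot(2, AREAS)
print(row)
```

If a new categorical column such as color is added later, it would contribute another one-hot block to the vector, which changes the input size of the model.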

Leevo