Suppose we have customer data from 2015 to 2019. I want to split it into three sets, in the spirit of train_test_split(): set 1 is 2015 to 2017 (3 years), on which I will train my model; set 2 is 2018 (1 year), on which I will validate it; set 3 is 2019 (1 year), on which I will test it. I want code that divides the data into these 3 sets based on time (years).

karan
  • 13
  • 3

1 Answer

It seems to me the best (or at least quickest) way to do this would be to have all the data in a pandas DataFrame, then create boolean masks based on year and build a new DataFrame for each group. For example:

# Assumes data['year'] holds strings; use 2015, 2016, ... if the column is numeric.
train_df = data[data['year'].isin(['2015', '2016', '2017'])]
validate_df = data[data['year'] == '2018']
test_df = data[data['year'] == '2019']
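
If the data has a date column rather than a ready-made year column, the same masking idea can be sketched as below. The helper name split_by_year, the column names "date" and "customer_id", and the toy monthly data are all assumptions made for illustration; they are not from the question.

```python
import pandas as pd


def split_by_year(df, date_col, train_years, val_years, test_years):
    """Split df into train/validation/test frames by calendar year.

    Hypothetical helper: date_col is assumed to be parseable as dates.
    """
    # Derive an integer year for every row from the date column.
    years = pd.to_datetime(df[date_col]).dt.year
    # Each boolean mask selects the rows whose year falls in that set.
    train = df[years.isin(train_years)]
    val = df[years.isin(val_years)]
    test = df[years.isin(test_years)]
    return train, val, test


# Toy data (assumed): one row per month from January 2015 to December 2019.
data = pd.DataFrame(
    {"date": pd.date_range("2015-01-01", "2019-12-31", freq="MS")}
)
data["customer_id"] = range(len(data))

train_df, validate_df, test_df = split_by_year(
    data, "date", train_years=[2015, 2016, 2017], val_years=[2018], test_years=[2019]
)
```

Because the split is by year and not random, every validation and test row is later in time than every training row, which is usually what you want for time-ordered data.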

Hope this is what you're looking for. If not, let me know and we can work out another solution.

whege
  • 171
  • 4