I would like to know whether it's possible to build a predictive model in which I define a set of rows with their attributes and assign a class to that whole set, instead of the typical setup of one observation, one class.

What I'm trying to do is create a model that can predict the future level of a rider in professional cycling, most likely to forecast the level of young riders. The input data are the results of each rider (race, category of race, type of parcours, position, and so on). The class is the value (from 1 to 10) that each "known" rider has been assigned for each skill (mountain, flat, time trial, sprint, etc.), which means that in principle I would build one predictive model per skill (that is, one per class). For example, for the mountain class I would have the following:

Rider   Category    Race Type   Parcours    Distance    Position    Class(Mountain)
-----------------------------------------------------------------------------
Froome  GrandTour   Stage       Mountain    185         1           10
Froome  WorldTour   Stage       MidMountain 210         25          10
Nibali  Contin.Tour Stage       Mountain    170         15          9
Nibali  WorldTour   Classic     MidMountain 245         3           9

I see two main challenges here, apart from having to deal with both numerical and categorical attributes.

  1. A rider accumulates many results, and every one of them carries the same class value. For example, a Grand Tour stage victory by Chris Froome would be labeled 10 in mountain, but so would a poor finish of his in a lower-level stage, just as in the example. Presumably, with thousands of rows the model (probably a regression model rather than a classification algorithm, though I'm not sure of that either) should be able to detect the "common" pattern (a good mountain performance from a good climber like Froome), so this could be overcome. I guess it would be a matter of testing and finding the best-fitting model.

  2. I find the prediction part the most problematic, because what I need isn't the class value for each observation (that is, each result) but the class value for each rider. That's why I said at the beginning that I would need to somehow treat all the results of a rider as a single observation (something like a set or group of observations) and assign the class value to that set. A workaround could be to take all the observations of the rider, predict the class for each of them, and then average the predictions, but that doesn't look very smart.

So, any ideas on how to build this kind of model?

Thanks in advance.

IgorS
Hibai

1 Answer

Looks like you have two problems: time-dependent predictive modeling and feature engineering.

1) Time-dependent data

Key words:

create a model which should be able to predict the future level of a rider in professional cycling

The future Level of the Rider.

That means there is a current level, a level in the past, and a history of how the level changed over time for each rider.

The problem you are trying to solve is time-dependent. Your source data can look like:

(image: example of the source data)
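As a rough illustration of what time-indexed data could look like, here is a minimal sketch. The rider names, seasons, and level values are made up, the helper `add_previous_level` is hypothetical, and the layout is only an assumption about how such a history might be stored. It attaches each rider's previous-season level to the current row, so the history of the level becomes an explicit feature.

```python
from collections import defaultdict

# Hypothetical level history: (rider, season, mountain level).
# All names and values are made up for illustration.
history = [
    ("RiderA", 2012, 6), ("RiderA", 2013, 7), ("RiderA", 2014, 9),
    ("RiderB", 2012, 8), ("RiderB", 2013, 8), ("RiderB", 2014, 7),
]

def add_previous_level(rows):
    """Return one dict per (rider, season) with the previous season's level attached."""
    by_rider = defaultdict(list)
    for rider, season, level in rows:
        by_rider[rider].append((season, level))

    out = []
    for rider, seasons in by_rider.items():
        prev = None  # no previous level for a rider's first recorded season
        for season, level in sorted(seasons):
            out.append({"rider": rider, "season": season,
                        "level": level, "prev_level": prev})
            prev = level
    return out

lagged = add_previous_level(history)
for row in lagged:
    print(row)
```

With rows like these, the level in season t can be modeled from the level in season t-1 plus whatever the rider did in between.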

2) Build a variable to predict

Second, you should build the target feature: the rider level (class).

As far as I understand:

  • the overall rider 'level' can be a function of his 'classes' in each particular race type
  • and the class of each race 'observation' is some function of {Category, Race Type, Parcours, Distance, Position}

So the only thing you really need to predict is the race result.

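One possible reading of the feature engineering point, sketched under assumptions: if a rider's level is a function of his results per race type, the race rows can be collapsed into one row per rider before any modeling. The example below uses the four rows from the question's table. The helper `rider_features` and the chosen summaries (race count, best position, mean position per parcours) are hypothetical choices, not something the answer prescribes.

```python
from collections import defaultdict
from statistics import mean

# Race results copied from the example table in the question:
# (rider, category, race type, parcours, distance, position)
results = [
    ("Froome", "GrandTour", "Stage", "Mountain", 185, 1),
    ("Froome", "WorldTour", "Stage", "MidMountain", 210, 25),
    ("Nibali", "Contin.Tour", "Stage", "Mountain", 170, 15),
    ("Nibali", "WorldTour", "Classic", "MidMountain", 245, 3),
]

def rider_features(rows):
    """Collapse race rows into one feature dict per rider, split by parcours."""
    positions = defaultdict(lambda: defaultdict(list))
    for rider, _category, _race_type, parcours, _distance, position in rows:
        positions[rider][parcours].append(position)

    features = {}
    for rider, by_parcours in positions.items():
        row = {}
        for parcours, pos_list in by_parcours.items():
            row[f"{parcours}_races"] = len(pos_list)
            row[f"{parcours}_best_pos"] = min(pos_list)
            row[f"{parcours}_mean_pos"] = mean(pos_list)
        features[rider] = row
    return features

features = rider_features(results)
for rider, row in features.items():
    print(rider, row)
```

Each rider then becomes a single observation with a single class value, which is the "set of rows, one class" shape the question asks about. Category and distance are ignored here for brevity; they could be added as further grouping keys or weights.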

IgorS