I am doing credit risk modelling on customer transaction data, part of which looks like this:

str(x)
'data.frame':   412516 obs. of  26 variables:
 $ Tenure           : num  1.26 1.25 1.26 1.31 1.32 ...
 $ Product          : Factor w/ 24 levels "BACKHOE LOADER",..: 4 4 4 9 9 9 9 9 9 9 ...
 $ Net.Exposure     : num  333339 528049 327335 350000 460000 ...
 $ OD.On.31.01.2017 : num  0 90386 0 0 1099692 ...
 $ LM.Bucket        : Ord.factor w/ 11 levels "0"<"1 TO  30"<..: 1 1 1 1 11 11 11 11 11 11 ...
 $ Bucket           : Ord.factor w/ 11 levels "0"<"1 TO  30"<..: 1 3 1 1 11 11 11 11 11 11 ...
 $ Billing          : num  65380 0 8800 6339 8331 ...
 $ Fin.IRR          : num  13.5 14.6 14.6 18.1 23.3 ...
 $ NPA.Flag         : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
 $ Inst.Due         : num  0 0.85 0 0 3 3 3 3 3 3 ...
 $ FR.On.31.01.2017 : num  65380 0 38940 35043 499860 ...
 $ POS.On.31.01.2017: num  56453 0 32920 33368 293943 ...
 $ Del.String       : int  2 1 1 1 53720 53720 53720 53720 53720 53720 ...
 $ Territory        : Factor w/ 43 levels "AGRA","AHMEDABAD",..: 41 41 41 41 41 41 41 41 41 41 ...

Variables such as OD (overdue amount) and LM.Bucket (how many months the customer had been overdue on loan payments as of last month) change every month. I have two tasks: predict Bucket and NPA.Flag (non-performing asset).

I built a model for this based only on the January data (x). But since these variables change every month, should I treat this as sequential data and build a deep learning model (HMM/NN) on it? If so, what should I do with static variables such as Product type?

I asked my boss about this, and he said it shouldn't be done because external economic factors change over time. Is that a reason for concern?

Grasshopper
Dhruv Mahajan

2 Answers

1

This basically calls for a recurrent network, such as an LSTM. But if only two of your properties are dynamic, I don't think you will have much luck, because they might be affected by external factors, as your boss said. However, that will happen regardless of the model you use.

You should not throw away static properties unless they are the same for every sample. For example, a category such as farmer/politician/baker should always be included. You call this 'static', but it is only static over time for a given customer; it still differs from one sample to the next, so it can still carry information.
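One common way to keep static properties in a sequence model is to repeat them at every time step alongside the dynamic ones. The sketch below only illustrates that data layout, not a model. The `loan_id` and `month` keys are hypothetical (no such columns appear in the `str(x)` output above), the feature names are simplified, and the values are made up for illustration.

```python
from collections import defaultdict


def build_sequences(rows, id_key, time_key, dynamic_keys, static_keys):
    """Group monthly rows per loan into time-ordered feature sequences.

    Returns {loan id: [features at month 1, features at month 2, ...]},
    where each feature list is the dynamic values for that month followed
    by the loan's static values (repeated at every time step).
    """
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_key]].append(row)

    sequences = {}
    for loan_id, loan_rows in by_id.items():
        # Order each loan's rows chronologically.
        ordered = sorted(loan_rows, key=lambda r: r[time_key])
        # Static values are read once, from the first month.
        static = [ordered[0][k] for k in static_keys]
        sequences[loan_id] = [
            [r[k] for k in dynamic_keys] + static for r in ordered
        ]
    return sequences


# Made-up rows; "loan_id" and "month" are assumed identifiers.
rows = [
    {"loan_id": "A", "month": 2, "OD": 500.0, "LM_Bucket": 1, "Tenure": 1.26},
    {"loan_id": "A", "month": 1, "OD": 0.0, "LM_Bucket": 0, "Tenure": 1.26},
    {"loan_id": "B", "month": 1, "OD": 900.0, "LM_Bucket": 10, "Tenure": 1.31},
]

sequences = build_sequences(
    rows,
    id_key="loan_id",
    time_key="month",
    dynamic_keys=["OD", "LM_Bucket"],
    static_keys=["Tenure"],
)
```

Each inner list is one time step, so `sequences["A"]` holds two steps and `sequences["B"]` one. A recurrent model would consume these per-loan sequences after padding them to a common length.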

Thomas Wagenaar
0

The objective of supervised learning is to build a model of your data that helps you predict future values. You do that by selecting features from your data set (what you call variables) that you believe represent the problem well.

I'm no expert, but I understand the economy is affected by an enormous number of variables. So even if you create a model that fits the data you currently have based on some of those variables, it might become obsolete the moment variables you did not consider start affecting the end result. I believe that's what your boss was talking about.

Now, if you do decide to train a neural network to predict Bucket and NPA, your first step will be to choose which variables to include in your model. Keeping the 'static' variables will likely let your network make different predictions for, say, different Product types, but that depends on how the data is distributed across these static variables. If you choose not to use the static variables, your model will completely ignore them when making predictions, which might not be what you want.
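If a static factor such as Product is kept, it has to be turned into numbers before a neural network can use it. One common option is one-hot encoding, sketched below in plain Python. Only "BACKHOE LOADER" is a level visible in the question; "TRACTOR" is a hypothetical second level added for illustration.

```python
def one_hot(values):
    """One-hot encode a list of category labels.

    Returns (levels, encoded): the sorted distinct labels, and one 0/1
    vector per input value with a single 1 at that label's position.
    """
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    encoded = []
    for value in values:
        vector = [0] * len(levels)
        vector[index[value]] = 1
        encoded.append(vector)
    return levels, encoded


# "TRACTOR" is a made-up level; the question only shows "BACKHOE LOADER".
products = ["BACKHOE LOADER", "TRACTOR", "BACKHOE LOADER"]
levels, encoded = one_hot(products)
```

With all 24 Product levels, each row would get a 24-element vector. Those columns can then sit next to the numeric variables as model inputs.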

Grasshopper