
I'm looking for a machine learning algorithm to train on data sampled from a submodular set function, where the learned model's predictions also obey submodularity. For example, linear regression achieves this, because a linear model over the set's indicator vector is modular, hence both submodular and supermodular. But I want a model that is not linear and can capture non-linear relationships in the data.
I experimented with SVR (support vector regression) using various kernels, but could not prove anything useful about them.

2 Answers


One possible approach is to train a model that is guaranteed, by the design of its architecture, to represent a submodular function. The following research paper looks like it might be helpful for that:

Deep Submodular Functions: Definitions and Learning. Brian W. Dolhansky and Jeff A. Bilmes. NeurIPS 2016.


Another possible approach might be to learn a set function, and then add a regularization/penalty term to the loss function to penalize any violations of submodularity. There are various methods for learning a set function. One very simple one, if the universe the sets are drawn from is small enough, is to apply a one-hot (indicator) encoding of the set and feed that feature vector to an ordinary network (see the sketch after the references below). If the universe is large, there are more sophisticated methods in the research literature, e.g.,

Deep Sets. Manzil Zaheer, Satwik Kottur, Siamak Ravanbhakhsh, Barnabás Póczos, Ruslan Salakhutdinov, Alexander J. Smola. NeurIPS 2017.

Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks. Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh. ICML 2019 (Proceedings of Machine Learning Research).
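
To make the one-hot idea concrete, here is a minimal sketch; the universe size `n = 10`, the `encode` helper, and the `f_theta` MLP are all illustrative assumptions, not anything prescribed by the papers above:

```python
import torch
import torch.nn as nn

# Hypothetical example: universe of n = 10 items; a set S ⊆ {0, ..., 9}
# is encoded as its 0/1 indicator ("one-hot"/multi-hot) vector.
n = 10

def encode(S):
    """Indicator vector of the set S within the universe {0, ..., n-1}."""
    x = torch.zeros(n)
    if S:
        x[list(S)] = 1.0
    return x

# An ordinary MLP over the indicator vector. Nothing here enforces
# submodularity by itself -- that is the job of the penalty term below.
f_theta = nn.Sequential(
    nn.Linear(n, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

value = f_theta(encode({1, 4, 7}))  # predicted value of f({1, 4, 7})
```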

Then, if $f_\theta(S)$ represents the value of the function on set $S$ (using parameters $\theta$), we can introduce a penalty term as follows:

$$\Psi(\theta) = \max_{S,T} \max(f_\theta(S \cup T) + f_\theta(S \cap T) - f_\theta(S) - f_\theta(T),0).$$

Now add $\Psi(\theta)$ to the loss function, and then train as usual, with this modified loss function. To make this work, you'll need to find a way to compute $\Psi(\theta)$, or at least to estimate it. I'm not immediately sure how to do that, and it might depend on the specific model you use.
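
That said, one simple (if crude) possibility is a Monte-Carlo estimate: draw random pairs $(S, T)$ and average the hinge on any observed violation. This replaces the max over all pairs with an average over samples, so it only lower-bounds the exact penalty. A sketch under that assumption, with the same hypothetical `encode` helper as above (here taking `n` as an argument):

```python
import random
import torch

def encode(S, n):
    """Indicator vector of the set S within a universe of n items."""
    x = torch.zeros(n)
    if S:
        x[list(S)] = 1.0
    return x

def sampled_penalty(f_theta, n, num_pairs=64):
    """Monte-Carlo estimate of the submodularity penalty Psi(theta):
    sample random pairs (S, T) and average the hinge on violations of
    f(S ∪ T) + f(S ∩ T) <= f(S) + f(T).  Averaging over samples rather
    than maximizing over all pairs makes this only an estimate (in fact
    a lower bound) of the exact penalty."""
    total = torch.zeros(1)
    for _ in range(num_pairs):
        S = {i for i in range(n) if random.random() < 0.5}
        T = {i for i in range(n) if random.random() < 0.5}
        violation = (f_theta(encode(S | T, n)) + f_theta(encode(S & T, n))
                     - f_theta(encode(S, n)) - f_theta(encode(T, n)))
        total = total + torch.relu(violation)
    return total / num_pairs

# Training step (sketch): loss = data_loss + penalty_weight * sampled_penalty(f_theta, n)
```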

D.W.

If you're looking for a model that not only learns from submodular data but also ensures that its predictions preserve submodularity, there are a few interesting options to consider.

One approach is using neural networks with structural constraints. Some architectures incorporate constraints directly into the network structure to ensure that the learned function maintains certain properties, including submodularity. You might find "Deep Submodular Functions" (Dolhansky & Bilmes, NeurIPS 2016) useful in this context.

Another option is tackling the problem from the submodular optimization perspective. There are methods based on submodular function maximization and on projected gradient techniques over continuous relaxations (e.g., the Lovász extension) that could be useful if you can reformulate your problem appropriately.

You also mentioned trying SVR with some kernels without much success. It might be worth exploring more advanced approaches, such as Gaussian Processes with kernels designed for submodular functions or Bayesian inference models that incorporate structural constraints.

If you can share more details about the type of data you're working with, it could help refine the strategy further. I'd love to hear more about your approach!