
Typical algorithms involve learning and applying a single mapping, e.g. $f: X \to Y$.

Are there any algorithms that learn multiple mappings given an extra variable, e.g. $f(Z): X \to Y$? (This feels like an abuse of notation.)

Here we learn a mapping, given $Z$, that is then applied to $X$ to produce $Y$.

One solution (although intractable) would be to train an algorithm (e.g. a neural network) whose output is then used as the weights of a second network.
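
For concreteness, here is a minimal sketch of this idea, assuming PyTorch; the dimensions and the form of the second network are arbitrary choices for illustration, not a proposal of a specific architecture.

```python
# A sketch of the idea above, assuming PyTorch: a "hypernetwork" maps Z to the
# weights of a second (here linear) network, which is then applied to X.
import torch
import torch.nn as nn

x_dim, z_dim, y_dim = 5, 3, 2   # illustrative sizes

# First network: Z -> flattened weights and biases of the second network.
hyper = nn.Sequential(
    nn.Linear(z_dim, 64),
    nn.ReLU(),
    nn.Linear(64, y_dim * x_dim + y_dim),
)

def f(z, x):
    """Apply the Z-conditioned mapping to X."""
    params = hyper(z)                                   # predicted parameters for this z
    W = params[: y_dim * x_dim].reshape(y_dim, x_dim)   # weight matrix of the second network
    b = params[y_dim * x_dim :]                         # bias of the second network
    return x @ W.T + b

z = torch.randn(z_dim)
x = torch.randn(x_dim)
y = f(z, x)     # gradients flow back into `hyper`, so it can be trained end to end
print(y.shape)  # torch.Size([2])
```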

Any suggestions would be much appreciated and feel free to ask for more details. Thanks.

Nikos M.
Daniel

2 Answers


This is equivalent to learning a function of two inputs, $g : (X, Z) \to Y$, i.e., $g(X, Z) = Y$. Any machine learning method for learning functions can be used for that: just treat the feature vector (the input) as the concatenation of the vector $X$ and the vector $Z$, and then apply standard ML techniques to the resulting instances. So you don't need anything special or fancy to handle this; you can use any existing machine learning method that you are comfortable with.
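
For illustration, a minimal sketch of this approach, assuming scikit-learn and synthetic data (the target function below is made up purely to have something to fit):

```python
# Concatenate X and Z into a single feature matrix and fit any standard model.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))   # original inputs
Z = rng.normal(size=(n, 3))   # extra conditioning variable
Y = X[:, 0] * Z[:, 0] + np.sin(X[:, 1] + Z[:, 1])   # hypothetical target

XZ = np.concatenate([X, Z], axis=1)   # treat (X, Z) as one feature vector
XZ_train, XZ_test, Y_train, Y_test = train_test_split(XZ, Y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(XZ_train, Y_train)
print("R^2 on held-out data:", model.score(XZ_test, Y_test))
```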

D.W.

If $Z$ denotes a set of attributes such that each choice of $Z$ picks out a member of a family of functions $f(Z)$, then you've basically stumbled onto hyperparameter search. As an example, consider a neural network with parameters $\theta$. We can reformulate your mapping $f: X \to Y$ as trying to learn the function $f(X; \theta) = Y$. But we have a lot of structural decisions to make here. What's our cost function? How many layers? What dimensions? What types of layers and activations? Do we want recurrence or convolutions? Dropout? Normalization? Regularization? What learning rate? Should we use momentum? An annealing schedule? How many epochs? How much coffee should I drink before I start coding?

If we wrap all of those structural decisions into $Z$, then $f(Z)$ is a specification for a family of functions for which we are trying to learn the parameters $\theta$ that minimize error.

Hyperparameter search is a big problem, and there's a lot of interesting research on the topic. I think the current SOTA is Bayesian optimization with bandit algorithms, although simple approaches like grid search, random search, and even guess-and-check are probably the most common.
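
As an illustration of the simple grid-search approach, here is a minimal sketch assuming scikit-learn; the particular hyperparameter grid is an arbitrary example of wrapping a few structural decisions into $Z$:

```python
# Grid search over a few structural choices (Z) of a small neural network;
# for each candidate Z, the network's parameters theta are fit to minimize error.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

X, Y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(32,), (64, 64)],   # how many layers, what dimensions
    "alpha": [1e-4, 1e-2],                     # regularization strength
    "learning_rate_init": [1e-3, 1e-2],        # learning rate
}

search = GridSearchCV(MLPRegressor(max_iter=2000, random_state=0), param_grid, cv=3)
search.fit(X, Y)
print("best Z:", search.best_params_)
```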

David Marx