If $Z$ denotes a set of attributes such that each choice of $Z$ defines a family of functions $f(Z)$, then you've basically stumbled onto hyperparameter search. As an example, consider a neural network with parameters $\theta$. We can reformulate your mapping $f: X \to Y$ as trying to learn the function $f(X;\theta) = Y$. But we have a lot of structural decisions to make here. What's our cost function? How many layers? What dimensions? What types of layers and activations? Do we want recurrence or convolutions? Dropout? Normalization? Regularization? What learning rate? Should we use momentum? An annealing schedule? How many epochs? How much coffee should I drink before I start coding?
If we wrap all of those structural decisions into $Z$, then $f(Z)$ is a specification for a family of functions for which we are trying to learn the parameters $\theta$ that minimize error.
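To make that concrete, here's a minimal sketch using scikit-learn's `MLPClassifier` purely as an illustration (the dict `z` and the `build_model` helper are names I'm making up, not anything standard): a single choice of $Z$ picks one member of the family, and $\theta$ is whatever `fit` ends up learning.

```python
from sklearn.neural_network import MLPClassifier

def build_model(z):
    """One point z in hyperparameter space Z -> one member of the model family f(Z)."""
    return MLPClassifier(
        hidden_layer_sizes=z["hidden_layer_sizes"],  # how many layers, what dimensions
        activation=z["activation"],                  # which nonlinearity
        alpha=z["alpha"],                            # L2 regularization strength
        learning_rate_init=z["learning_rate"],       # step size for the optimizer
        max_iter=z["epochs"],                        # how many passes over the data
    )

# One structural choice Z; calling .fit(X, Y) would then learn the parameters theta.
z = {"hidden_layer_sizes": (64, 32), "activation": "relu",
     "alpha": 1e-4, "learning_rate": 1e-3, "epochs": 200}
model = build_model(z)  # model.fit(X, Y) learns theta for this particular f(Z)
```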
Hyperparameter search is a big problem and there's a lot of interesting research on the topic. I think the current SOTA is Bayesian optimization combined with bandit algorithms, although simple approaches like grid search, random search, and even guess-and-check are probably the most common in practice.
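For flavor, a bare-bones random search over that same $Z$ might look like the sketch below (the `sample_z` ranges and the data splits are placeholders; in practice you'd likely reach for a library implementation rather than rolling your own):

```python
import random

def sample_z():
    """Draw one random structural configuration from the search space Z."""
    return {
        "hidden_layer_sizes": random.choice([(32,), (64, 32), (128, 64, 32)]),
        "activation": random.choice(["relu", "tanh"]),
        "alpha": 10 ** random.uniform(-5, -2),          # log-uniform L2 penalty
        "learning_rate": 10 ** random.uniform(-4, -2),  # log-uniform step size
        "epochs": random.choice([100, 200, 400]),
    }

def random_search(X_train, y_train, X_val, y_val, n_trials=20):
    """Guess-and-check, automated: try n_trials random Z's, keep the best one."""
    best_score, best_model, best_z = -float("inf"), None, None
    for _ in range(n_trials):
        z = sample_z()
        model = build_model(z)          # from the sketch above
        model.fit(X_train, y_train)     # learn theta for this f(Z)
        score = model.score(X_val, y_val)
        if score > best_score:
            best_score, best_model, best_z = score, model, z
    return best_model, best_z, best_score
```

Grid search is the same loop with `sample_z` replaced by an exhaustive sweep over a fixed grid; Bayesian optimization replaces the random draw with a model of which regions of $Z$ look promising given the trials so far.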