Resources for studying the mathematical foundations of machine learning, for someone from a math/physics background

Question

I am a soon-to-be physics graduate student with a background in theoretical and experimental cosmology. In my work, I've often found myself applying machine learning models and techniques to data analysis tasks. I, as many do, find the field fascinating, but have little more than a superficial, very applied understanding. I know how to throw a simple neural network together with TensorFlow & Keras (and understand the mechanics of how the actual computations are performed), but I lack the kind of understanding I am used to in mathematics and physical science: why a neural network is able to achieve what it does, why deep neural networks perform better than their shallow counterparts, why Google tells me that certain convolutional kernel sizes are more likely to perform well, etc. In these fields (and this is, of course, a gross generalization) there is an expectation of proof and some more intuitive level of understanding.

I've tried studying from various online courses and textbooks, and all take a very applied, engineering-y approach that really bothers me. I'm aware that machine learning is, in fact, largely an engineering field, and that there is a lot we simply don't understand yet. I would assume, however, that there must be at least some elementary, more math-y understanding of the field.

It was previously suggested to me that I should look into the field of "Statistical Learning". I purchased Hastie and Tibshirani's text, and have nearly completed the companion online course. I found it to be more satisfactory in that they provide a statistician's mathematical perspective, and oftentimes I could search around about the topics discussed to learn a bit more about why they work in the way that they do, and sometimes even find proofs for certain claims.

I'm looking for additional resources that attempt to explain the why and how behind machine learning, that might be more appealing to someone from a math and physics background. I am specifically interested in neural networks and their training. I know that this is a bit vague and, due perhaps in large part to my limited understanding of the field, I don't know what precisely I'm really looking for. There is a plethora of textbooks and online learning content available for those interested in machine learning, but seemingly few that discuss it at a depth I would find satisfactory. What textbooks/online courses/etc. might I use to move on from here?

D.W. · Answer 1 · 2024-01-08T18:59:31.660

Your assumptions may be a little bit off. Deep learning is largely an engineering field, and it is a young and rapidly moving one. Most "why" questions don't have very good answers. To the extent that there are partial answers to the "why" questions, they are mostly found in research papers, and I'm not sure whether they've made it into any textbooks yet. There are a lot of papers on the theory of deep learning, but (in my personal opinion) they mostly haven't been very successful at shedding light on the engineering practice of deep learning as it is actually used in the field. If you come from a mathematics background, you might find it unsatisfying.

Generally, those who come from the field of statistics tend to have more of an emphasis on "why?" and principled foundations and mathematics and provable guarantees, which you might find more satisfying. Those who come from the field of machine learning often tend to have more of an emphasis on engineering and whatever works. See, e.g., Statistical Modeling: The Two Cultures (by Leo Breiman) and https://stats.stackexchange.com/q/6/2921. So I agree that you might find it helpful to study statistics. For instance, you might enjoy learning about logistic regression, which has solid theory underpinning it.

But if you want to study neural networks, be prepared that neural networks came from the ML culture, not the statistics culture, so you might not find what you are wishing for.

I would still suggest that you read standard textbooks and resources on deep learning. Start with the fundamental concepts and ideas everyone else is learning. Then you can search for theoretical explanations where they exist. For instance, I've heard a lot of good feedback on Deep Learning by Goodfellow, Bengio, and Courville, and that has some mathematical and theoretical background.

darwinflinches · Answer 2 · 2024-01-08T17:32:28.530

I was at an ML talk recently where the speaker talked about how the theory of ML has largely failed to explain what we do in practice.

Take, as one simple example, stochastic gradient descent. Does it converge faster when the training examples are sampled with replacement or without? Empirically, the community has known for many years that without-replacement sampling outperforms with-replacement sampling. Yet this hunch (under standard assumptions) was only proved in 2019. This should serve as an example of how theory lags behind engineering.
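To make the two sampling schemes concrete, here is a minimal sketch in Python with NumPy. It is only an illustration, not the setting of the 2019 proof: the least-squares problem, the function names (`sample_indices`, `sgd_least_squares`, `make_problem`), and the step size and epoch count are all assumptions I picked for the example.

```python
import numpy as np

def sample_indices(n, rng, replace):
    """Return the n example indices visited during one epoch of SGD."""
    if replace:
        # With replacement: n i.i.d. uniform draws, so repeats and misses happen.
        return rng.integers(0, n, size=n)
    # Without replacement: a fresh random permutation, each index exactly once.
    return rng.permutation(n)

def sgd_least_squares(X, y, epochs, lr, replace, seed=0):
    """Plain SGD on the loss 0.5 * (x_i . w - y_i)^2, one example per step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in sample_indices(n, rng, replace):
            grad = (X[i] @ w - y[i]) * X[i]  # gradient for example i
            w -= lr * grad
    return w

def make_problem(n=200, d=5, noise=0.1, seed=1):
    """Synthetic linear regression data (hypothetical toy problem)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + noise * rng.normal(size=n)
    return X, y

if __name__ == "__main__":
    X, y = make_problem()
    # Exact least-squares minimizer, used as the reference point.
    w_star = np.linalg.lstsq(X, y, rcond=None)[0]
    for replace in (True, False):
        w = sgd_least_squares(X, y, epochs=20, lr=0.01, replace=replace)
        label = "with replacement" if replace else "without replacement"
        print(f"{label}: distance to minimizer = {np.linalg.norm(w - w_star):.4f}")
```

A single run on a toy problem like this proves nothing either way; to see the empirical gap you would average over many seeds and look at how the error decays with the number of epochs.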

With that said, if you're interested in the theory of machine learning, I recommend the book Understanding Machine Learning (by Shalev-Shwartz and Ben-David), as it focuses on mathematical rigor and is a good resource for PAC learning.

score 0 · Answer 3 · answered Jan 09 '24 at 00:17

The resource that helped me the most in understanding the math behind deep learning is the excellent book "Neural networks and deep learning" by Michael Nielsen, which is available online for free (legally).

Andrew Kelley
