I am a soon-to-be physics graduate student with a background in theoretical and experimental cosmology. In my work, I've often found myself applying machine learning models and techniques to data analysis tasks. Like many, I find the field fascinating, but I have little more than a superficial, very applied understanding. I know how to throw a simple neural network together with TensorFlow & Keras (and understand the mechanics of how the actual computations are performed), but I have little understanding of why a neural network is able to achieve what it does, why deep neural networks perform better than their shallow counterparts, why Google tells me that certain convolutional kernel sizes are more likely to perform well, and so on, at least not in the way that I am used to in mathematics and the physical sciences. In these fields (and this is, of course, a gross generalization) there is an expectation of proof and of some more intuitive level of understanding.
I've tried studying from various online courses and textbooks, and all take a very applied, engineering-y approach that really bothers me. I'm aware that machine learning is, in fact, largely an engineering field, and that there is a lot we simply don't understand yet. I would assume, however, that there must be at least some elementary, more math-y understanding of the field.
It was previously suggested to me that I should look into the field of "Statistical Learning". I purchased Hastie and Tibshirani's text, and have nearly completed the companion online course. I found it more satisfying in that it provides a statistician's mathematical perspective, and I could often search around the topics discussed to learn a bit more about why they work the way they do, and sometimes even find proofs of certain claims.
I'm looking for additional resources that attempt to explain the why and how behind machine learning, and that might be more appealing to someone from a math and physics background. I am specifically interested in neural networks and their training. I know that this is a bit vague; due perhaps in large part to my limited understanding of the field, I don't know precisely what I'm looking for. There is a plethora of textbooks and online learning content available for those interested in machine learning, but seemingly little that discusses it at a depth I would find satisfactory. What textbooks, online courses, etc. might I use to move on from here?