
I am reading Bishop's Mixture Density Network paper at: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/bishop-ncrg-94-004.pdf

This is a good paper, but I am still confused about a few small details. I am wondering if anyone could give me some help:

  1. The mixing coefficient $\alpha_i$ is computed with the softmax function in eq. (25) below. In that equation, what is the superscript $\alpha$ on each $z_i^{\alpha}$? Is it a free parameter to be fitted?

  2. Similarly, in eq. (26), what is the superscript $\sigma$ on $z_i^{\sigma}$? Is it also a free parameter to be fitted? Thanks!

[Images: eq. (25) and eq. (26) from the paper]

Green Falcon
Edamame

1 Answer


The superscripts $\alpha$ and $\sigma$ are not free parameters to be fitted, and they are not exponents. They are labels: $z_i^{\alpha}$ denotes the network output activation that determines the mixing coefficient $\alpha_i$, and $z_i^{\sigma}$ denotes the output activation that determines the variance parameter $\sigma_i$. The labels also make it possible to tell apart the derivatives with respect to the $\alpha$ outputs and the $\sigma$ outputs. This follows page 275 of "Pattern Recognition and Machine Learning" by Christopher Bishop:

[Image: excerpt from page 275 of "Pattern Recognition and Machine Learning" by Christopher Bishop]
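
To make the role of the labels concrete, here is a minimal sketch in plain Python. It assumes the usual mixture density network parameterization, $\alpha_i = \exp(z_i^{\alpha}) / \sum_j \exp(z_j^{\alpha})$ and $\sigma_i = \exp(z_i^{\sigma})$, with the means taken directly from the remaining outputs. The function name `mdn_parameters` and the ordering of the outputs inside `z` are assumptions made for illustration, not something the paper prescribes.

```python
import math


def mdn_parameters(z, n_components, target_dim):
    """Split a raw network output vector z into mixture parameters.

    The ordering (alpha block, then sigma block, then mu block) is an
    assumption for this sketch; any fixed ordering would work.
    """
    m, c = n_components, target_dim
    if len(z) != m * (c + 2):
        raise ValueError("expected (target_dim + 2) * n_components outputs")

    # The superscripts only name which slice of the outputs is meant.
    z_alpha = z[:m]        # activations labelled "alpha"
    z_sigma = z[m:2 * m]   # activations labelled "sigma"
    z_mu = z[2 * m:]       # activations labelled "mu"

    # Softmax over the alpha activations (shifted for numerical stability).
    shift = max(z_alpha)
    exps = [math.exp(v - shift) for v in z_alpha]
    total = sum(exps)
    alpha = [e / total for e in exps]

    # Exponential keeps every sigma strictly positive.
    sigma = [math.exp(v) for v in z_sigma]

    # One mean vector of length target_dim per mixture component.
    mu = [z_mu[i * c:(i + 1) * c] for i in range(m)]
    return alpha, sigma, mu


# Two components, one-dimensional target: 2 * (1 + 2) = 6 outputs.
z = [0.0, 0.0, math.log(2.0), 0.0, 1.0, -1.0]
alpha, sigma, mu = mdn_parameters(z, n_components=2, target_dim=1)
```

Nothing in the code is called `alpha` or `sigma` as a fitted quantity in its own right: the only things learned are the network weights that produce `z`, and the labels just say which transformation each slice of `z` goes through.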

Ryan Ghorbandoost