This is related to the question: Deriving cost function using MLE: Why use log function? I wish I could just comment and get an answer from @user76170, but since I am new here I am not allowed to comment. Hence, here comes the question:
I don't understand why @user76170 writes the likelihood as $p(x_i\mid y_i)$. In my opinion it should be $p(y_i\mid x_i;\theta)$, since in logistic regression we model the conditional probability of the label given the features.
I also don't understand why in Andrew Ng's class the cost function is an average cost (we have $m$ observations, so we divide by $m$), like this: $$ −\frac{1}{m} \left(\sum_{i=1}^m y_i\log(h_\theta(x_i))+(1−y_i) \log(1−h_\theta(x_i))\right) $$ What happens if we don't take the average?
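To make the question concrete, here is a small sketch of the two versions of the cost (the `y` and `h` values are made-up toy numbers, and `h` stands in for the model outputs $h_\theta(x_i)$). The averaged cost is just the summed cost divided by $m$, so the two differ only by a constant positive factor:

```python
import numpy as np

# Toy example: m = 4 observations (hypothetical values, just to illustrate).
y = np.array([0.0, 1.0, 1.0, 0.0])   # labels y_i
h = np.array([0.2, 0.8, 0.6, 0.3])   # predicted probabilities h_theta(x_i)

# Summed cost: the plain negative log-likelihood, without dividing by m.
summed = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Averaged cost, as in Andrew Ng's class: divide by the number of observations.
m = len(y)
averaged = summed / m

# Scaling by 1/m does not move the minimizer: argmin of f(theta)
# equals argmin of f(theta)/m for any constant m > 0.
print(summed, averaged)
```

Since multiplying a function by a positive constant does not change where its minimum is, both versions lead to the same $\theta^*$; presumably the $1/m$ only changes the scale of the cost and its gradient (which interacts with the learning rate).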