So let me try and rephrase your question (please correct me if I'm wrong) -- you're asking why ZK is defined for only the YES instances and not for YES and NO instances both?
EDIT: There's a more intuitive answer (which I should have mentioned in the first place).
In a proof system, an honest verifier is trying not to be cheated, i.e. soundness implies that no prover can force the verifier to output $\mathtt{Accept}$ with high probability when the instance $x \in \mathtt{No}$. Soundness protects the honest verifier.
When $x \in \mathtt{Yes}$, some prover that is not the honest prover could try to get the verifier to output $\mathtt{Reject}$, but we do not care about this (neither completeness nor soundness handles this case). One may think of the verifier outputting $\mathtt{Accept}$ as a very expensive operation, and outputting $\mathtt{Reject}$ as a cheap operation.
Thus, outputting $\mathtt{Accept}$ when the answer should be $\mathtt{Reject}$ is very bad. We do not care about the other case.
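One hedged way to write down the two guarantees discussed above, assuming the conventional thresholds $2/3$ and $1/3$ (the thresholds, and the notation $\langle P, V\rangle(x)$ for the verifier's output after interacting with $P$ on input $x$, are assumptions of this sketch and are not fixed anywhere in this answer):

$$
\begin{aligned}
\text{Completeness:}\quad & x \in \mathtt{Yes} \implies \Pr[\langle P, V\rangle(x) = \mathtt{Accept}] \ge 2/3,\\
\text{Soundness:}\quad & x \in \mathtt{No} \implies \forall P^*:\ \Pr[\langle P^*, V\rangle(x) = \mathtt{Accept}] \le 1/3.
\end{aligned}
$$

Note that neither line constrains a cheating prover $P^*$ when $x \in \mathtt{Yes}$, which is exactly the case we do not care about.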
Now given the above description of a proof system, we say the proof system is ZK if, in helping the verifier output the correct decision, the prover does not reveal any information beyond the fact that $x \in \mathtt{Yes}$. So ZK protects the honest prover. Here the game is reversed: now the verifier is trying to get the prover to reveal extra information.
If $x \in \mathtt{No}$, the honest prover does not need protection from ZK. It does not need to help the honest verifier come up with the right answer at all. It could send nothing, or simply the statement that $x \in \mathtt{No}$. The cheating verifier has no incentive to think differently. So it does not make any sense to define ZK for the $\mathtt{No}$ instances.
Now you could ask: what if we didn't make the restriction that outputting $\mathtt{Reject}$ is cheap?
Then one argument is that it is impossible to define ZK for both yes and no instances. The argument as to why follows from Chapter III of Salil Vadhan's thesis (https://people.seas.harvard.edu/~salil/research/phdthesis.pdf) pages 44-46. I'll summarise the idea below:
The definition of ZK roughly states that for every verifier strategy $V^*$, there exists a PPT algorithm called the simulator, with black-box access to $V^*$, that on every instance $x$ in the language produces a transcript (the simulator output) that is statistically indistinguishable from the transcript in the real protocol, i.e. the messages that $V^*$ receives during the protocol $\Pi(P, V^*)$ from the honest prover $P$ that we are claiming to be ZK.
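As a rough sketch in symbols (the notation is an assumption of this sketch: $S^{V^*}(x)$ is the simulator's output, $\mathrm{View}_{V^*}(x)$ is what $V^*$ sees when talking to the honest prover $P$ on input $x$, $\Delta$ is statistical distance, and $\mu$ is some negligible function):

$$
\forall V^*\ \ \exists\ \text{PPT } S\ \ \forall x \in \mathtt{Yes}:\quad \Delta\big(S^{V^*}(x),\ \mathrm{View}_{V^*}(x)\big) \le \mu(|x|).
$$

The quantifier over $x$ ranges only over $\mathtt{Yes}$; nothing is required of $S$ when $x \in \mathtt{No}$.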
The really short answer is that no single definition of ZK can apply to both YES and NO instances, because the simulator's output necessarily behaves differently on the two.
[For89] shows that for statistical zero-knowledge proofs, the simulator's output distribution can be used to almost perfectly distinguish YES and NO instances.
In more detail, the act of simulation can be viewed as an interaction between a virtual verifier and a virtual prover. Remember, the simulator does not actually talk to the prover; it plays the role of the prover (the virtual prover) when interacting with $V^*$, which can be thought of as the virtual verifier.
If $x \in L$ (a YES instance), then both of the following must be true:
- The simulator's messages would be accepted by the virtual verifier with high probability (otherwise the two transcripts would be distinguishable).
- The virtual verifier behaves like the real verifier (this is saying if the virtual prover interacted with the real verifier instead of the virtual verifier, nothing would really change).
Now if $x \notin L$ (a NO instance), the two statements above cannot both hold. More simply, at least one of them must be false.
Assume that this was not the case -- then the virtual prover can convince the real verifier with high probability (this is what both of the above statements being true means).
Then the real prover could just use the virtual prover as the proving strategy and fool the real verifier (this would break soundness).
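In symbols, the contradiction can be sketched as follows, writing $P_S$ for the virtual prover extracted from the simulator and assuming a soundness error of $1/3$ (both the name $P_S$ and the threshold are assumptions of this sketch). If both statements held for some $x \notin L$, we would have

$$
\Pr[\langle P_S, V\rangle(x) = \mathtt{Accept}] \approx 1
\qquad\text{while}\qquad
\forall P^*:\ \Pr[\langle P^*, V\rangle(x) = \mathtt{Accept}] \le 1/3,
$$

which is impossible, since $P_S$ is itself one of the strategies $P^*$ that soundness quantifies over.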