Robustness is, to some extent, a subjective matter. In a nutshell: if you produce an estimate with a robust estimator, then add a very extreme data point and re-estimate, the new estimate should not be too different from the first one.
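To make this concrete, here is a toy sketch (plain NumPy, with made-up data) contrasting a non-robust estimator (the sample mean) with a robust one (the sample median) when a single gross outlier is appended:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, size=25)   # well-behaved sample
dirty = np.append(clean, 500.0)        # add one very extreme point

# The mean (not robust) moves a lot; the median (robust) barely moves.
print(f"mean:   {clean.mean():7.3f} -> {dirty.mean():7.3f}")
print(f"median: {np.median(clean):7.3f} -> {np.median(dirty):7.3f}")
```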
What does "extreme" mean? What does "too different" mean? This is precisely where the ambiguity comes in.
One popular way to measure extremity in a regression, for example, is through the influence (leverage) of each point. Other ways are to check the jackknife residuals or the Cook's distances of the points. All of these diagnostics were developed heuristically, and in most cases statisticians use them simultaneously to make statements about the robustness of an estimator.
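As a sketch of how these diagnostics can be computed in practice (here with statsmodels; the simulated data and the simple linear model are just placeholders):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 2.0 * x + rng.normal(size=30)
y[-1] += 10.0                                # plant one extreme point

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
infl = fit.get_influence()

leverage = infl.hat_matrix_diag              # influence/leverage of each point
jackknife = infl.resid_studentized_external  # jackknife (externally studentized) residuals
cooks_d = infl.cooks_distance[0]             # Cook's distance of each point

print("most influential point:", int(np.argmax(cooks_d)))
```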
Once you determine how you want to define extremity, judging whether the estimator produces a result that is "too different" boils down to a hypothesis test. Some of these are classical parametric tests (applicable when you know the distribution of the index you used to measure extremity) and others are empirical (like a parametric bootstrap).
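For instance, a parametric bootstrap of the null distribution of the largest Cook's distance might look like the following sketch (again with statsmodels; the normal-linear model and the choice of the maximum Cook's distance as the extremity index are assumptions made purely for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
y[-1] += 8.0                                 # plant one extreme point
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
obs_max_d = fit.get_influence().cooks_distance[0].max()

# Parametric bootstrap: simulate from the fitted normal-linear model
# and see how large the maximum Cook's distance tends to be by chance.
B = 500
sigma = np.sqrt(fit.scale)                   # residual standard deviation
null_max = np.empty(B)
for b in range(B):
    y_sim = X @ fit.params + rng.normal(scale=sigma, size=n)
    sim_fit = sm.OLS(y_sim, X).fit()
    null_max[b] = sim_fit.get_influence().cooks_distance[0].max()

p_value = np.mean(null_max >= obs_max_d)
print(f"observed max Cook's D = {obs_max_d:.3f}, bootstrap p = {p_value:.3f}")
```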
In general, bias does not play a role in determining how an estimator will react to extreme data, but consistency might (think about what happens when N is large: a single extreme point makes up a vanishing fraction of the sample).
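A quick toy illustration of the large-N point: the damage a single fixed extreme value does to the sample mean shrinks as the sample grows (made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (20, 200, 2000, 20000):
    clean = rng.normal(size=n)
    dirty = np.append(clean, 100.0)           # same single extreme point each time
    shift = abs(dirty.mean() - clean.mean())  # roughly 100 / (n + 1)
    print(f"n = {n:6d}   shift in mean = {shift:.4f}")
```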
For the MLE, what mathematical tools did you use to produce the estimate? How sensitive are the resulting estimates to extreme points? These are the questions you should be asking.
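One way to probe that sensitivity is to compare MLEs under different likelihoods. As a sketch (with scipy and simulated data; the Student-t model is just one illustrative heavy-tailed alternative, not a general recipe):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, size=50)
dirty = np.append(clean, 50.0)                # one gross outlier

for sample, label in ((clean, "clean"), (dirty, "with outlier")):
    gauss_loc = sample.mean()                 # Gaussian MLE of location = sample mean
    df, t_loc, t_scale = stats.t.fit(sample)  # Student-t MLE downweights the tails
    print(f"{label:13s} Gaussian MLE = {gauss_loc:6.3f}   t MLE = {t_loc:6.3f}")
```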