I am currently reading the paper Learning from Little: Comparison of Classifiers Given Little Training.

In section 3, Experiment Results, the following graph is shared:

The experiment is described as follows:

We begin by examining an example set of results for the average TP10 performance over all benchmark tasks, where the training set has P=5 positives and N=200 negatives. We vary the number of features selected along the logarithmic x-axis.

I understand this as "we use a training set of 205 elements, 5 being positives and the 200 remaining being negatives". But looking at the results, Naive Bayes using Information Gain with a few features, and Multinomial Naive Bayes using Bi-normal separation with several hundred features both end up with 6.5 true positives in the top 10 (The TP10 metric is the number of true positives found in the 10 test cases that are predicted most strongly by the classifier to be positive).

I would have assumed that the models closest to 5/10 would be the most accurate, but reading their results, it looks like higher is better. So it feels like I have overlooked or misunderstood something. Can somebody enlighten me on this issue?

Thank you

1 Answer

Of course it is impossible to have a higher than $100\%$ true positive rate in your results.

The author does explain how he arrives at his results, but I must admit it is not at all obvious. In the subsection Performance Metrics of section 2, Experiment Protocol, the author explains how the metric is calculated:

The TP10 metric is the number of true positives found in the 10 test cases that are predicted most strongly by the classifier to be positive.

This means that TP10 is calculated using only the test set, not the training set. In the subsection Experiment Procedure of section 2, Experiment Protocol, the author also states:

The condition on the second loop ensures that there are 40 positives available for training plus at least 10 others in the test set (likewise, 200 negatives for training and at least 50 for testing)

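So we see that the test set always contains at least 10 positives (and at least 50 negatives), independently of the 5 positives used for training. TP10 counts how many of the 10 top-ranked test cases are truly positive, so it ranges from 0 to 10 and higher is better; a score of 6.5 is an average over the benchmark tasks and is perfectly possible.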
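
To make the metric concrete, here is a minimal sketch of how a TP10-style count could be computed on a test set. The function name `tp_at_k` and the toy labels and scores are made up for illustration; they are not taken from the paper.

```python
def tp_at_k(scores, labels, k=10):
    """Count true positives among the k test cases scored most strongly positive.

    scores: classifier scores for the test cases (higher = more positive)
    labels: true labels for the same test cases (1 = positive, 0 = negative)
    """
    # Rank test cases from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Sum the true labels of the k top-ranked cases.
    return sum(label for _, label in ranked[:k])


# Hypothetical test set: 10 positives followed by 50 negatives.
labels = [1] * 10 + [0] * 50
# Made-up scores: 7 positives rank high, 3 positives rank low,
# and 3 negatives score higher than those 3 positives.
scores = [0.9] * 7 + [0.1] * 3 + [0.5] * 3 + [0.2] * 47

print(tp_at_k(scores, labels))
```

The size of the training set never enters this computation, which is why having only 5 training positives does not cap the value at 5.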

JahKnows