I have a dataset with 95% false and 5% true labels, about 200,000 samples overall, and I'm fitting a LightGBM model. I mainly need high precision, i.e. a low number of false positives; I don't care much about accuracy.
I have tried playing around with the decision boundary after fitting and with increasing the weight of the positive class. This helps, but I wonder if there is something else I could do.
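For context, this is roughly what I'm doing at the moment (a simplified sketch; the synthetic data and the parameter values are just placeholders for my real setup):

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

# Stand-in for my real data: ~95% negatives, ~5% positives, 200k samples.
X, y = make_classification(n_samples=200_000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y,
                                            test_size=0.2, random_state=0)

# scale_pos_weight > 1 upweights the positive class in the loss.
clf = LGBMClassifier(n_estimators=500, learning_rate=0.05,
                     scale_pos_weight=5.0, random_state=0)
clf.fit(X_tr, y_tr)

# Move the decision boundary after fitting: sweep thresholds on a
# validation set instead of using the default 0.5.
proba = clf.predict_proba(X_val)[:, 1]
for t in np.arange(0.5, 1.0, 0.05):
    pred = (proba >= t).astype(int)
    print(f"t={t:.2f}  "
          f"precision={precision_score(y_val, pred, zero_division=0):.3f}  "
          f"recall={recall_score(y_val, pred, zero_division=0):.3f}")
```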
Because the dataset is very unbalanced, I think the model is spending a lot of effort on the predicted-negative side (the TN/FN region), which I really don't care about. My intuition is also that the standard cross-entropy loss is implicitly geared towards accuracy rather than precision.
I wonder if I could pre-filter my dataset somehow, e.g. throw away ~50% of the samples (mostly negatives) to increase the initial true/false ratio. Or maybe this is something LightGBM already does internally, and what I want is fundamentally impossible. Or perhaps there is an alternative to cross-entropy loss that I could use.
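To make the pre-filtering idea concrete, this is roughly what I have in mind (again just a sketch with placeholder data; keep all positives, randomly drop half of the negatives, then train on the reduced set):

```python
import numpy as np
from sklearn.datasets import make_classification

# Placeholder for my real feature matrix and labels.
X, y = make_classification(n_samples=200_000, n_features=30,
                           weights=[0.95, 0.05], random_state=0)

rng = np.random.default_rng(0)
pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Throw away ~50% of the negatives so the true/false ratio roughly doubles.
kept_neg = rng.choice(neg_idx, size=len(neg_idx) // 2, replace=False)
keep = np.concatenate([pos_idx, kept_neg])

X_sub, y_sub = X[keep], y[keep]
# ...then fit the LightGBM model on (X_sub, y_sub) as before.
```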