I am working on a regression task to estimate glucose concentration from image data. The images are of reagent test strips, where a chemical reagent reacts with a blood sample and changes colour (ideally brown, but contaminated by red due to blood spillover). I have only about $108$ images of the strips (possibly a few more in the future), each with the corresponding glucose value of the patient. From each strip I extract the ROI (where the reaction takes place) and compute statistical features in several colour spaces (RGB, HSV, LAB, LUV, YCbCr).
Each sample image results in $32$ tabular features such as $r_{mean}, g_{std}, value_{mean}, saturation_{std}, u_{mean}, cr_{std}$, etc. These are fed into a hybrid deep learning model that takes both the raw image and these extracted features.
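To make the feature description concrete, here is a minimal sketch of per-channel mean and standard deviation features. It is not my exact pipeline: it uses only the Python standard library, covers only RGB and HSV, and the function name `channel_stats` and the pixel-list input format are assumptions made for illustration.

```python
import colorsys
from statistics import mean, pstdev


def channel_stats(pixels):
    """Return mean/std features for RGB and HSV channels.

    pixels: iterable of (r, g, b) tuples with values in 0..255
    (hypothetical input format; the ROI pixels are assumed to be
    extracted already).
    """
    # Scale to 0..1 so that colorsys can convert to HSV.
    rgb = [(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]
    hsv = [colorsys.rgb_to_hsv(*p) for p in rgb]

    features = {}
    spaces = (
        (("r", "g", "b"), rgb),
        (("hue", "saturation", "value"), hsv),
    )
    for names, data in spaces:
        for i, name in enumerate(names):
            channel = [p[i] for p in data]
            features[f"{name}_mean"] = mean(channel)
            # Population standard deviation over the ROI pixels.
            features[f"{name}_std"] = pstdev(channel)
    return features


if __name__ == "__main__":
    # Two made-up brownish pixels, purely for demonstration.
    print(channel_stats([(150, 90, 40), (140, 85, 45)]))
```

One caveat with this sketch: hue is an angle, so a plain arithmetic mean is only meaningful when the hues in the ROI do not wrap around $0$, which red-contaminated pixels may do.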
All images were taken under the same lighting and background conditions.
However, I’ve encountered a few issues:
- Only around $14$ of the features show even moderate correlation $(\ge 0.3)$ with the glucose value.
- After removing highly collinear features $(\rho > 0.9)$, only $2$ features remain.
- The ROI is largely circular and colour-based, so spatial features (from the CNN) may not be capturing much.
- Despite using RGB and other colour spaces, red-channel dominance due to blood contamination biases the input ($r_{mean}$ is the largest of all the features).
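The filtering behind the first two points can be sketched roughly as follows, assuming Pearson correlation, absolute values for both thresholds, and a greedy rule that keeps the feature more correlated with glucose from each collinear pair. The helper names `pearson` and `select_features` and the toy data are hypothetical, not my actual code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / sqrt(sxx * syy)


def select_features(columns, target, min_target_corr=0.3, max_pair_corr=0.9):
    """Keep features with |corr(feature, target)| >= min_target_corr,
    then greedily drop any feature that is collinear
    (|corr| > max_pair_corr) with one that was already kept."""
    relevance = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    candidates = sorted(
        (name for name in columns if relevance[name] >= min_target_corr),
        key=lambda name: -relevance[name],
    )
    kept = []
    for name in candidates:
        if all(abs(pearson(columns[name], columns[k])) <= max_pair_corr for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Toy data: "b" is nearly a rescaled copy of "a".
    glucose = [1, 2, 3, 4, 5]
    feats = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5], "c": [3, 1, 4, 1, 5]}
    print(select_features(feats, glucose))
```

With only about $108$ samples, a selection step like this would need to run inside each cross-validation fold; otherwise the validation error is optimistically biased.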
My Questions:
- Should I keep moderately correlated or repeating features in this context, especially since I'm using a CNN + tabular hybrid model?
- Is it better to aggressively drop highly collinear features $(> 0.9)$, or allow redundancy in this case?
- Would PCA or another dimensionality reduction method be preferable to simple correlation-based filtering?
- Should I rely more on specific colour spaces (e.g., LAB or HSV), given the nature of the colour change from the reagent?
- Should I drop the idea of a hybrid model and use only the tabular features? If so, which features would be most helpful?
- How can I best reduce the error caused by blood contamination?
- Should I use a pretrained CNN such as MobileNetV2 or something similar?
Any advice on selecting the right features, changing the model, using something else entirely, or improving interpretability and predictive power in such a constrained setup would be greatly appreciated.
Edit: Here is a sample picture

The patient's blood is applied to the top semicircular portion of the strip. From there, capillary action should carry only the plasma (which contains the glucose) to the circle in the middle, where the reagent has been applied. The glucose in the plasma reacts with this reagent and produces a brownish colour, the intensity of which corresponds to the blood glucose level. Ideally only plasma would reach the circular region, but in practice some blood also gets into the reaction area.
The ROI is the reaction area (circle) in the centre.
I am extracting only the image in the centre (excluding the blood contamination as much as possible) and computing $32$ features from it (present in the correlation image I attached).
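As an illustration of restricting the features to the centre of the reaction area, here is a minimal sketch that keeps only pixels inside a shrunken circle. It assumes the circle's centre and radius are already known; the function name `circular_roi_pixels`, the nested-list image format, and the `margin` parameter are hypothetical.

```python
def circular_roi_pixels(image, cx, cy, radius, margin=0.8):
    """Return the pixels that lie inside a circle of radius
    `radius * margin` centred at (cx, cy).

    image: list of rows, each row a list of pixel values
    (hypothetical format). A margin below 1 shrinks the circle to
    stay away from the rim, where blood contamination is more likely.
    """
    limit = (radius * margin) ** 2
    return [
        pixel
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if (x - cx) ** 2 + (y - cy) ** 2 <= limit
    ]


if __name__ == "__main__":
    # 5x5 dummy image whose "pixels" are just their (x, y) coordinates.
    img = [[(x, y) for x in range(5)] for y in range(5)]
    print(len(circular_roi_pixels(img, cx=2, cy=2, radius=2, margin=1.0)))
```

The returned pixel list can then be passed to whatever routine computes the per-channel statistics.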
I tried using all $32$ features, and here are the correlations between them.
