I'm not sure you even need to finetune here. I use LLMs a lot for information extraction from noisy data and find they work quite well for the most part. There are obviously times they fail, but no amount of prompt engineering is going to get past that.
Depending on what your data looks like, you could always instruct the LLM to extract information in a tabular way and have it output results in some JSON format. For example, you could have a prompt like this:
You are an AI agent in the industrial maintenance field.
Your goal is to extract information from sentences related to: purchasing, <thing2>, <thing3>, etc.
The output should be in a structured JSON format as follows:
{
"purchase": true/false/unknown,
"<thing2>": ...,
...
}
Because of how LLMs are trained, they shouldn't have issues with this kind of simple structured output. And this way you can build up a JSON record with that information for each data point.
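To make that concrete, here's a minimal sketch of the parsing side, assuming you've already gotten a raw text reply back from whatever LLM you're calling (the prompt text and field names are just placeholders from the example above, not a real API):

```python
import json

# Hypothetical system prompt following the structure sketched above.
SYSTEM_PROMPT = """You are an AI agent in the industrial maintenance field.
Your goal is to extract information from sentences related to: purchasing, etc.
The output should be in a structured JSON format as follows:
{"purchase": true/false/"unknown"}
Respond with JSON only."""

def parse_extraction(raw_reply: str) -> dict:
    """Parse the model's JSON reply, falling back to 'unknown' if the
    model produced something that isn't valid JSON."""
    try:
        return json.loads(raw_reply)
    except json.JSONDecodeError:
        return {"purchase": "unknown"}

# Illustrative replies a model might return (not real API calls):
print(parse_extraction('{"purchase": true}'))   # {'purchase': True}
print(parse_extraction("Sure! Here you go..."))  # {'purchase': 'unknown'}
```

The fallback matters in practice: even with "respond with JSON only" in the prompt, models occasionally wrap the JSON in prose, so you want a defined behavior for unparseable replies rather than a crash.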
Since you mentioned sentence transformers, you could also use a BERT-style model to perform feature probing. This requires a bit more engineering, and you'd want to tune similarity thresholds for the comparison. One way to do this is to take your sentences <sent1>, <sent2>, ... and the features you'd like to probe for, <feature1>, <feature2>, .... Then, for each sentence, you compute its embedding. This could be through the [CLS] token, a sentence transformer, average pooling, etc. This gets you the embeddings <emb1>, <emb2>, ... representing the sentences. Now you do something similar for the features to get <f1>, <f2>, ...
Let's say <feature1> is something simple like purchase: just a single word to test whether or not a sentence indicates a purchase was made. You could compare how similar the embeddings <emb1>, <emb2>, ... are to that feature vector. If a sentence's similarity is above some threshold, you classify it as positively containing a purchase; if it's below the threshold, no purchase was made.
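The probing step itself is just a cosine-similarity comparison against a threshold. A minimal sketch, using hand-made placeholder vectors standing in for <emb1>, <emb2>, and <f1> (in practice these would come from your BERT-style encoder, and the threshold would be tuned on held-out data):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def probe(sent_emb: np.ndarray, feature_emb: np.ndarray,
          threshold: float = 0.5) -> bool:
    """Classify a sentence as positive for a feature if its embedding
    is similar enough to the feature's embedding."""
    return cosine(sent_emb, feature_emb) >= threshold

# Placeholder embeddings (illustrative only, not real model outputs):
f_purchase = np.array([1.0, 0.0, 0.0])  # feature vector for "purchase"
emb1 = np.array([0.9, 0.1, 0.0])        # sentence that mentions a purchase
emb2 = np.array([0.0, 0.2, 0.9])        # unrelated sentence

print(probe(emb1, f_purchase))  # True
print(probe(emb2, f_purchase))  # False
```

The threshold of 0.5 here is arbitrary; as noted above, you'd want to test candidate thresholds against labeled examples before trusting the classifications.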
One issue with this style of model is you may have a sentence like "customer looked at items for purchase. no purchase made". If you're doing average pooling, the repeated purchase-related tokens can dominate the pooled embedding, so it aligns too well with the purchase feature vector despite the negation. So if you go this route you'd want to be careful with the type of model you're using for inference.
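You can see this failure mode with a toy example. The token vectors below are made up purely for illustration: "purchase" appears twice in the negated sentence, and because mean pooling just averages token vectors, the pooled embedding still sits close to the purchase feature vector even though no purchase happened:

```python
import numpy as np

# Toy 2-d token vectors (hypothetical, for illustration only):
tok = {
    "purchase": np.array([1.0, 0.0]),
    "no":       np.array([0.0, 1.0]),
    "customer": np.array([0.1, 0.1]),
    "looked":   np.array([0.1, 0.1]),
    "made":     np.array([0.1, 0.1]),
}

def mean_pool(tokens: list[str]) -> np.ndarray:
    """Average the token vectors -- order and negation are lost."""
    return np.mean([tok[t] for t in tokens], axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

f_purchase = np.array([1.0, 0.0])
sent = ["customer", "looked", "purchase", "no", "purchase", "made"]

# The two "purchase" tokens dominate the average, so similarity to the
# purchase feature stays high despite the "no" in the sentence.
print(cosine(mean_pool(sent), f_purchase))
```

Running this gives a similarity well above any reasonable threshold, which is exactly the false positive described above. A model that encodes word order and negation (e.g. using the [CLS] token of a full transformer rather than a bag-of-tokens average) is less prone to this, which is why the choice of inference model matters here.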