
I'm working on a Python project to enforce the company naming convention for products on product lists provided by clients/suppliers. I have a list of the company's (standardised) names and a list of the external names. I also want to account for typos, and I'm generating that list using GPT.

Here are the models I'm considering:
Sequence-to-Sequence (Seq2Seq): LSTM rather than a plain RNN
Transformer-Based Models: BERT on custom data

Additionally, I'm looking into fuzzy string matching.
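
For the fuzzy matching part, this is roughly the kind of baseline I have in mind. It's only a minimal sketch using `difflib` from the standard library; the product names, the `normalise` and `match_name` helpers, and the `0.8` cutoff are all made up for illustration and are not from my real data.

```python
import difflib

# Hypothetical standardised names; the real list has 20k+ entries.
STANDARD_NAMES = [
    "Acme Steel Bolt M8",
    "Acme Steel Nut M8",
    "Globex Copper Wire 2mm",
]


def normalise(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(name.lower().split())


# Map each normalised form back to its standardised spelling.
_LOOKUP = {normalise(n): n for n in STANDARD_NAMES}


def match_name(raw: str, cutoff: float = 0.8):
    """Return the closest standardised name, or None if nothing clears the cutoff.

    The cutoff is an assumed value and would need tuning on real data.
    """
    candidates = difflib.get_close_matches(
        normalise(raw), list(_LOOKUP), n=1, cutoff=cutoff
    )
    return _LOOKUP[candidates[0]] if candidates else None


if __name__ == "__main__":
    print(match_name("acme stel bolt m8"))
    print(match_name("completely unrelated"))
```

Something like this would probably be too slow and too naive for 20k+ names as-is, but it gives a baseline to compare the Seq2Seq and BERT options against.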

Could anyone recommend other approaches, or point out anything I'm missing? Greatly appreciated :)

Edit:
This is a sample of the dataset I'm dealing with. These are the correct names. I'm creating my own dataset with around 20 incorrect names alongside the correct ones. There are 20k+ unique names.

Dataset
