
I am using Word2Vec for text vectorization. It does a good job overall, but it fails in some cases. For example, "turn the computer off and on" and "restart the computer" do not get a very good similarity score, even though they mean the same thing. Doc2Vec does not do a good job either, as my inputs are usually a couple of sentences rather than a full document.

Can anyone please suggest an approach that would give a good similarity score between "turn on and off" and "restart", and for other combinations like that?

Shamy

2 Answers


If you are training your Word2Vec model yourself, then you should increase the size of your training dataset. You can easily get the Wikipedia database. If you are using a pretrained model, you can always fine-tune it with additional data.

HatemB

One approach you could take is to build sentence vectors from the vectors generated for the individual words.

This post covers the different techniques you could use to achieve it.
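
As a rough illustration of the simplest such technique (averaging word vectors and comparing the averages with cosine similarity), here is a minimal sketch. The 3-dimensional vectors in `TOY_VECTORS`, the `STOPWORDS` set, and the helper names `sentence_vector` and `cosine_similarity` are all made up for this example; in practice the vectors would come from your trained or pretrained Word2Vec model, and the resulting score would depend entirely on that model.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
# Real vectors would be looked up in a trained Word2Vec model.
TOY_VECTORS = {
    "turn": [0.2, 0.7, 0.1],
    "off": [0.1, 0.8, 0.3],
    "on": [0.3, 0.6, 0.2],
    "restart": [0.2, 0.75, 0.2],
    "computer": [0.9, 0.1, 0.4],
}

# Hypothetical stopword list; these words are skipped when averaging.
STOPWORDS = {"the", "and"}


def sentence_vector(sentence, vectors=TOY_VECTORS, stopwords=STOPWORDS):
    """Average the vectors of all known, non-stopword tokens."""
    tokens = [
        t for t in sentence.lower().split()
        if t in vectors and t not in stopwords
    ]
    if not tokens:
        raise ValueError("no known words in sentence: %r" % sentence)
    dim = len(vectors[tokens[0]])
    return [sum(vectors[t][i] for t in tokens) / len(tokens) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    v1 = sentence_vector("turn the computer off and on")
    v2 = sentence_vector("restart the computer")
    print(cosine_similarity(v1, v2))
```

Plain averaging ignores word order and weights every word equally, so it is only a baseline; weighting the words (for example by TF-IDF) before averaging is a common refinement.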

Ethan
Nischal Hp