
I am new to machine learning. I want to develop a curriculum vitae (CV) recommender system. I want to determine how similar two CVs are and, given a new CV, suggest which cluster of CVs it belongs to.

This is what I've already done, following a blog post:

  1. I have a folder containing many CVs (résumés) as plain-text documents (.txt).

  2. I have pre-processed this data: tokenization, stop-word removal and stemming.

  3. I extracted each candidate's name, email ID, contact number, education and experience.
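For reference, the pre-processing and extraction steps listed above might look roughly like the following minimal sketch (standard-library Python only). The stop-word list, the regular expressions and `naive_stem` are hypothetical simplifications: a real pipeline would use a proper stemmer such as NLTK's `PorterStemmer`, a fuller stop-word list and much more robust patterns. Only email and phone extraction is shown; name, education and experience extraction are not.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "for", "on", "at", "i", "my"}

# Loose email pattern: local part, "@", then a domain with at least one dot.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Loose phone pattern: optional "+", then digits mixed with spaces, dashes or
# parentheses, at least 8 characters long and ending in a digit.
PHONE_RE = re.compile(r"\+?\d[\d ()-]{6,}\d")


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize, drop stop words and stem each remaining token."""
    # "+" and "#" are kept inside tokens so skills like "c++" and "c#" survive.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def extract_contacts(text):
    """Return the first email address and phone number found, or None for each."""
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "phone": phone.group(0) if phone else None,
    }
```

For example, `preprocess("Managing teams and learning Python")` gives `["manag", "team", "learn", "python"]`; the truncated stems are expected, since stemming only needs to map related word forms to the same key.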

I am confused about how to train on this data and how to create a model for it. More specifically, I have the following questions:

  1. How do I create a model from text data?

  2. Which algorithm should I apply to this data?

Any help will be appreciated.

Thanks.

mapto
Heena

1 Answer


I have worked on a similar project with job descriptions (JDs). We basically trained a word2vec model on the words in the JDs, and the results were good because we had lots of JDs. In short, word2vec converts each word into a vector representation that captures the contexts in which the word appears. You can check the gensim documentation here: https://radimrehurek.com/gensim/

You could extract skills or other fields from the CVs and compute semantic similarity based on the word2vec (w2v) model. You may need a custom formula for combining the similarity scores. Other things to compare could be education, experience, similar projects, etc.
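A minimal sketch of the similarity idea, under stated assumptions: each CV field is represented by the average of its word vectors, fields are compared with cosine similarity, and the "custom formula" is a hypothetical weighted sum. The toy `WORD_VECTORS` dict stands in for a trained model's `model.wv` (gensim's `KeyedVectors.n_similarity` computes the same averaged-vector cosine for two word lists). The hypothetical `nearest_cluster` helper also touches the question's clustering goal by assigning a CV to the closest centroid; it assumes the centroids were computed beforehand, for example by running k-means over the averaged CV vectors, which is not shown.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained model's
# `model.wv`; real word2vec vectors usually have 100 or more dimensions.
WORD_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "pandas": [0.8, 0.2, 0.1],
    "java": [0.1, 0.9, 0.0],
    "spring": [0.2, 0.8, 0.1],
    "msc": [0.0, 0.1, 0.9],
    "bsc": [0.1, 0.0, 0.8],
}

# Hypothetical weights for the custom formula; add fields such as experience
# or projects and tune the weights on your own data.
WEIGHTS = {"skills": 0.7, "education": 0.3}


def mean_vector(tokens):
    """Average the vectors of in-vocabulary tokens; None if none are known."""
    vecs = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def field_similarity(tokens_a, tokens_b):
    """Cosine of the averaged vectors; 0.0 if either side has no known words."""
    va, vb = mean_vector(tokens_a), mean_vector(tokens_b)
    return cosine(va, vb) if va is not None and vb is not None else 0.0


def cv_similarity(cv_a, cv_b):
    """Weighted sum of per-field similarities; each CV maps field -> tokens."""
    return sum(w * field_similarity(cv_a.get(f, []), cv_b.get(f, []))
               for f, w in WEIGHTS.items())


def nearest_cluster(cv_tokens, centroids):
    """Label of the centroid most cosine-similar to the CV's mean vector."""
    v = mean_vector(cv_tokens)
    if v is None:
        return None
    return max(centroids, key=lambda label: cosine(v, centroids[label]))
```

With these toy vectors, a CV listing `python` and `pandas` scores higher against another Python CV than against a Java CV even when the education fields match exactly, because skills carry most of the weight.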

Itachi