Train a model for unstructured data

Question

I am new to machine learning. I want to develop a recommender system for curricula vitae (CVs). I want to determine how similar two CVs are and, given an arbitrary CV, suggest which cluster of CVs it belongs to.

This is what I've already done, following a blog post:

I have a folder containing many CVs (resumes) as plain text documents (.txt).
I have preprocessed this data: tokenization, stop-word removal, and stemming.
I extracted each candidate's name, email address, contact number, education, and experience.

I am confused about how to train on this data and how to create a model from it. More specifically, I have the following questions:

How do I create a model on text data?
Which algorithm should I apply to this data?

Could anyone please answer? Your help will be appreciated.

Thanks.

score 0 · Accepted Answer · answered Feb 27 '19 at 12:37

I have worked on a similar project with job descriptions (JDs). We created a word2vec model for the words in the JDs, and the results were good because we had a lot of JDs. What word2vec does is convert a word to a vector representation that captures the contexts in which the word appears. You can check the documentation here: https://radimrehurek.com/gensim/

You could extract skills or other fields from the CVs and compute semantic similarity based on the w2v model. You could use a custom formula for comparing similarity. Other inputs could be education, experience, similar projects, etc.
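One possible way to turn word vectors into a CV-to-CV similarity is to average the vectors of the extracted skills and compare the averages with cosine similarity. The sketch below uses only the standard library. The three-dimensional `word_vectors` table, the helper names, and the weights in `combined_similarity` are all hypothetical stand-ins. In practice the vectors would come from the trained w2v model, and the weights are something you would have to tune yourself.

```python
import math

# Hypothetical word vectors; real ones would come from a trained word2vec model.
word_vectors = {
    "python": [0.9, 0.1, 0.0],
    "pandas": [0.8, 0.2, 0.1],
    "java": [0.1, 0.9, 0.0],
    "spring": [0.0, 0.8, 0.2],
}

def document_vector(tokens, vectors):
    """Average the vectors of the known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def combined_similarity(skill_sim, education_sim, experience_sim,
                        weights=(0.6, 0.2, 0.2)):
    """Example of a custom formula: a weighted sum of per-field similarities.
    The default weights are arbitrary placeholders."""
    w_skill, w_edu, w_exp = weights
    return w_skill * skill_sim + w_edu * education_sim + w_exp * experience_sim

cv_a = document_vector(["python", "pandas"], word_vectors)
cv_b = document_vector(["java", "spring"], word_vectors)

sim_ab = cosine_similarity(cv_a, cv_b)
print(round(sim_ab, 3))
```

For the clustering part of the question, the same document vectors could be fed to any standard clustering algorithm, and a new CV assigned to the cluster whose centroid it is most similar to.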
