I know that TF-IDF is an NLP method for feature extraction, and I know that there are libraries that calculate TF-IDF directly from raw text.
That is not what I want, though.
In my case, the text dataset has already been converted into a bag-of-words (BOW) representation.
The original dataset, which I do NOT have access to, looks like this:
RepID RepText
------------------
1 Doctor sys patient has diabetes and needs rest for ...
2 Patients history: broken arm, and ...
3 A dose of Metformin 2 times a day ...
4 Xray needed for the chest...
5 Covid-19 expectation and patient should have a rest ...
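But my dataset looks like this, with one row per report and word, where BOW is the number of times the word occurs in that report: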
RepID Word BOW
-------------------------
1 Doctor 3
1 diabetes 4
1 patient 1
. . .
. . .
2 patient 2
2 arm 7
. . .
. . .
5684 cough 9
5684 Xray 3
5684 Covid 5
. . .
. . .
What I want is to find the TF-IDF for each word in my dataset.
I was thinking of converting my dataset into an unstructured format,
so that it would look like this:
RepID RepText
------------------
1 Doctor Doctor Doctor diabetes diabetes diabetes diabetes ...
2 patient patient arm arm arm arm arm arm arm ...
.
.
5684 cough cough cough cough cough cough cough cough cough Xray Xray
Each word is repeated as many times as its BOW count.
However, I do not think this is the best approach, since it converts a structured dataset back into an unstructured one.
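To make the workaround concrete, here is a minimal sketch of the expansion I have in mind. It assumes the table has been fetched into Python as a list of `(RepID, Word, BOW)` tuples; the sample rows and the helper name `expand_to_text` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows in (RepID, Word, BOW) form, mirroring the table above.
rows = [
    (1, "Doctor", 3),
    (1, "diabetes", 4),
    (1, "patient", 1),
    (2, "patient", 2),
    (2, "arm", 7),
]

def expand_to_text(rows):
    """Rebuild one pseudo-document per RepID by repeating each word BOW times."""
    words_by_rep = defaultdict(list)
    for rep_id, word, count in rows:
        words_by_rep[rep_id].extend([word] * count)
    # Join the repeated words into a single space-separated string per report.
    return {rep_id: " ".join(words) for rep_id, words in words_by_rep.items()}

pseudo_docs = expand_to_text(rows)
print(pseudo_docs[2])  # patient patient arm arm arm arm arm arm arm
```

The resulting strings could then be passed to any text-based TF-IDF library, but the word order of the original reports is lost, and the data grows with the sum of all counts.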
How can I compute TF-IDF directly from the structured dataset? Is there a library or algorithm for that?
Note:
The dataset is stored in MS SQL Server, and I am using Python.
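For reference, this is roughly the calculation I am after, written as a minimal sketch rather than a confirmed solution. It assumes the rows have already been read from SQL Server into a list of `(RepID, Word, BOW)` tuples with one row per report and word, and it uses the textbook definition tf = count / total words in the report and idf = log(N / number of reports containing the word). Libraries often apply smoothing or normalisation, so their numbers may differ. The sample rows and the function name `tfidf_from_counts` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sample rows in (RepID, Word, BOW) form, mirroring the table above.
rows = [
    (1, "Doctor", 3),
    (1, "diabetes", 4),
    (1, "patient", 1),
    (2, "patient", 2),
    (2, "arm", 7),
]

def tfidf_from_counts(rows):
    """Compute unsmoothed TF-IDF per (RepID, Word) from long-format counts."""
    doc_totals = Counter()  # total number of word occurrences per report
    doc_freq = Counter()    # number of reports that contain each word
    for rep_id, word, count in rows:
        doc_totals[rep_id] += count
        doc_freq[word] += 1  # assumes exactly one row per (RepID, Word) pair

    n_docs = len(doc_totals)
    scores = {}
    for rep_id, word, count in rows:
        tf = count / doc_totals[rep_id]
        idf = math.log(n_docs / doc_freq[word])
        scores[(rep_id, word)] = tf * idf
    return scores

scores = tfidf_from_counts(rows)
for (rep_id, word), value in sorted(scores.items()):
    print(rep_id, word, round(value, 4))
```

In this sample, "patient" appears in both reports, so its idf is log(2/2) = 0 and its score is 0. Words are compared case-sensitively here, so "Xray" and "xray" would count as different words unless they are normalised first.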