
There are 30k+ historical injury records in the dataset, each consisting of a subjective note and a GP's objective note paired with the corresponding treatment data. A simplified view of the dataset structure:

| Injury subjective note | GP's objective note | Treatments note |
| --- | --- | --- |
| I cut my finger, ... | The wound is bleeding, ... | Clean the wound, treat with antibiotic ointment, ... |
| Fell from cycling, forearm hurts, ... | Bone fracture, ... | Bone X-ray, splint, braces, casts, slings, ... |

Now, how can ML/AI be used to suggest (output) treatments based on a new (input) subjective and objective note? I would like to know:

  1. Which algorithms would be the best fit?
  2. Which tools/platforms are good candidates to choose from?
  3. Are there any resource links/pages for similar projects that I could learn from?
J.W

1 Answer


There is a lot of research being done around exploiting biomedical text data in general and clinical notes in particular. I'm not up to date with the whole domain (it's a big one) but let me sketch a few possible directions.

  • The standard text classification approach: consider every possible treatment as a class, the goal being to predict the correct class(es) for each instance of subjective+objective notes. Note that the design can be either standard multiclass classification or multi-label classification (several treatments per case). All the regular text classification techniques can be used, from traditional methods like decision trees to deep learning with word/text embeddings. A minimal multi-label sketch is shown right after this list.
  • The sequence-to-sequence approach: the principle here is for the model to learn how an input sequence (here the subjective+objective notes) is transformed into an output sequence (the treatment). The standard example for this kind of task is machine translation, but the design is used in many other problems; see the second sketch after this list.
  • The semantic representation approach with a third-party ontology: this relies on existing biomedical resources, especially normalized vocabularies like MeSH and/or UMLS. One would use an automatic annotation tool (e.g. Apache cTAKES) to extract the medical terms from the subjective+objective notes, and the treatments could also be encoded with the controlled vocabulary. This can greatly facilitate the job of the model by providing it with normalized features and classes. Once encoded, the data can be used in a classification setting similar to the first approach. A scispaCy-based sketch of the annotation step appears after the resources paragraph below.
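
For the first approach, here is a minimal multi-label classification sketch with scikit-learn. The file name, the column names (`subjective_note`, `objective_note`, `treatments`) and the `;`-separated treatment labels are assumptions about how the data might be organized, not a definitive implementation:

```python
# Multi-label text classification sketch: TF-IDF features + one logistic
# regression per treatment label. Column names and label format are assumed.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.read_csv("injury_notes.csv")                      # hypothetical file
X = df["subjective_note"] + " " + df["objective_note"]    # concatenate both notes
y_lists = df["treatments"].str.split(";")                 # assumed ';'-separated treatment labels

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_lists)                            # one binary column per treatment

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(X_train, Y_train)

# Suggest treatments for a new subjective+objective note
new_note = ["Fell from cycling, forearm hurts. Suspected bone fracture."]
print(mlb.inverse_transform(model.predict(new_note)))
```

Any classifier could be swapped in for the logistic regression; the important design choice is treating the treatments as a set of labels rather than a single class.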

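For the sequence-to-sequence approach, a fine-tuning skeleton with Hugging Face Transformers could look roughly like this. The base model (`t5-small`), the toy records and the hyperparameters are illustrative assumptions only; a biomedical pretrained model would likely be a better starting point:

```python
# Seq2seq sketch: notes in, free-text treatment suggestion out.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy records standing in for the 30k+ note/treatment pairs
records = {
    "notes": ["I cut my finger. The wound is bleeding.",
              "Fell from cycling, forearm hurts. Bone fracture."],
    "treatments": ["Clean the wound, treat with antibiotic ointment.",
                   "Bone X-ray, splint, cast, sling."],
}
dataset = Dataset.from_dict(records)

def preprocess(batch):
    model_inputs = tokenizer(batch["notes"], max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["treatments"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="treatment-seq2seq", num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()

# Generate a treatment suggestion for a new note
inputs = tokenizer("Deep cut on the palm, bleeding heavily.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that generated free text needs careful evaluation: unlike the classification setting, the model can produce treatments that were never in the training data.
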
About software/data resources: there are a lot, but as far as I know nothing that does exactly what you need out of the box. scispaCy is a Python module for processing biomedical text. There are datasets available (e.g. shared-task data), as well as various research prototypes and resources. See also this answer for pointers to various biomedical resources.
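
As an illustration of the third approach, below is a small sketch that uses scispaCy's UMLS entity linker to pull normalized concepts (CUIs) out of a note; the extracted CUIs could then serve as features in a classifier. It assumes scispaCy and one of its models (here `en_core_sci_sm`) are installed, and the first run downloads the linker's knowledge base:

```python
# Extract normalized UMLS concepts from a note with scispaCy's entity linker.
import spacy
from scispacy.linking import EntityLinker  # noqa: F401  (registers the "scispacy_linker" pipe)

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker",
             config={"resolve_abbreviations": True, "linker_name": "umls"})

doc = nlp("Fell from cycling, forearm hurts. Suspected bone fracture.")
linker = nlp.get_pipe("scispacy_linker")

for ent in doc.ents:
    for cui, score in ent._.kb_ents[:1]:              # best UMLS candidate per mention
        concept = linker.kb.cui_to_entity[cui]
        print(f"{ent.text} -> {cui} ({concept.canonical_name}, score={score:.2f})")
```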

Erwan