2

I have a PDF file (an admission application). I want to read/search the PDF, extract terms with similar meanings, and then convert this data into a DataFrame to save as an xlsm file. HELP!

Keetj

1 Answer

4

In my opinion, you have four options:

  • You can process the PDF directly using tabula.

  • You can convert the PDF to text using pdftotext, then parse the text with Python.

  • You can use an external tool to convert the PDF to Excel or CSV, then open the Excel/CSV file with the appropriate Python module.

  • You can also convert the PDF to an image file, then use any recent OCR software (which reconstructs tables automatically from the picture) to get the data.
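
As a rough illustration of the second option, here is a minimal sketch. It assumes the pdftotext command-line tool (from Poppler) is installed and that the form contains "Label: value" lines. The SYNONYMS table, the field names, and the helper functions are all hypothetical, and a hand-written synonym table is only a simple stand-in for matching "terms with similar meaning". The last helper writes a plain .xlsx via pandas (which needs openpyxl); producing a macro-enabled xlsm file is not covered here.

```python
import re
import subprocess

# Hypothetical synonym groups: canonical field -> wordings that may appear in the form.
SYNONYMS = {
    "name": ["name", "full name", "applicant name"],
    "birth_date": ["date of birth", "dob", "birth date"],
}


def pdf_to_text(pdf_path):
    """Run the pdftotext CLI and return the extracted text ("-" means stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_fields(text, synonyms=SYNONYMS):
    """Map 'Label: value' lines onto canonical field names via the synonym table."""
    lookup = {
        alias.lower(): field
        for field, aliases in synonyms.items()
        for alias in aliases
    }
    record = {}
    for line in text.splitlines():
        match = re.match(r"\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            field = lookup.get(match.group(1).lower())
            if field:
                # Keep the first value found for each canonical field.
                record.setdefault(field, match.group(2))
    return record


def save_records(records, path="applications.xlsx"):
    """Write a list of record dicts to an Excel file (assumes pandas + openpyxl)."""
    import pandas as pd
    pd.DataFrame(records).to_excel(path, index=False)


# Demo on literal text, so no PDF is needed to try the parsing step.
sample = "Full Name: Jane Doe\nDOB: 2001-04-05\nPhone: 555"
print(extract_fields(sample))
```

With a real file, the pieces would chain as `save_records([extract_fields(pdf_to_text("application.pdf"))])`, where the file name is a placeholder. Lines whose label is not in the synonym table are simply skipped.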

This answer comes from:

Your question is very similar to:

Regards

Carlos Mougan