
The problem: I have N independent classification models, and for each of these N models I have several dataset versions (e.g. V0, V1, ..., Vfinal_production, Vexperimental). I'm looking for a way to store these datasets efficiently in the cloud (for redundancy).

Note: We're not talking about BigData here.

Current Solution: I created a private GitHub repo, made N directories (one per model), and inside each directory pushed the different dataset versions as separate files.
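For concreteness, the layout looks roughly like this (model and version names below are just placeholders):

```
repo/
├── model_A/
│   ├── dataset_v0.csv
│   ├── dataset_v1.csv
│   └── dataset_final_production.csv
└── model_B/
    ├── dataset_v0.csv
    └── dataset_experimental.csv
```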

Are there better solutions for this? A full VCS feels like overkill for this problem.


1 Answer


I used my existing directory structure with Git LFS and it works well.

Here are the advantages I got from Git LFS compared to the alternatives I considered:

  • It saves space: Git LFS stores my dataset files (100 MB to 1 GB each) as lightweight pointers in the repository and only keeps the versions that are actually checked out on local disk.

  • It works seamlessly with a Git-enabled repository: I had planned to use a separate storage service (like Amazon S3) just for the datasets, but that would have required extra effort to manage multiple versions and keep the dataset files in sync with the code.
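For reference, a minimal sketch of the setup on top of an existing repo; the file patterns and paths are hypothetical, so adjust them to your own layout:

```
# One-time: install the Git LFS hooks for this repository
git lfs install

# Tell LFS which files to manage (patterns are examples)
git lfs track "*.csv" "*.parquet"

# The tracking rules are written to .gitattributes, which must be committed
git add .gitattributes

# From here on, tracked files are committed and pushed as usual;
# only pointer files live in the Git history, the data goes to LFS storage
git add model_A/dataset_v0.csv
git commit -m "Add dataset V0 for model A via Git LFS"
git push origin main
```

After this, `git checkout` of any version pulls down only the dataset files needed for that revision, which is what keeps the local footprint small.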
