I will soon be dealing with multiple projects in Python. Some of them have to run regularly (many times a day), they can take a long time (many days), and they consume and produce data coming from and going to other servers.
I would like to know what tools you use to handle these kinds of processes, given that they run on Python. These tools could focus on:
- workflow management (which task should be executed, and when?)
- data synchronization (database updates/synchronization)
- resource monitoring (disk space, RAM, and CPU?)
- versioning (when input/output data arrives, should I process it, or is there a newer version?)
- visualization/monitoring (easily estimate the state of the processes, be alerted if an error occurs)
As an example, Luigi tackles some of these problems.
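To make the workflow-management point concrete, here is a minimal stdlib-only sketch of the model Luigi is built around: each task declares its dependencies (`requires`) and an output target (`output`), and a task only runs if its target does not exist yet, so re-running the whole pipeline is idempotent. The `Fetch`/`Summarize` tasks and the file names are hypothetical; real Luigi would use `luigi.Task` and `luigi.LocalTarget` instead of this toy scheduler.

```python
import os
import tempfile

class Task:
    """Toy stand-in for the Luigi task model: dependencies + output target."""
    def requires(self):
        return []  # list of upstream tasks
    def output(self):
        raise NotImplementedError  # path of the target this task produces
    def run(self):
        raise NotImplementedError  # work that creates the output target
    def complete(self):
        return os.path.exists(self.output())

def build(task):
    """Depth-first scheduler: run dependencies first, skip completed tasks."""
    for dep in task.requires():
        build(dep)
    if not task.complete():
        task.run()

WORKDIR = tempfile.mkdtemp()  # placeholder working directory

class Fetch(Task):
    """Hypothetical first step: produce a raw data file."""
    def output(self):
        return os.path.join(WORKDIR, "raw.txt")
    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")

class Summarize(Task):
    """Hypothetical second step: depends on Fetch, writes a summary."""
    def requires(self):
        return [Fetch()]
    def output(self):
        return os.path.join(WORKDIR, "summary.txt")
    def run(self):
        with open(Fetch().output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))

build(Summarize())
print(open(Summarize().output()).read())  # → 6
```

The useful property for long-running jobs is that a crash halfway through leaves completed targets on disk, so the next run picks up where it left off instead of starting over.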
Please share links to articles or tutorials if you have any in mind. I think this kind of challenge is faced by many people starting machine learning projects for production purposes.
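For the resource-monitoring item in the list above, a stdlib-only check is already enough to raise a basic alert on disk space before a multi-day job fills the drive (RAM and CPU usage usually call for a dedicated library such as psutil). The threshold and path below are placeholders:

```python
import os
import shutil

def disk_alert(path=".", min_free_gb=5.0):
    """Return a warning string if free space on `path` drops below the
    threshold, else None. Threshold value here is an arbitrary example."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gb = usage.free / 1e9
    if free_gb < min_free_gb:
        return f"Low disk space: {free_gb:.1f} GB free on {path}"
    return None

print(os.cpu_count())   # logical CPU count, a cheap capacity check
print(disk_alert("."))  # None unless free space is below the threshold
```

A periodic job could run such a check and push the warning to whatever alerting channel the monitoring tool provides.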