
We run numerous types of ML experiments using a number of frameworks.

We've ended up writing an awful lot of boilerplate:

  1. read some configuration describing the experiment's parameters, variables, mode and so on
  2. create all the output directories we'll need for the different combinations of epoch, parameter, variable, etc.

And then we traverse those output directories again to collate and summarise the results.

Steps 1 and 2 are fragile (i.e. they break when new requirements come in, e.g. a new "mode" for the experiment that alters the directory layout), tedious to write and maintain, and, well, boilerplate.
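To make that concrete, here's a minimal sketch of the kind of boilerplate we keep rewriting; the config schema and directory layout below are made up for illustration:

```python
import itertools
from pathlib import Path

# 1. read the experiment's parameters from config (hard-coded here; ours comes
#    from a file, with a schema along these hypothetical lines)
config = {"mode": "baseline", "epochs": [1, 5, 10], "learning_rates": [0.1, 0.01]}

root = Path("results") / config["mode"]

# 2. pre-create one output directory per (epoch, learning-rate) combination
for epoch, lr in itertools.product(config["epochs"], config["learning_rates"]):
    (root / f"epoch_{epoch}" / f"lr_{lr}").mkdir(parents=True, exist_ok=True)

# ...and later, traverse the same tree again to collate/summarise the results
for metrics_file in root.glob("epoch_*/lr_*/metrics.json"):
    print(metrics_file.parent, metrics_file.read_text())
```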

Questions:

  • Is there a PyPI or conda package that can do some of this drudgery for us?
  • Is there a neat design pattern or idiom someone could suggest? E.g. each class that writes to disk knows where it should write and just creates the directory as needed, like lazy evaluation or something (see the first sketch after this list).
  • Does anyone instead use a document store, e.g. MongoDB, rather than saving everything to a multitude of directories? How has that been? I can see that having potential (see the second sketch after this list).
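To illustrate the lazy idiom from the second bullet, here's a minimal sketch; the class, method and file names are hypothetical:

```python
from pathlib import Path

class ResultWriter:
    """Knows its own output location; creates it only when first written to."""

    def __init__(self, root: Path, mode: str, epoch: int):
        self._dir = root / mode / f"epoch_{epoch}"

    @property
    def out_dir(self) -> Path:
        # lazily create the directory on first access
        self._dir.mkdir(parents=True, exist_ok=True)
        return self._dir

    def write_metrics(self, text: str) -> None:
        (self.out_dir / "metrics.json").write_text(text)

# nothing is created on disk until a write actually happens
writer = ResultWriter(Path("results"), mode="baseline", epoch=3)
writer.write_metrics('{"accuracy": 0.91}')
```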
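And for the third bullet, a rough sketch of what the MongoDB route might look like with pymongo; the database, collection and field names are made up, and it assumes a MongoDB instance running locally:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
results = client["experiments"]["results"]

# each run stores one document instead of a directory of files
results.insert_one({"mode": "baseline", "epoch": 3, "lr": 0.01, "accuracy": 0.91})

# collation becomes a query instead of a directory traversal
for doc in results.find({"mode": "baseline"}).sort("epoch"):
    print(doc["epoch"], doc["lr"], doc["accuracy"])
```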

Many thanks.

sming
