Database Collections
Collection: Problem
A problem contains the following elements:
- uuid: a unique identifier of the problem. db.UUIDField().
- workflow: UUID of the associated problem workflow stored on Storage. db.UUIDField(). A problem workflow mainly defines the data targets and the performance metric used to evaluate machine learning models. An example of a problem workflow is given for sleep stage classification here.
- timestamp_upload: timestamp of the problem creation. db.DateTimeField().
- test_dataset: list of UUIDs of test data, which are not accessible except to Compute, which uses them to compute the performance of submitted algorithms. db.ListField(db.UUIDField()).
- size_train_dataset: size of the mini-batch for each training task. db.IntegerField().
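To make the field types concrete, here is a minimal sketch of a problem record as a plain Python dataclass. It is only an illustration: the collection itself is declared with the db.*Field() types listed above, and the positive-size check in __post_init__ is an assumption, not a documented rule.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass
class Problem:
    # Hypothetical in-memory mirror of the Problem collection.
    workflow: UUID                # db.UUIDField(): problem workflow on Storage
    test_dataset: list[UUID]      # db.ListField(db.UUIDField()): hidden test data
    size_train_dataset: int       # db.IntegerField(): mini-batch size per training task
    uuid: UUID = field(default_factory=uuid4)  # db.UUIDField()
    timestamp_upload: datetime = field(        # db.DateTimeField()
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Assumption: a mini-batch must hold at least one data item.
        if self.size_train_dataset <= 0:
            raise ValueError("size_train_dataset must be a positive integer")
```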
Collection: Learnuplet
A learnuplet defines a learning task. It is constructed by the Orchestrator in two cases:
- when new data is uploaded
- when a new algorithm is uploaded
It is then used by Compute to do the training.
A learnuplet is made of the following elements:
- uuid: a unique identifier of the task. db.UUIDField().
- problem: the UUID of the problem associated with the learning task. db.UUIDField().
- workflow: the UUID of the problem workflow associated with the learning task. db.UUIDField().
- train_data: list of UUIDs of the train data on which the learning is done. db.ListField(db.UUIDField()).
- test_data: list of UUIDs of the test data on which the performance of the algorithm is computed. db.ListField(db.UUIDField()).
- algo: UUID of the submitted algorithm. db.UUIDField().
- model_start: UUID of the model to be trained. If rank=0, this UUID is the same as algo. db.UUIDField().
- model_end: UUID of the model obtained after training model_start. db.UUIDField().
- rank: rank of the task, which defines the order in which learnuplets must be trained. For more details, see Details on the construction of a learnuplet at algorithm upload and Details on the construction of a learnuplet at data upload.
- worker: UUID of the worker in charge of the training task defined by this learnuplet. db.UUIDField().
- status: status of the training task. It can be waiting if the task is waiting for the training of a model with a lower rank, todo if the training job can start, pending if a worker is currently consuming the task, done if training completed successfully, or failed if training did not complete successfully. db.StringField(max_length=8).
- perf: performance on test data. db.FloatField().
- test_perf: dictionary of performances on test data: each element is the performance on one test data file (the keys being the corresponding data UUIDs). db.ListField(db.FloatField()).
- train_perf: dictionary of performances on train data: each element is the performance on one train data file (the keys being the corresponding data UUIDs). db.ListField(db.FloatField()).
- training_creation: timestamp of the learnuplet creation. db.DateTimeField().
- training_done: timestamp of the feedback from Compute (when updating status to done or failed). db.DateTimeField().
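The status values above describe a small lifecycle. The following Python sketch encodes them as an enum with a transition check. The allowed transitions (waiting to todo, todo to pending, pending to done or failed, with done and failed terminal) are inferred from the status descriptions and are an assumption, as are the names Status, ALLOWED and transition.

```python
from enum import Enum


class Status(str, Enum):
    WAITING = "waiting"  # waiting for the training of a lower-rank model
    TODO = "todo"        # the training job can start
    PENDING = "pending"  # a worker is currently consuming the task
    DONE = "done"        # training completed successfully
    FAILED = "failed"    # training did not complete successfully


# Assumed lifecycle, inferred from the status descriptions (not a documented rule).
ALLOWED: dict[Status, set[Status]] = {
    Status.WAITING: {Status.TODO},
    Status.TODO: {Status.PENDING},
    Status.PENDING: {Status.DONE, Status.FAILED},
    Status.DONE: set(),    # terminal
    Status.FAILED: set(),  # terminal
}


def transition(current: Status, new: Status) -> Status:
    """Return `new` if moving from `current` is allowed, else raise ValueError."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

Each value is at most 7 characters long, so it fits in db.StringField(max_length=8).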
Details on the construction of a learnuplet at algorithm upload
When a new algorithm is uploaded, its training is specified in learnuplets by the Orchestrator.
For now, they are constructed following these steps:
1. selection of the associated active data: for now, all data corresponding to the same problem and having targets. This might change later to lower computational costs.
2. for each mini-batch containing size_train_dataset data (a parameter fixed for the problem), creation of a learnuplet. Each learnuplet contains the UUID of the model from which to start the training in model_start, and the UUID under which to save the model after training in model_end. The first learnuplet has rank=0, status=todo and a specified model_start; the others have incremental values of rank, status=waiting and nothing in model_start (filled later).
The model from which to start the learning is not defined for the learnuplet with rank=i when it is created, but only once the performance of the learnuplet with rank=i-1 is registered on the Orchestrator. At that moment, the Orchestrator looks for the model_end of the learnuplet with the best performance and chooses it as the model_start of the learnuplet with rank=i.
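A minimal Python sketch of this procedure, under several assumptions: the Learnuplet class keeps only a few of the collection's fields, the last mini-batch may be smaller than size_train_dataset, learnuplets after rank 0 start as waiting (per the status definitions of the Learnuplet collection), and a higher perf is better. The names learnuplets_for_new_algo and on_perf_registered are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Learnuplet:
    # Simplified subset of the Learnuplet collection fields.
    algo: UUID
    train_data: list[UUID]
    rank: int
    status: str
    model_start: Optional[UUID]                     # None until filled in
    model_end: UUID = field(default_factory=uuid4)  # where the trained model is saved
    perf: Optional[float] = None                    # set when Compute reports back


def learnuplets_for_new_algo(
    algo: UUID, active_data: list[UUID], size_train_dataset: int
) -> list[Learnuplet]:
    """Create one learnuplet per mini-batch of active data, with increasing rank."""
    batches = [
        active_data[i:i + size_train_dataset]
        for i in range(0, len(active_data), size_train_dataset)
    ]
    return [
        Learnuplet(
            algo=algo,
            train_data=batch,
            rank=rank,
            # Only rank 0 can start right away, from the untrained algo itself.
            status="todo" if rank == 0 else "waiting",
            model_start=algo if rank == 0 else None,
        )
        for rank, batch in enumerate(batches)
    ]


def on_perf_registered(uplets: list[Learnuplet], rank: int) -> None:
    """Fill model_start of learnuplet `rank` once rank-1 has reported its perf."""
    trained = [u for u in uplets if u.rank < rank and u.perf is not None]
    best = max(trained, key=lambda u: u.perf)  # assumption: higher perf is better
    target = next(u for u in uplets if u.rank == rank)
    target.model_start = best.model_end
    target.status = "todo"
```

Leaving model_start empty until the previous rank reports its performance lets the Orchestrator start each rank from the best model found so far, rather than blindly from the previous rank's output.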
Details on the construction of a learnuplet at data upload
When new data is uploaded, the relevant models are updated.
For now, the corresponding learnuplets are constructed as follows:
1. selection of the relevant models, called active models: for now, all models corresponding to the same problem. This might change later to lower computational costs.
2. for each algorithm:
   2.1 find the model which has the best performance (which is not necessarily the one with the highest rank).
   2.2 for each mini-batch containing size_train_dataset data (a parameter fixed for the problem), create a learnuplet starting from the model found in 2.1.
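Steps 2.1 and 2.2 can be sketched in Python as follows. TrainedModel and plan_data_upload are hypothetical names. The sketch assumes that the mini-batches are cut from the newly uploaded data and that a higher perf is better. It returns, per algorithm, the starting model and the mini-batches rather than full learnuplets, since how the resulting learnuplets are ranked is not specified here.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class TrainedModel:
    # Hypothetical view of a model produced by a finished learnuplet.
    uuid: UUID
    algo: UUID   # algorithm the model was trained from
    rank: int    # rank of the learnuplet that produced it
    perf: float  # registered performance (assumed: higher is better)


def plan_data_upload(
    active_models: list[TrainedModel],
    new_data: list[UUID],
    size_train_dataset: int,
) -> dict[UUID, tuple[UUID, list[list[UUID]]]]:
    """Map each algo to (UUID of its best model, mini-batches of new data)."""
    # Step 2.1: best-performing model per algo, regardless of its rank.
    best: dict[UUID, TrainedModel] = {}
    for model in active_models:
        current = best.get(model.algo)
        if current is None or model.perf > current.perf:
            best[model.algo] = model

    # Step 2.2: mini-batches of size_train_dataset items (the last one may be smaller).
    batches = [
        new_data[i:i + size_train_dataset]
        for i in range(0, len(new_data), size_train_dataset)
    ]
    return {algo: (model.uuid, batches) for algo, model in best.items()}
```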
Collection: Algo
An algo represents an untrained machine learning model for a given problem, submitted via Analytics, stored in Storage, and registered in the Orchestrator database.
An algo has the following fields:
- uuid: a unique identifier of the algo. db.UUIDField().
- problem: UUID of the associated problem. db.UUIDField().
- name: name of the algo. db.StringField().
- timestamp_upload: timestamp of registration on the Orchestrator. db.DateTimeField().
For details about how to register an algo, see the endpoints documentation.
Note: For now, there is no field indicating who submitted the algo, since this is out of scope for phase 1.1. For phase 1.2, a Poster collection might be introduced (with uuid and token fields), and its uuid might be added to the algo table.
Collection: Data
A data is submitted via the Viewer, stored in Storage, and registered in the Orchestrator database. It has the following fields:
- uuid: a unique identifier of the data. db.UUIDField().
- problems: list of UUIDs of the associated problems (a data can be associated with several problems). db.ListField(db.UUIDField()).
- timestamp_upload: timestamp of registration on the Orchestrator. db.DateTimeField().
Note: For now, there is no field indicating who submitted the data, since this is out of scope for phase 1.1. For phase 1.2, a Poster collection might be introduced (with uuid and token fields), and its uuid might be added to the data table.
For details about how to register a data, see the endpoints documentation.
Collection: Preduplet
A preduplet is created in the Orchestrator when a prediction is requested. It has the following fields:
- uuid: a unique identifier of the preduplet. db.UUIDField().
- problem: UUID of the associated problem. db.UUIDField(max_length=50).
- workflow: UUID on Storage of the workflow associated with the problem. db.UUIDField(max_length=50).
- data: UUIDs on Storage of the data from which to compute the prediction. db.ListField(db.UUIDField()).
- prediction_storage_uuid: UUIDs of the associated prediction files on Storage. db.ListField(db.UUIDField()).
- model: UUID on Storage of the model used to compute the prediction. db.UUIDField().
- worker: UUID of the worker on which the computation is run. db.UUIDField().
- status: status of the prediction task. db.StringField(max_length=8).
- timestamp_request: timestamp of the prediction request. db.DateTimeField().
- timestamp_done: timestamp of the completion of the prediction. db.DateTimeField().
For details about how to request a prediction, see the endpoints documentation.