
SLURM training resyncs src/ after each restart

$SLURM_JOBID is used to uniquely identify runs, but this ID changes between restarts. As a result, the slurm_train.sh script resyncs the src/ folder on every restart, so a restarted job can pick up changes made to src/ in the meantime, which can lead to unexpected crashes.

A solution is to use the start date as a UID, which keeps the training copy of src/ independent of the source src/ for the whole duration of the training.
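
A minimal sketch of the idea, not the actual slurm_train.sh: the function name `snapshot_src`, the `RUN_UID` variable, and the `runs/<uid>/src` layout are all hypothetical and only illustrate how a start-date UID could prevent the resync on restart.

```shell
# Hypothetical helper: copy src/ into a per-run folder once, keyed by a start-date UID.
# Usage: snapshot_src <source_dir> <runs_root> <run_uid>; prints the snapshot path.
snapshot_src() {
    local src_dir="$1" runs_root="$2" run_uid="$3"
    local run_src="${runs_root}/${run_uid}/src"
    # Copy only on the first start; a restart with the same UID reuses the snapshot.
    if [ ! -d "${run_src}" ]; then
        mkdir -p "${runs_root}/${run_uid}"
        cp -r "${src_dir}" "${run_src}"
    fi
    echo "${run_src}"
}

# Assumption: RUN_UID is set once when the job is first submitted, so restarts
# see the same value. If it is unset, fall back to the current date.
RUN_UID="${RUN_UID:-$(date +%Y-%m-%d_%H-%M-%S)}"
```

One way to fix the UID at submission time would be something like `sbatch --export=ALL,RUN_UID="$(date +%Y-%m-%d_%H-%M-%S)" slurm_train.sh`, assuming the exported environment is preserved when the job restarts. The script would then call `snapshot_src src runs "$RUN_UID"` and train from the returned path instead of from src/ directly.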