The two experiments use the same configuration, differing only in max-duration. The md=1000 experiment has better pre-training performance. Both experiments use fp16.