Is the FT script correct?

Hi,

I ran the fine-tune script on the Mistral base model but got rather poor results on ARC Challenge (<50% with retrieval). Any ideas why? I will repeat the run with Mistral Instruct to see if it makes a difference, but I am not optimistic, as I have seen similarly poor results when fine-tuning this model with the self-rag dataset and script.

MODEL_SIZE=7B
NUM_GPUS=8
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
GRADIENT_ACC_STEPS=$(($TOTAL_BATCH_SIZE/$NUM_GPUS/$BATCH_SIZE_PER_GPU))
echo "Training llama model ${MODEL_SIZE} using $NUM_GPUS GPUs, $BATCH_SIZE_PER_GPU batch size per GPU, $GRADIENT_ACC_STEPS gradient accumulation steps"

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch \
    --mixed_precision bf16 \
    --num_machines 1 \
    --num_processes $NUM_GPUS \
    --use_deepspeed \
    --deepspeed_config_file stage3_no_offloading_accelerate.conf \
    finetune.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --use_flash_attn \
    --tokenizer_name mistralai/Mistral-7B-v0.1 \
    --use_slow_tokenizer \
    --train_file full_output_1005.jsonl \
    --max_seq_length 2048 \
    --preprocessing_num_workers 16 \
    --per_device_train_batch_size $BATCH_SIZE_PER_GPU \
    --gradient_accumulation_steps $GRADIENT_ACC_STEPS \
    --learning_rate 2e-5 \
    --lr_scheduler_type linear \
    --warmup_ratio 0.03 \
    --weight_decay 0. \
    --num_train_epochs 5 \
    --output_dir output/mistral_root_${MODEL_SIZE}/ \
    --with_tracking \
    --report_to tensorboard \
    --logging_steps 1 \
    --use_special_tokens
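One thing worth ruling out when asking whether a script like this is "correct" is the batch-size arithmetic: `$((...))` does integer division, so if `TOTAL_BATCH_SIZE` were not a multiple of `NUM_GPUS * BATCH_SIZE_PER_GPU`, the effective batch size would silently come out smaller than intended. With the values above (128 / 8 / 1 = 16) it divides evenly, so this is probably not the cause here. The sketch below just makes that assumption explicit; `compute_grad_acc_steps` is a hypothetical helper, not something provided by `finetune.py` or `accelerate`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the gradient accumulation steps needed to reach
# the requested global batch size, or fails if it cannot be reached exactly.
compute_grad_acc_steps() {
  local total=$1 gpus=$2 per_gpu=$3
  local per_step=$((gpus * per_gpu))   # samples processed per optimizer micro-step
  if (( per_step <= 0 )); then
    echo "NUM_GPUS*BATCH_SIZE_PER_GPU must be positive" >&2
    return 1
  fi
  if (( total % per_step != 0 )); then
    echo "TOTAL_BATCH_SIZE=$total is not divisible by NUM_GPUS*BATCH_SIZE_PER_GPU=$per_step" >&2
    return 1
  fi
  echo $((total / per_step))
}

NUM_GPUS=8
BATCH_SIZE_PER_GPU=1
TOTAL_BATCH_SIZE=128
# Under set -e, a failed check aborts the script here instead of continuing.
GRADIENT_ACC_STEPS=$(compute_grad_acc_steps "$TOTAL_BATCH_SIZE" "$NUM_GPUS" "$BATCH_SIZE_PER_GPU")
echo "Effective batch size: $((NUM_GPUS * BATCH_SIZE_PER_GPU * GRADIENT_ACC_STEPS))"
# prints "Effective batch size: 128"
```

Failing early is a deliberate choice: a silently truncated batch size changes the effective learning-rate schedule without any visible error.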

EDIT: I had a chance to look into this today. I am fairly confident the issue is that this script will NOT work for a model whose tokenizer has not been prepared independently beforehand. I will confirm and close the issue. It might be nice to add some information on how to replicate the result independently when fine-tuning from scratch.
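If an unprepared tokenizer is the culprit, a quick sanity check before launching `finetune.py` is to confirm that the tokenizer you pass via `--tokenizer_name` actually contains the special tokens your training data uses. This is a minimal sketch, assuming the prepared tokenizer was saved to a local directory with Hugging Face `save_pretrained` (which writes JSON files such as `tokenizer_config.json`). `check_special_tokens` is a hypothetical helper, and it is only a coarse string search over those files, not a real tokenizer load, so treat a pass as "probably fine" rather than proof.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: check_special_tokens DIR TOKEN...
# Searches the *.json files in DIR for each token's literal text.
# Returns 0 if all tokens are found, 1 if any are missing,
# and 2 if DIR contains no JSON files at all.
check_special_tokens() {
  local dir=$1; shift
  local missing=0 tok
  shopt -s nullglob
  local files=("$dir"/*.json)
  shopt -u nullglob
  if (( ${#files[@]} == 0 )); then
    echo "no tokenizer JSON files found in $dir" >&2
    return 2
  fi
  for tok in "$@"; do
    # -F: match the token as a fixed string, so brackets are not regex syntax
    if ! grep -qF -- "$tok" "${files[@]}"; then
      echo "missing: $tok" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (directory and token names are placeholders; use the special tokens
# that actually appear in your training file):
#   check_special_tokens ./prepared_tokenizer "TOKEN_1" "TOKEN_2"
```

Because it only greps for literal text, tokens containing characters that JSON escapes (such as quotes) may be reported as missing even when present.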