Training#

For learning, use the yaml configuration file under the projects directory. For example, the contents of projects/opt/exp001.yml has the following contents:

training_config:
per_device_train_batch_size: 2
gradient_accumulation_steps: 4
num_train_epochs: 1
dataloader_num_workers: 16
fp16: true
optim: "adamw_torch"
learning_rate: 5.0e-5
logging_steps: 100
evaluation_strategy: "steps"
save_strategy: "steps"
eval_steps: 4000
save_steps: 4000
save_total_limit: 1
deepspeed: ./configs/deepspeed/ds_config_zero1.json
output_dir: ./output/
report_to: "wandb"

model_config:
fp16: true
pretrained_path: # None or path to model weight
model_type: git_llm
language_model_name: facebook/opt-350m
vision_model_name: openai/clip-vit-base-patch16
num_image_with_embedding: 1 # if 1, no img_temporal_embedding
max_length: 512
keys_to_finetune:
   - visual_projection
   - num_image_with_embedding
keys_to_freeze: []

use_lora: true
lora:
   r: 8
   lora_alpha: 32
   target_modules:
      - q_proj
      - k_proj
      - v_proj
   lora_dropout: 0.01
   bias: none
   task_type: CAUSAL_LM

dataset_config_path:
- ./configs/datasets/m3it.yaml

training_config sets the training configuration, model_config sets the model configuration, and dataset_config_path sets the dataset configuration. The following LLM modules are currently supported for model_type. We plan to add more supported modules in the future.

To start learning, execute the following command.

./scripts/run.sh

GPU is required for learning; we have tested on Ubuntu 20.04, CUDA 11.7.