A step-by-step guide to recreating Galenos using Phi-3-mini, QLoRA, and TRL.
Choose the foundation for your fine-tuning pipeline:
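The guide pins down the model family but not the exact checkpoint; a minimal sketch, assuming the instruct-tuned Phi-3-mini on the Hugging Face Hub (swap in the 128k-context variant if you need longer inputs):

```python
# Base model for the whole pipeline; referenced as model_id in later steps.
model_id = "microsoft/Phi-3-mini-4k-instruct"
```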
Create a JSONL dataset with chat-formatted medical dialogues:
```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are Galenos, a careful medical assistant..."
    },
    {
      "role": "user",
      "content": "Patient has fever and cough for 3 days..."
    },
    {
      "role": "assistant",
      "content": "**Summary:** ... **Possible Causes:** ..."
    }
  ]
}
```

⚠️ Avoid prompt leakage: keep system prompts consistent and don't include hidden labels.
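To make the format concrete, here is a minimal sketch that writes one such record to a JSONL file and validates the turn order; the file name and `validate` helper are illustrative, not from the guide:

```python
import json

RECORD = {
    "messages": [
        {"role": "system", "content": "You are Galenos, a careful medical assistant..."},
        {"role": "user", "content": "Patient has fever and cough for 3 days..."},
        {"role": "assistant", "content": "**Summary:** ... **Possible Causes:** ..."},
    ]
}

def validate(record):
    """Check that a record follows the system -> user -> assistant turn order."""
    roles = [m["role"] for m in record["messages"]]
    return roles == ["system", "user", "assistant"]

# JSONL means exactly one JSON object per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(RECORD, ensure_ascii=False) + "\n")

assert validate(RECORD)
```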
Use the model's native chat template to avoid format mismatches:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
# tokenize=False returns the formatted string rather than token ids,
# which is useful for inspecting what the model will actually see.
prompt = tokenizer.apply_chat_template(messages, tokenize=False)
```
Load the model in 4-bit quantization to reduce memory usage:
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Configure low-rank adapters for efficient fine-tuning:
```python
from peft import LoraConfig

config = LoraConfig(
    r=16,
    lora_alpha=32,
    # Phi-3 fuses q/k/v into a single qkv_proj, so target that rather than
    # the separate q_proj/k_proj/v_proj names used by many other architectures.
    target_modules=["qkv_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```

Use TRL's SFTTrainer with stability-focused hyperparameters:
```python
from transformers import TrainingArguments
from trl import SFTTrainer

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=config,
    max_seq_length=1024,  # TRL < 0.13 API; newer versions take this via SFTConfig
    args=TrainingArguments(
        output_dir="galenos-sft",
        learning_rate=2e-4,
        gradient_accumulation_steps=4,
        max_grad_norm=1.0,
    ),
)
trainer.train()
```

Test format adherence, safety, and quality on a curated eval set.
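A lightweight starting point is a format-adherence check over the eval set. Everything below is illustrative: the `generate` stub stands in for real model inference, and the required section headers mirror the dataset example above.

```python
# Section headers every Galenos reply is expected to contain.
REQUIRED_SECTIONS = ["**Summary:**", "**Possible Causes:**"]

def generate(prompt: str) -> str:
    """Stub standing in for tokenizer + model.generate inference."""
    return "**Summary:** likely viral illness. **Possible Causes:** influenza, COVID-19."

def format_adherence(prompts):
    """Fraction of replies containing every required section header."""
    hits = 0
    for p in prompts:
        reply = generate(p)
        if all(section in reply for section in REQUIRED_SECTIONS):
            hits += 1
    return hits / len(prompts)

score = format_adherence(["Patient has fever and cough for 3 days..."])
```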
Export LoRA adapters or merge with base model for deployment.
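For the merge path, a sketch using PEFT's `merge_and_unload`; the adapter and output paths are placeholders, and the imports sit inside the function so the file loads without the heavyweight dependencies:

```python
def merge_adapters(base_id: str, adapter_dir: str, out_dir: str):
    """Merge trained LoRA adapters into the base weights and save a standalone model."""
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(out_dir)
    # Ship the tokenizer alongside the merged weights for deployment.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)

# Example (placeholder paths):
# merge_adapters("microsoft/Phi-3-mini-4k-instruct", "galenos-adapters", "galenos-merged")
```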
Create a Gradio interface or API endpoint for inference.
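A minimal Gradio chat sketch, under stated assumptions: `respond` is a stub to be replaced with real inference, and the `gradio` import is kept inside the builder so the module loads even where the UI dependency isn't installed:

```python
def respond(message: str, history):
    """Stub: replace with tokenizer.apply_chat_template + model.generate."""
    return "**Summary:** ... **Possible Causes:** ..."

def build_demo():
    import gradio as gr  # assumed dependency for the UI step

    return gr.ChatInterface(respond, title="Galenos")

# build_demo().launch() would serve the chat UI locally.
```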