Model Card

Comprehensive documentation for the Galenos medical AI assistant

  • Parameters: 3.8B
  • Quantization: 4-bit
  • LoRA Params: 16M
  • Context Length: 4K

Model Description

Galenos is a fine-tune of Microsoft's Phi-3-mini-4k-instruct model on medical dialogue data using QLoRA (Quantized Low-Rank Adaptation). This project documents the complete pipeline for creating a medical AI assistant that provides structured, safety-focused responses to health-related queries.

Technical Specifications

Base Model

Model: microsoft/Phi-3-mini-4k-instruct
Parameters: 3.8 billion
Context Length: 4096 tokens
Architecture: Transformer (decoder-only)
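The base model can be loaded with the Hugging Face Transformers library. The sketch below is illustrative only; it downloads the ~3.8B-parameter checkpoint, so it needs network access and enough memory (the checkpoint name is from the card, everything else is a plausible default).

```python
# Load the base model and tokenizer (sketch; requires transformers + torch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # base checkpoint from the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPU(s)/CPU
    trust_remote_code=True,     # Phi-3 ships custom modeling code
)
```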

Fine-Tuning Configuration

Method: QLoRA (4-bit quantization + LoRA)
Quantization: 4-bit NF4 with double quantization
LoRA Rank (r): 16
LoRA Alpha: 32
Target Modules: q_proj, k_proj, v_proj, o_proj
Trainable Parameters: ~16M (<1% of base model)
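The configuration above maps directly onto the `BitsAndBytesConfig` and `LoraConfig` objects from the transformers and peft libraries. This is a minimal sketch under the values stated in the card; `lora_dropout` and `bias` are not specified there and are assumed defaults.

```python
# QLoRA setup sketch: 4-bit NF4 quantization plus a rank-16 LoRA adapter.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization (per the card)
    bnb_4bit_use_double_quant=True,      # double quantization (per the card)
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                # LoRA rank
    lora_alpha=32,                       # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,                   # assumed; not stated in the card
    bias="none",                         # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
```

Passing `bnb_config` as `quantization_config` when loading the base model, then wrapping it with `peft.get_peft_model(model, lora_config)`, yields the ~16M trainable adapter parameters while the 4-bit base weights stay frozen.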

Training Hyperparameters

Learning Rate: 2e-4
Batch Size: 2 (per device)
Gradient Accumulation: 4 steps
Optimizer: paged_adamw_32bit
Precision: bfloat16 / fp16
Max Sequence Length: 1024 tokens
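These hyperparameters correspond to a `TrainingArguments` configuration like the one below (a hedged sketch; the output directory, epoch count, and logging cadence are assumptions not stated in the card). Note that with a per-device batch size of 2 and 4 gradient-accumulation steps, the effective batch size per device is 2 × 4 = 8.

```python
# Training-hyperparameter sketch matching the card's values.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./galenos-qlora",        # assumed path
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,       # effective batch size: 2 * 4 = 8
    optim="paged_adamw_32bit",           # bitsandbytes paged AdamW optimizer
    bf16=True,                           # set fp16=True instead on pre-Ampere GPUs
    num_train_epochs=1,                  # assumed; not stated in the card
    logging_steps=10,                    # assumed; not stated in the card
)
```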

Limitations & Safety

⚠️ Critical Limitations

  • Not a Medical Professional: AI-generated responses may contain errors
  • No Real-Time Knowledge: Cannot access current medical research
  • Potential Hallucination: May generate plausible but incorrect information
  • Limited Training Data: Based on synthetic medical dialogues

Out-of-Scope Use

This model should NEVER be used for actual medical diagnosis, treatment decisions, emergency situations, or replacing professional healthcare providers.

Acknowledgments

  • Base Model: Microsoft Phi-3 team
  • Libraries: Hugging Face (Transformers, PEFT, TRL), bitsandbytes
  • Inference: Groq for fast API access