Comprehensive documentation for the Galenos medical AI assistant
Galenos demonstrates fine-tuning Microsoft's Phi-3-mini-4k-instruct model on medical dialogue data using QLoRA (Quantized Low-Rank Adaptation). The project documents the complete pipeline for building a medical AI assistant that returns structured, safety-focused responses to health-related queries.
| Parameter | Value |
|---|---|
| Model | microsoft/Phi-3-mini-4k-instruct |
| Parameters | 3.8 billion |
| Context Length | 4096 tokens |
| Architecture | Transformer (decoder-only) |
| Method | QLoRA (4-bit quantization + LoRA) |
| Quantization | 4-bit NF4 with double quantization |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Trainable Parameters | ~16M (<1% of base model) |
| Learning Rate | 2e-4 |
| Batch Size | 2 (per device) |
| Gradient Accumulation | 4 steps |
| Optimizer | paged_adamw_32bit |
| Precision | bfloat16 / fp16 |
| Max Sequence Length | 1024 tokens |
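The quantization and LoRA settings in the table can be sketched in code. This is a minimal illustration assuming the standard Hugging Face QLoRA stack (`transformers`, `peft`, `bitsandbytes`); values not listed in the table, such as LoRA dropout, are marked as assumptions.

```python
# Sketch of the QLoRA setup described above, assuming the standard
# Hugging Face stack (transformers + peft + bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization with double quantization, per the table above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections (r=16, alpha=32).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,  # assumption: dropout is not stated in the table
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports the small trainable fraction
```

Because only the low-rank adapter matrices are trained while the 4-bit base weights stay frozen, the trainable fraction stays well under 1% of the model.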
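The training hyperparameters map to a `TrainingArguments` object in a similar way. This is a sketch under the assumption that TRL's `SFTTrainer` (or a plain `Trainer`) drives the run; the output directory, epoch count, and logging interval are not given in the table and are illustrative placeholders.

```python
# Sketch of the training hyperparameters from the table.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="galenos-phi3-qlora",  # assumption: output path not stated
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,    # effective batch size of 8
    learning_rate=2e-4,
    optim="paged_adamw_32bit",
    bf16=True,                        # or fp16=True on GPUs without bfloat16
    num_train_epochs=1,               # assumption: epochs not stated
    logging_steps=10,                 # assumption
)
# Sequences would be truncated/packed to the 1024-token maximum at the
# trainer or tokenizer level (e.g. SFTTrainer's max sequence length setting).
```

The paged AdamW optimizer is the usual companion to QLoRA: it pages optimizer state between GPU and CPU memory to survive memory spikes during 4-bit training.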
This model should NEVER be used for actual medical diagnosis, treatment decisions, emergency situations, or replacing professional healthcare providers.