Model Card

Comprehensive documentation for the Galenos medical AI assistant

  • Parameters: 3.8B
  • Quantization: 4-bit
  • LoRA Params: 16M
  • Context Length: 4K

Model Description

Galenos is a fine-tune of Microsoft's Phi-3-mini-4k-instruct model on medical dialogue data using QLoRA (Quantized Low-Rank Adaptation). This project documents the complete pipeline for creating a medical AI assistant that provides structured, safety-focused responses to health-related queries.

Technical Specifications

Base Model

Model: microsoft/Phi-3-mini-4k-instruct
Parameters: 3.8 billion
Context Length: 4096 tokens
Architecture: Transformer (decoder-only)
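The base model can be loaded with the Hugging Face Transformers library. The sketch below is illustrative only; it downloads the ~3.8B-parameter checkpoint, so it needs network access and enough memory (the checkpoint name is from the card, everything else is a plausible default).

```python
# Load the base model and tokenizer (sketch; requires transformers + torch).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # base checkpoint from the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPU(s)/CPU
    trust_remote_code=True,     # Phi-3 ships custom modeling code
)
```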

Fine-Tuning Configuration

Method: QLoRA (4-bit quantization + LoRA)
Quantization: 4-bit NF4 with double quantization
LoRA Rank (r): 16
LoRA Alpha: 32
Target Modules: q_proj, k_proj, v_proj, o_proj
Trainable Parameters: ~16M (<1% of base model)
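The configuration above maps directly onto the `BitsAndBytesConfig` and `LoraConfig` objects from the transformers and peft libraries. This is a minimal sketch under the values stated in the card; `lora_dropout` and `bias` are not specified there and are assumed defaults.

```python
# QLoRA setup sketch: 4-bit NF4 quantization plus a rank-16 LoRA adapter.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # NF4 quantization (per the card)
    bnb_4bit_use_double_quant=True,      # double quantization (per the card)
    bnb_4bit_compute_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                                # LoRA rank
    lora_alpha=32,                       # LoRA alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,                   # assumed; not stated in the card
    bias="none",                         # assumed; not stated in the card
    task_type="CAUSAL_LM",
)
```

Passing `bnb_config` as `quantization_config` when loading the base model, then wrapping it with `peft.get_peft_model(model, lora_config)`, yields the ~16M trainable adapter parameters while the 4-bit base weights stay frozen.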

Training Hyperparameters

Learning Rate: 2e-4
Batch Size: 2 (per device)
Gradient Accumulation: 4 steps
Optimizer: paged_adamw_32bit
Precision: bfloat16 / fp16
Max Sequence Length: 1024 tokens
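These hyperparameters correspond to a `TrainingArguments` configuration like the one below (a hedged sketch; the output directory, epoch count, and logging cadence are assumptions not stated in the card). Note that with a per-device batch size of 2 and 4 gradient-accumulation steps, the effective batch size per device is 2 × 4 = 8.

```python
# Training-hyperparameter sketch matching the card's values.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./galenos-qlora",        # assumed path
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,       # effective batch size: 2 * 4 = 8
    optim="paged_adamw_32bit",           # bitsandbytes paged AdamW optimizer
    bf16=True,                           # set fp16=True instead on pre-Ampere GPUs
    num_train_epochs=1,                  # assumed; not stated in the card
    logging_steps=10,                    # assumed; not stated in the card
)
```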

Limitations & Safety

⚠️ Critical Limitations

  • Not a Medical Professional: AI-generated responses may contain errors
  • No Real-Time Knowledge: Cannot access current medical research
  • Potential Hallucination: May generate plausible but incorrect information
  • Limited Training Data: Based on synthetic medical dialogues

Out-of-Scope Use

This model should NEVER be used for actual medical diagnosis, treatment decisions, emergency situations, or replacing professional healthcare providers.

Acknowledgments

  • Base Model: Microsoft Phi-3 team
  • Libraries: Hugging Face (Transformers, PEFT, TRL), bitsandbytes
  • Inference: Groq for fast API access