How to Fine-Tune Gemma 4 on Emotions: A Complete Step-by-Step Guide
By Braincuber Team
Published on April 20, 2026
Google has just introduced Gemma 4, describing it as its most intelligent open model family so far, built for strong reasoning and agentic workflows. Gemma models are designed to be flexible across environments, with official support and tooling for local development, cloud deployment, and model customization, which makes them a strong choice for fine-tuning projects.
What You'll Learn:
- Setting up a RunPod environment with GPU access
- Loading and preparing the emotion classification dataset
- Formatting data for Gemma 4 supervised fine-tuning
- Loading Gemma 4 with 4-bit quantization
- Evaluating the base model before training
- Fine-tuning with LoRA adapters
- Evaluating and comparing results
1. Setting Up the Environment
Start by launching a new RunPod instance, and make sure your account has at least $5 in credit before you begin. For this tutorial, choose an RTX 3090 GPU pod and select the latest PyTorch template.
Before deploying, open the template settings and make a few updates. Increase both the container disk and volume disk to 40 GB so you have enough space for the model, dataset, cached files, and training checkpoints.
You should also add your Hugging Face token as an environment variable. You can generate this token from Settings > Access Tokens in your Hugging Face account.
%%capture
!pip install -U transformers accelerate datasets trl peft bitsandbytes scikit-learn huggingface_hub
These packages will cover the full workflow, including loading the dataset, preparing the model, fine-tuning, and evaluation.
import os
from huggingface_hub import login

hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise ValueError("Set HF_TOKEN in the RunPod environment before running this notebook.")

login(token=hf_token)
print("Logged in to Hugging Face.")
2. Load and Prepare the Emotion Dataset
Now that the environment is ready, the next step is to load the emotion dataset from Hugging Face and prepare smaller splits for training and evaluation.
For this tutorial, we are not using the full dataset. Instead, we create limited train, validation, and test splits so the fine-tuning process stays faster and easier to run on a single GPU.
from datasets import load_dataset, DatasetDict
TRAIN_LIMIT = 4000
VALIDATION_LIMIT = 400
TEST_LIMIT = 400
EVAL_LIMIT = 400
raw_dataset = load_dataset("dair-ai/emotion")
def maybe_limit(split, limit):
    split = split.shuffle(seed=42)
    if limit is None:
        return split
    return split.select(range(min(limit, len(split))))

dataset = DatasetDict({
    "train": maybe_limit(raw_dataset["train"], TRAIN_LIMIT),
    "validation": maybe_limit(raw_dataset["validation"], VALIDATION_LIMIT),
    "test": maybe_limit(raw_dataset["test"], TEST_LIMIT),
})
dataset
The final dataset contains 4,000 training examples, 400 validation examples, and 400 test examples. Next, we look at the label names stored in the dataset.
label_names = dataset["train"].features["label"].names
label_names
This shows that the task has six emotion categories: sadness, joy, love, anger, fear, and surprise.
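As a quick sanity check, you can build id-to-name mappings from this list. The snippet below is a standalone sketch that hard-codes the six labels in the dataset's order:

```python
# Hard-coded copy of the six labels shown above, in dataset order.
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# id2label / label2id mappings, handy when converting integer labels to text.
id2label = {i: name for i, name in enumerate(label_names)}
label2id = {name: i for i, name in enumerate(label_names)}

print(id2label[0])       # sadness
print(label2id["joy"])   # 1
```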
3. Formatting Data for Gemma 4 Fine-Tuning
Before we can fine-tune the model, we need to convert the dataset into the format Gemma 4 will use during training.
Instead of passing only raw text and labels, we structure each example as a short chat interaction with a system message, a user message, and the expected assistant response.
SYSTEM_PROMPT = """You are an emotion classification assistant.
Read the user's text and answer with exactly one label.
Only choose from: sadness, joy, love, anger, fear, surprise.
Return only the label and nothing else."""
The system prompt tells the model exactly what task it should perform. In this case, we want the model to act as an emotion classification assistant and return only one of the six allowed labels.
def to_prompt_completion(example):
    text = example["text"]
    label = label_names[example["label"]]
    return {
        "prompt": [
            {
                "role": "system",
                "content": SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": f"Classify the emotion of this text:\n\n{text}",
            },
        ],
        "completion": [
            {
                "role": "assistant",
                "content": label,
            }
        ],
    }
sft_dataset = dataset.map(to_prompt_completion, remove_columns=dataset["train"].column_names)
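To sanity-check the conversion, here is a standalone version of the same mapping run on a toy example. The labels and system prompt are copied from above; the sample text is made up for illustration:

```python
SYSTEM_PROMPT = """You are an emotion classification assistant.
Read the user's text and answer with exactly one label.
Only choose from: sadness, joy, love, anger, fear, surprise.
Return only the label and nothing else."""

label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def to_prompt_completion(example):
    # Convert one raw (text, label-id) record into a chat-style
    # prompt/completion pair, exactly as in the mapping above.
    text = example["text"]
    label = label_names[example["label"]]
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify the emotion of this text:\n\n{text}"},
        ],
        "completion": [{"role": "assistant", "content": label}],
    }

# Toy record: label id 1 maps to "joy".
record = to_prompt_completion({"text": "i feel wonderful today", "label": 1})
print(record["completion"][0]["content"])  # joy
```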
4. Load Gemma 4 E4B-it With 4-Bit Quantization
Now we can load Gemma 4 E4B-it and prepare it for fine-tuning. Since this is a relatively large model, we load it with 4-bit quantization to reduce memory usage and make it easier to run on a 3090 GPU.
We also use bfloat16 as the compute type, which helps keep the setup efficient.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
MODEL_ID = "google/gemma-4-E4B-it"
MODEL_DTYPE = torch.bfloat16
USE_4BIT = True
if torch.cuda.is_available():
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

processor = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
if processor.pad_token is None:
    processor.pad_token = processor.eos_token

bnb_config = None
model_kwargs = {
    "device_map": "auto",
}
if USE_4BIT:
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=MODEL_DTYPE,
    )
    model_kwargs["quantization_config"] = bnb_config
else:
    model_kwargs["torch_dtype"] = MODEL_DTYPE
base_model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **model_kwargs)
base_model.config.use_cache = False
base_model.config.pad_token_id = processor.pad_token_id
base_model.config.bos_token_id = processor.bos_token_id
base_model.config.eos_token_id = processor.eos_token_id
base_model.generation_config.pad_token_id = processor.pad_token_id
base_model.generation_config.bos_token_id = processor.bos_token_id
base_model.generation_config.eos_token_id = processor.eos_token_id
print(f"Base model loaded with 4-bit={USE_4BIT} and dtype={MODEL_DTYPE}.")
5. Evaluate the Base Model
Before fine-tuning, it is useful to evaluate the base model first so we have a clear baseline to compare against later.
In this section, we define a few helper functions that generate predictions, extract valid emotion labels, and run evaluation on the test split.
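One possible shape for the label-extraction helper is sketched below. The name extract_label and its exact logic are my assumption rather than the original post's code: it scans the generated text for the first valid emotion label and returns None when none is found, which is what gets counted as an invalid prediction.

```python
# Assumed helper (not the post's exact code): pull the first valid
# emotion label out of a raw model generation, or None if absent.
VALID_LABELS = {"sadness", "joy", "love", "anger", "fear", "surprise"}

def extract_label(generated_text):
    # Lowercase and strip simple punctuation before token matching.
    cleaned = generated_text.lower().replace(".", " ").replace(",", " ")
    for token in cleaned.split():
        if token in VALID_LABELS:
            return token
    return None  # treated as an invalid prediction

print(extract_label("Joy"))                      # joy
print(extract_label("The emotion is fear."))     # fear
print(extract_label("I'm not sure about this"))  # None
```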
Important
The baseline results show that the untuned model already performs reasonably well, but there is still room for improvement: 58.25% accuracy and a macro F1 score of 0.42.
6. Fine-Tune Gemma 4 With LoRA
Now that we have the baseline results, we can fine-tune Gemma 4 using LoRA.
LoRA is a parameter-efficient fine-tuning method, which means we do not update the full model. Instead, we attach a small number of trainable adapter weights on top of the base model. This makes training much lighter and more practical on a single GPU.
Parameter Efficient
LoRA only trains a small number of adapter weights instead of the full model, making it memory efficient.
Fast Training
Training completes in under 10 minutes on a single 3090 GPU with minimal computational cost.
LoRA Configuration
from peft import LoraConfig
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)
Training Configuration
from trl import SFTConfig, SFTTrainer
training_args = SFTConfig(
    output_dir="./gemma4-emotion-lora",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=1,
    logging_steps=50,
    eval_strategy="steps",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    gradient_checkpointing=True,
    bf16=True,
    fp16=False,
    tf32=True,
    max_length=256,
    packing=False,
    completion_only_loss=True,
    remove_unused_columns=False,
    dataloader_num_workers=2,
    optim="paged_adamw_8bit",
    report_to="none",
)
Now we make sure the base model is ready and initialize the trainer. This step attaches the LoRA adapters to the base model and prepares the supervised fine-tuning trainer.
Initialize the Trainer
Attach LoRA adapters and prepare the SFT trainer with your formatted training and validation splits.
from peft import PeftModel
if isinstance(base_model, PeftModel):
    base_model = base_model.unload()

base_model.config.use_cache = False

trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_dataset["train"],
    eval_dataset=sft_dataset["validation"],
    peft_config=lora_config,
    args=training_args,
    processing_class=processor,
)
Training Time
In this run, training takes almost 9 minutes on a single 3090 GPU. Both the training loss and validation loss keep decreasing over time, which is a good sign.
Once training is complete, we can save the adapter and tokenizer locally, and push the model to the Hugging Face Hub.
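One way to do that is sketched below. The local directory and the Hub repo name "your-username/gemma4-emotion-lora" are placeholders of mine, not from the original post:

```python
# Save the LoRA adapter and tokenizer locally.
ADAPTER_DIR = "./gemma4-emotion-lora/final"
trainer.save_model(ADAPTER_DIR)
processor.save_pretrained(ADAPTER_DIR)

# Push both to the Hugging Face Hub (requires the login from section 1).
HUB_REPO = "your-username/gemma4-emotion-lora"  # placeholder repo name
trainer.model.push_to_hub(HUB_REPO)
processor.push_to_hub(HUB_REPO)
```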
7. Evaluate the Fine-Tuned Model
Now that training is complete, the final step is to evaluate the fine-tuned model on the same test split and compare the results with the base model.
These results are clearly stronger than the baseline. After fine-tuning, the model reaches 77.25% accuracy and a macro F1 score of 0.698.
| Metric | Pre Fine-Tuning | Post Fine-Tuning | Improvement |
|---|---|---|---|
| Accuracy | 58.25% | 77.25% | +19 pts |
| Macro F1 | 0.42 | 0.70 | +0.28 |
| Invalid Predictions | 33 | 20 | -13 |
Final Thoughts
Fine-tuning Gemma 4 is very sensitive to setup, especially the prompt structure and training arguments. If the prompt format is wrong, or you do not use the proper template consistently, the model may go through training without actually learning the task well.
Another important lesson concerns max_length. If you reduce it too aggressively (in our runs, below around 125), longer examples get truncated and the model may not learn the pattern properly at all.
To improve the results further, a good next step would be to fine-tune on the full dataset and train for at least 3 epochs instead of just one. That would give the model more examples to learn from and more time to adapt.
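Concretely, that amounts to two small changes to the earlier setup (a sketch; we have not tested this longer run ourselves):

```python
# Use the full dair-ai/emotion training split instead of 4,000 examples;
# maybe_limit() returns the whole split when the limit is None.
TRAIN_LIMIT = None

# Train for three epochs instead of one.
training_args.num_train_epochs = 3
```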
Frequently Asked Questions
What is LoRA fine-tuning?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that attaches small trainable adapter weights to the base model instead of updating all model parameters, making training faster and more memory-efficient.
Why use 4-bit quantization?
4-bit quantization reduces the model's memory footprint by approximately 75%, allowing larger models to run on consumer GPUs like the 3090 with limited VRAM.
How long does training take?
Training takes approximately 9 minutes on a single NVIDIA 3090 GPU with 4,000 training examples and 1 epoch. This makes it practical for experimentation and quick iterations.
What is the emotion dataset?
The dair-ai/emotion dataset from Hugging Face contains text snippets labeled with six emotions: sadness, joy, love, anger, fear, and surprise. It is commonly used for text classification benchmarks.
How much does the model improve?
The fine-tuned model shows a significant improvement, from 58% to 77% accuracy (+19 points) and a macro F1 score from 0.42 to 0.70 (+0.28), demonstrating the effectiveness of LoRA fine-tuning.
Need Help with AI Implementation?
Our experts can help you implement AI solutions like fine-tuning Gemma 4 for your specific use case. Get a free consultation to discuss your project requirements.
