How to Fine-Tune Gemma 4 on Emotions: A Complete Step-by-Step Guide
By Braincuber Team
Published on April 20, 2026
Google has just introduced Gemma 4, describing it as its most intelligent open model family so far, built for strong reasoning and agentic workflows. Gemma models are designed to be flexible across environments, with official support and tooling for local development, cloud deployment, and model customization, which makes them a strong choice for fine-tuning projects.
What You'll Learn:
- Setting up a RunPod environment with GPU access
- Loading and preparing the emotion classification dataset
- Formatting data for Gemma 4 supervised fine-tuning
- Loading Gemma 4 with 4-bit quantization
- Evaluating the base model before training
- Fine-tuning with LoRA adapters
- Evaluating and comparing results
1. Setting Up the Environment
Start by launching a new RunPod instance, and make sure your account has at least $5 in credit before you begin. For this tutorial, choose an RTX 3090 GPU pod and select the latest PyTorch template.
Before deploying, open the template settings and make a few updates. Increase both the container disk and volume disk to 40 GB so you have enough space for the model, dataset, cached files, and training checkpoints.
You should also add your Hugging Face token as an environment variable. You can generate this token from Settings > Access Tokens in your Hugging Face account.
%%capture
!pip install -U transformers accelerate datasets trl peft bitsandbytes scikit-learn huggingface_hub
These packages will cover the full workflow, including loading the dataset, preparing the model, fine-tuning, and evaluation.
import os
from huggingface_hub import login

hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise ValueError("Set HF_TOKEN in the RunPod environment before running this notebook.")

login(token=hf_token)
print("Logged in to Hugging Face.")
2. Load and Prepare the Emotion Dataset
Now that the environment is ready, the next step is to load the emotion dataset from Hugging Face and prepare smaller splits for training and evaluation.
For this tutorial, we are not using the full dataset. Instead, we create limited train, validation, and test splits so the fine-tuning process stays faster and easier to run on a single GPU.
from datasets import load_dataset, DatasetDict
TRAIN_LIMIT = 4000
VALIDATION_LIMIT = 400
TEST_LIMIT = 400
EVAL_LIMIT = 400
raw_dataset = load_dataset("dair-ai/emotion")
def maybe_limit(split, limit):
    split = split.shuffle(seed=42)
    if limit is None:
        return split
    return split.select(range(min(limit, len(split))))

dataset = DatasetDict({
    "train": maybe_limit(raw_dataset["train"], TRAIN_LIMIT),
    "validation": maybe_limit(raw_dataset["validation"], VALIDATION_LIMIT),
    "test": maybe_limit(raw_dataset["test"], TEST_LIMIT),
})
dataset
The final dataset contains 4,000 training examples, 400 validation examples, and 400 test examples. Next, we look at the label names stored in the dataset.
label_names = dataset["train"].features["label"].names
label_names
This shows that the task has six emotion categories: sadness, joy, love, anger, fear, and surprise.
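As a quick sanity check, you can build id-to-name mappings from this list. The snippet below is a standalone sketch that hard-codes the six labels in the dataset's order:

```python
# Hard-coded copy of the six labels shown above, in dataset order.
label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

# id2label / label2id mappings, handy when converting integer labels to text.
id2label = {i: name for i, name in enumerate(label_names)}
label2id = {name: i for i, name in enumerate(label_names)}

print(id2label[0])       # sadness
print(label2id["joy"])   # 1
```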
3. Formatting Data for Gemma 4 Fine-Tuning
Before we can fine-tune the model, we need to convert the dataset into the format Gemma 4 will use during training.
Instead of passing only raw text and labels, we structure each example as a short chat interaction with a system message, a user message, and the expected assistant response.
SYSTEM_PROMPT = """You are an emotion classification assistant.
Read the user's text and answer with exactly one label.
Only choose from: sadness, joy, love, anger, fear, surprise.
Return only the label and nothing else."""
The system prompt tells the model exactly what task it should perform. In this case, we want the model to act as an emotion classification assistant and return only one of the six allowed labels.
def to_prompt_completion(example):
    text = example["text"]
    label = label_names[example["label"]]
    return {
        "prompt": [
            {
                "role": "system",
                "content": SYSTEM_PROMPT,
            },
            {
                "role": "user",
                "content": f"Classify the emotion of this text:\n\n{text}",
            },
        ],
        "completion": [
            {
                "role": "assistant",
                "content": label,
            }
        ],
    }
sft_dataset = dataset.map(to_prompt_completion, remove_columns=dataset["train"].column_names)
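To sanity-check the conversion, here is a standalone version of the same mapping run on a toy example. The labels and system prompt are copied from above; the sample text is made up for illustration:

```python
SYSTEM_PROMPT = """You are an emotion classification assistant.
Read the user's text and answer with exactly one label.
Only choose from: sadness, joy, love, anger, fear, surprise.
Return only the label and nothing else."""

label_names = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def to_prompt_completion(example):
    # Convert one raw (text, label-id) record into a chat-style
    # prompt/completion pair, exactly as in the mapping above.
    text = example["text"]
    label = label_names[example["label"]]
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Classify the emotion of this text:\n\n{text}"},
        ],
        "completion": [{"role": "assistant", "content": label}],
    }

# Toy record: label id 1 maps to "joy".
record = to_prompt_completion({"text": "i feel wonderful today", "label": 1})
print(record["completion"][0]["content"])  # joy
```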
4. Load Gemma 4 E4B-it With 4-Bit Quantization
Now we can load Gemma 4 E4B-it and prepare it for fine-tuning. Since this is a relatively large model, we load it with 4-bit quantization to reduce memory usage and make it easier to run on a 3090 GPU.
We also use bfloat16 as the compute type, which helps keep the setup efficient.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
MODEL_ID = "google/gemma-4-E4B-it"
MODEL_DTYPE = torch.bfloat16
USE_4BIT = True
if torch.cuda.is_available():
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

processor = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=True)
if processor.pad_token is None:
    processor.pad_token = processor.eos_token

bnb_config = None
model_kwargs = {
    "device_map": "auto",
}
if USE_4BIT:
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=MODEL_DTYPE,
    )
    model_kwargs["quantization_config"] = bnb_config
else:
    model_kwargs["torch_dtype"] = MODEL_DTYPE
base_model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **model_kwargs)
base_model.config.use_cache = False
base_model.config.pad_token_id = processor.pad_token_id
base_model.config.bos_token_id = processor.bos_token_id
base_model.config.eos_token_id = processor.eos_token_id
base_model.generation_config.pad_token_id = processor.pad_token_id
base_model.generation_config.bos_token_id = processor.bos_token_id
base_model.generation_config.eos_token_id = processor.eos_token_id
print(f"Base model loaded with 4-bit={USE_4BIT} and dtype={MODEL_DTYPE}.")
5. Evaluate the Base Model
Before fine-tuning, it is useful to evaluate the base model first so we have a clear baseline to compare against later.
In this section, we define a few helper functions that generate predictions, extract valid emotion labels, and run evaluation on the test split.
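One possible shape for the label-extraction helper is sketched below. The name extract_label and its exact logic are my assumption rather than the original post's code: it scans the generated text for the first valid emotion label and returns None when none is found, which is what gets counted as an invalid prediction.

```python
# Assumed helper (not the post's exact code): pull the first valid
# emotion label out of a raw model generation, or None if absent.
VALID_LABELS = {"sadness", "joy", "love", "anger", "fear", "surprise"}

def extract_label(generated_text):
    # Lowercase and strip simple punctuation before token matching.
    cleaned = generated_text.lower().replace(".", " ").replace(",", " ")
    for token in cleaned.split():
        if token in VALID_LABELS:
            return token
    return None  # treated as an invalid prediction

print(extract_label("Joy"))                      # joy
print(extract_label("The emotion is fear."))     # fear
print(extract_label("I'm not sure about this"))  # None
```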
Important
The baseline results show that the untuned model already performs reasonably well, but there is still room for improvement: 58.25% accuracy and a macro F1 score of 0.42.
6. Fine-Tune Gemma 4 With LoRA
Now that we have the baseline results, we can fine-tune Gemma 4 using LoRA.
LoRA is a parameter-efficient fine-tuning method, which means we do not update the full model. Instead, we attach a small number of trainable adapter weights on top of the base model. This makes training much lighter and more practical on a single GPU.
Parameter Efficient
LoRA only trains a small number of adapter weights instead of the full model, making it memory efficient.
Fast Training
Training completes in under 10 minutes on a single 3090 GPU with minimal computational cost.
LoRA Configuration
from peft import LoraConfig
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)
Training Configuration
from trl import SFTConfig, SFTTrainer
training_args = SFTConfig(
    output_dir="./gemma4-emotion-lora",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    warmup_steps=50,
    num_train_epochs=1,
    logging_steps=50,
    eval_strategy="steps",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    gradient_checkpointing=True,
    bf16=True,
    fp16=False,
    tf32=True,
    max_length=256,
    packing=False,
    completion_only_loss=True,
    remove_unused_columns=False,
    dataloader_num_workers=2,
    optim="paged_adamw_8bit",
    report_to="none",
)
Now we make sure the base model is ready and initialize the trainer. This step attaches the LoRA adapters to the base model and prepares the supervised fine-tuning trainer.
Initialize the Trainer
Attach LoRA adapters and prepare the SFT trainer with your formatted training and validation splits.
from peft import PeftModel
if isinstance(base_model, PeftModel):
    base_model = base_model.unload()

base_model.config.use_cache = False

trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_dataset["train"],
    eval_dataset=sft_dataset["validation"],
    peft_config=lora_config,
    args=training_args,
    processing_class=processor,
)
Training Time
In this run, training takes almost 9 minutes on a single 3090 GPU. Both the training loss and validation loss keep decreasing over time, which is a good sign.
Once training is complete, we can save the adapter and tokenizer locally, and push the model to the Hugging Face Hub.
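One way to do that is sketched below. The local directory and the Hub repo name "your-username/gemma4-emotion-lora" are placeholders of mine, not from the original post:

```python
# Save the LoRA adapter and tokenizer locally.
ADAPTER_DIR = "./gemma4-emotion-lora/final"
trainer.save_model(ADAPTER_DIR)
processor.save_pretrained(ADAPTER_DIR)

# Push both to the Hugging Face Hub (requires the login from section 1).
HUB_REPO = "your-username/gemma4-emotion-lora"  # placeholder repo name
trainer.model.push_to_hub(HUB_REPO)
processor.push_to_hub(HUB_REPO)
```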
7. Evaluate the Fine-Tuned Model
Now that training is complete, the final step is to evaluate the fine-tuned model on the same test split and compare the results with the base model.
These results are clearly stronger than the baseline. After fine-tuning, the model reaches 77.25% accuracy and a macro F1 score of 0.698.
| Metric | Pre Fine-Tuning | Post Fine-Tuning | Improvement |
|---|---|---|---|
| Accuracy | 58.25% | 77.25% | +19 pts |
| Macro F1 | 0.42 | 0.70 | +0.28 |
| Invalid Predictions | 33 | 20 | -13 |
Final Thoughts
Fine-tuning Gemma 4 is very sensitive to setup, especially the prompt structure and training arguments. If the prompt format is wrong, or you do not use the proper template consistently, the model may go through training without actually learning the task well.
Another important lesson concerns max_length. If you reduce it too aggressively (in our runs, below around 125), longer examples get truncated and the model may not learn the pattern properly at all.
To improve the results further, a good next step would be to fine-tune on the full dataset and train for at least 3 epochs instead of just one. That would give the model more examples to learn from and more time to adapt.
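Concretely, that amounts to two small changes to the earlier setup (a sketch; we have not tested this longer run ourselves):

```python
# Use the full dair-ai/emotion training split instead of 4,000 examples;
# maybe_limit() returns the whole split when the limit is None.
TRAIN_LIMIT = None

# Train for three epochs instead of one.
training_args.num_train_epochs = 3
```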
Frequently Asked Questions
What is LoRA fine-tuning?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that attaches small trainable adapter weights to the base model instead of updating all model parameters, making training faster and more memory-efficient.
Why use 4-bit quantization?
4-bit quantization reduces the model's memory footprint by approximately 75%, allowing larger models to run on consumer GPUs like the 3090 with limited VRAM.
How long does training take?
Training takes approximately 9 minutes on a single NVIDIA 3090 GPU with 4,000 training examples and 1 epoch. This makes it practical for experimentation and quick iterations.
What is the emotion dataset?
The dair-ai/emotion dataset from Hugging Face contains text snippets labeled with six emotions: sadness, joy, love, anger, fear, and surprise. It is commonly used for text classification benchmarks.
How much does the model improve?
The fine-tuned model shows a significant improvement, from 58% to 77% accuracy (+19 points) and a macro F1 score from 0.42 to 0.70 (+0.28), demonstrating the effectiveness of LoRA fine-tuning.
Need Help with AI Implementation?
Our experts can help you implement AI solutions like fine-tuning Gemma 4 for your specific use case. Get a free consultation to discuss your project requirements.
