Introduction
Fine-tuning is the process of customizing a pre-trained large language model (LLM) with domain-specific data. Training the model on curated examples helps it produce responses that better align with a particular use case, improving accuracy, consistency, and efficiency.
Fine-tuning works by adjusting the model's weights through supervised learning on labeled examples. During training, the model gradually updates its parameters to reduce the error between its predictions and the target outputs, so it generates more accurate and contextually relevant responses for the specific task. The process refines how the model interprets and generates text, ensuring responses adhere to the desired knowledge, tone, or structure.
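To make the mechanics concrete, here is a minimal, illustrative sketch of a single supervised training step using the open-source transformers and torch libraries rather than OpenAI's hosted service (which hides these details); the model name, example text, and learning rate are placeholders, not recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: any small causal LM that can be trained locally
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # illustrative learning rate

# One labeled example: the prompt together with the desired response.
text = "User: How do I reset my router?\nAssistant: Hold the reset button for 10 seconds."
inputs = tokenizer(text, return_tensors="pt")

# Using the token IDs as labels gives a cross-entropy loss wherever the
# model's predicted next token diverges from the desired text.
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()  # compute gradients (the "error corrections")
optimizer.step()         # nudge the weights toward the desired output
optimizer.zero_grad()
Hosted fine-tuning APIs run this kind of loop for you at scale; the rest of the article focuses on that managed workflow.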
When Fine-Tuning is Beneficial: Two Examples
Example 1: Technical Support Chatbot
Before Fine-Tuning:
User: How do I reset my router?
GPT-4o: To reset your router, try unplugging it for a few minutes, then plug it back in. If that doesn’t work, refer to your router manual.
Fine-Tuning Data:
{
  "messages": [
    {
      "role": "user",
      "content": "How do I reset my router?"
    },
    {
      "role": "assistant",
      "content": "To reset your ACME-2000 router, press and hold the reset button for 10 seconds until the lights blink. Then, log in using the default admin credentials from your manual."
    }
  ]
}
After Fine-Tuning:
User: How do I reset my router?
GPT-4o Fine-Tuned: To reset your ACME-2000 router, press and hold the reset button for 10 seconds until the lights blink. Then, log in using the default admin credentials.
Example 2: Legal Document Summary
Before Fine-Tuning:
User: Summarize this contract clause.
GPT-4o: This clause discusses liability limitations and obligations.
Fine-Tuning Data:
{
  "messages": [
    {
      "role": "user",
      "content": "Summarize this contract clause on liability."
    },
    {
      "role": "assistant",
      "content": "This clause limits the company's liability to direct damages and excludes responsibility for consequential damages."
    }
  ]
}
After Fine-Tuning:
User: Summarize this contract clause.
GPT-4o Fine-Tuned: The clause limits liability to direct damages, excluding consequential damages and capping claims at the total contract value.
Fine-Tuning - Practical Examples
Via the OpenAI Dashboard
OpenAI provides a fine-tuning interface on their website that allows users to fine-tune models without writing code:
- Go to the OpenAI Dashboard
- Navigate to the "Fine-Tuning" section.
- Upload a JSONL file with your training data (a quick local validation sketch follows this list).
- Start a fine-tuning job and monitor its status.
- Once complete, note the fine-tuned model ID and use it in API calls.
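Before uploading, it can help to sanity-check the file locally. The short sketch below assumes your examples live in a file named training_data.jsonl (an example path) and only verifies that each line is valid JSON in the expected chat format:
import json

path = "training_data.jsonl"  # example path; point this at your own file

with open(path, "r", encoding="utf-8") as f:
    for i, line in enumerate(f, start=1):
        record = json.loads(line)  # raises an error if the line is not valid JSON
        roles = [m["role"] for m in record["messages"]]
        # Each example should end with the assistant reply we want the model to learn.
        assert roles[-1] == "assistant", f"line {i}: last message must be from the assistant"

print("Training file looks structurally valid.")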
Creating a Fine-Tuned Model Using Code
Assuming your training data is as follows:
{"messages": [{"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."}, {"role": "user", "content": "How do I reset my router?"}, {"role": "assistant", "content": "To reset your router, press and hold the reset button for 10 seconds until the lights blink. Then, log in with the default credentials provided in your router's manual."}]}
{"messages": [{"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."}, {"role": "user", "content": "Why is my Wi-Fi slow even when I'm close to the router?"}, {"role": "assistant", "content": "Slow Wi-Fi near the router can be caused by interference from other devices, outdated firmware, or bandwidth congestion. Try switching to a less crowded Wi-Fi channel, updating firmware, or reducing connected devices."}]}
{"messages": [{"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."}, {"role": "user", "content": "How do I change my router's Wi-Fi password?"}, {"role": "assistant", "content": "To change your Wi-Fi password, log into your router\u2019s admin panel (usually via 192.168.1.1 or 192.168.0.1), go to the wireless settings section, update the password, and save changes."}]}
{"messages": [{"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."}, {"role": "user", "content": "What's the difference between 2.4GHz and 5GHz Wi-Fi?"}, {"role": "assistant", "content": "2.4GHz Wi-Fi provides a longer range but lower speeds, while 5GHz Wi-Fi offers faster speeds with a shorter range. Use 2.4GHz for better coverage and 5GHz for high-speed performance in close proximity."}]}
Here's how you can fine-tune an OpenAI model in Python (this example uses the openai Python SDK v1.x):
import json
import time

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Prepare training data
training_data = [
    {"messages": [
        {"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."},
        {"role": "user", "content": "How do I reset my router?"},
        {"role": "assistant", "content": "To reset your router, press and hold the reset button for 10 seconds until the lights blink. Then, log in with the default credentials provided in your router's manual."}
    ]},
    {"messages": [
        {"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."},
        {"role": "user", "content": "Why is my Wi-Fi slow even when I'm close to the router?"},
        {"role": "assistant", "content": "Slow Wi-Fi near the router can be caused by interference from other devices, outdated firmware, or bandwidth congestion. Try switching to a less crowded Wi-Fi channel, updating firmware, or reducing connected devices."}
    ]},
    {"messages": [
        {"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."},
        {"role": "user", "content": "How do I change my router's Wi-Fi password?"},
        {"role": "assistant", "content": "To change your Wi-Fi password, log into your router's admin panel (usually via 192.168.1.1 or 192.168.0.1), go to the wireless settings section, update the password, and save changes."}
    ]},
    {"messages": [
        {"role": "system", "content": "You are a networking expert specializing in routers and internet troubleshooting."},
        {"role": "user", "content": "What's the difference between 2.4GHz and 5GHz Wi-Fi?"},
        {"role": "assistant", "content": "2.4GHz Wi-Fi provides a longer range but lower speeds, while 5GHz Wi-Fi offers faster speeds with a shorter range. Use 2.4GHz for better coverage and 5GHz for high-speed performance in close proximity."}
    ]},
]

# Save to a JSONL file (one JSON object per line)
jsonl_file = "training_data.jsonl"
with open(jsonl_file, "w") as f:
    for entry in training_data:
        f.write(json.dumps(entry) + "\n")

# Print the JSONL contents
with open(jsonl_file, "r") as f:
    print(f.read())

# Upload the training data
upload = client.files.create(file=open(jsonl_file, "rb"), purpose="fine-tune")
file_id = upload.id

# Create the fine-tuning job (gpt-4o-2024-08-06 is a fine-tunable GPT-4o snapshot)
job = client.fine_tuning.jobs.create(training_file=file_id, model="gpt-4o-2024-08-06")
fine_tune_id = job.id
print(f"Fine-tuning job created: {fine_tune_id}")

# Poll the fine-tuning job status until it finishes
while True:
    job = client.fine_tuning.jobs.retrieve(fine_tune_id)
    status = job.status
    print(f"Fine-tuning status: {status}")
    if status == "succeeded":
        model_id = job.fine_tuned_model
        print(f"Fine-tuned model ready: {model_id}")
        break
    elif status in ("failed", "cancelled"):
        print(f"Fine-tuning {status}.")
        break
    time.sleep(30)  # Wait before checking again
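While the job is running, you can also inspect its event log, for example to watch training progress. The snippet below uses the same v1 client and job ID as above; the limit value is just an example.
# List recent events for the fine-tuning job (status changes, training metrics).
events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=fine_tune_id, limit=10)
for event in events.data:
    print(event.created_at, event.level, event.message)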
Using a Fine-Tuned Model
Once your fine-tuning job is complete and a new model is available, you need to reference it in your API calls. Instead of using gpt-4o, use the newly generated fine-tuned model ID.
Example Usage:
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

completion = client.chat.completions.create(
    model="your-fine-tuned-model-id",  # the ID returned by the fine-tuning job
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
)
print(completion.choices[0].message.content)
Ensure that your application retrieves and stores the fine-tuned model ID once the job completes. Keeping the model ID in a configuration file, environment variable, or database (rather than hard-coding it) makes deployments smoother and keeps it alongside your other credentials and configuration.
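One simple pattern, sketched below, is to read the fine-tuned model ID from an environment variable so application code never hard-codes it; the variable name FT_MODEL_ID is just an example.
import os
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# FT_MODEL_ID is an example variable name; set it once the job succeeds,
# e.g. export FT_MODEL_ID="ft:gpt-4o-2024-08-06:your-org::abc123" (illustrative ID)
model_id = os.environ.get("FT_MODEL_ID", "gpt-4o")  # fall back to the base model

completion = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Summarize this contract clause."}],
)
print(completion.choices[0].message.content)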
Considerations of Fine-Tuning
Benefits of Fine-Tuning
- Lower costs: Fine-tuned models typically need shorter prompts, since long instructions and in-context examples can be dropped, which can reduce per-query costs by roughly 30-50% depending on query complexity.
- Faster response time: Shorter prompts and more direct answers reduce unnecessary computation, which can cut response latency by 20-40%.
- Higher precision: Fine-tuned models can improve accuracy by up to 60% for domain-specific tasks such as legal and medical text processing.
- Better customer experience: Fine-tuned chatbots can improve user satisfaction by up to 80% by providing more context-aware and concise responses.
- Distillation benefits: Smaller models distilled from fine-tuned ones can retain up to 95% of the original model's accuracy while reducing inference costs.
- Improved compliance: Fine-tuning helps enforce industry-specific regulatory requirements, which is crucial in fields like finance and healthcare, where compliance rates can improve by around 40%.
Drawbacks & Alternatives
- Time and Cost: Training can be expensive, and the model requires ongoing maintenance (how much depends on the case; see the table below).
- Effort & Complexity: Gathering data, refining prompts, and managing model versions all take effort.
- Risk of Overfitting: If the fine-tuning dataset is too small or too narrow, the model may become overly specialized and lose generalization ability.
- Becoming Too Specific: Excessive fine-tuning can lead to responses that are too rigid and may not adapt well to slightly different queries.
- Overtrained Models: If a model is trained on too much data from a single domain, it may lose its ability to generalize across broader contexts.
- Alternative - Few-Shot Prompting: Instead of fine-tuning, providing a few in-context examples in the prompt can often achieve similar accuracy without training costs (see the sketch after this list).
- Alternative - Retrieval-Augmented Generation (RAG): RAG uses an external knowledge source to dynamically retrieve information during inference, reducing the need for fine-tuning while maintaining flexibility (a bare-bones retrieval sketch follows the comparison table below).
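For comparison, here is a minimal few-shot sketch using the standard chat completions API: the example question/answer pairs are supplied in the request itself, so no training job is required. The system prompt and examples are illustrative.
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# The example question/answer pairs act like training data, but live in the
# prompt itself, so no weights are changed and no training job is needed.
messages = [
    {"role": "system", "content": "You are a networking support assistant. Answer concisely."},
    {"role": "user", "content": "How do I reset my router?"},
    {"role": "assistant", "content": "Hold the reset button for 10 seconds until the lights blink, then log in with the default credentials."},
    {"role": "user", "content": "Why is my Wi-Fi slow near the router?"},
    {"role": "assistant", "content": "Check for interference, outdated firmware, or too many connected devices; switching channels often helps."},
    # The real question comes last.
    {"role": "user", "content": "How do I change my router's Wi-Fi password?"},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)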
When to Use Fine-Tuning vs. Alternatives
| Scenario | Fine-Tuning | Few-Shot | Prompt Engineering | RAG |
| --- | --- | --- | --- | --- |
| Large dataset, highly repetitive queries | ✅ | ❌ | ❌ | ❌ |
| Domain-specific structured responses | ✅ | ✅ | ❌ | ✅ |
| Few examples, varying queries | ❌ | ✅ | ✅ | ✅ |
| Requires immediate updates | ❌ | ✅ | ✅ | ✅ |
| Highly dynamic knowledge base | ❌ | ✅ | ✅ | ✅ |
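The sketch below is a bare-bones illustration of the RAG idea referenced above: embed a few reference snippets, retrieve the one closest to the question, and hand it to the model as context. The documents, question, and helper functions are invented for the example; a real system would use a vector database.
from openai import OpenAI

client = OpenAI(api_key="your-api-key")

# Tiny, made-up knowledge base; in practice this would be a vector store.
documents = [
    "The ACME-2000 router is reset by holding the reset button for 10 seconds.",
    "Liability under the standard contract is capped at the total contract value.",
]

def embed(texts):
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in result.data]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

question = "How do I reset my ACME-2000 router?"
doc_vectors = embed(documents)
query_vector = embed([question])[0]

# Retrieve the document most similar to the question and pass it as context.
best_index = max(range(len(documents)), key=lambda i: cosine(query_vector, doc_vectors[i]))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": f"Answer using this context:\n{documents[best_index]}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)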
Number of Examples Needed for Fine-Tuning
The number of examples required for fine-tuning depends on the complexity of the task and the degree of customization needed. Below is a general guideline for different levels and purposes of fine-tuning:
| Fine-Tuning Level | Number of Examples | Purpose |
| --- | --- | --- |
| Minimal Fine-Tuning | 50-500 | Slight behavioral adjustments, tone modifications |
| Domain-Specific | 500-5,000 | Industry- or business-specific responses |
| Highly Specialized | 5,000-50,000 | Complex workflows; legal, medical, or financial applications |
| Expert-Level Fine-Tuning | 50,000+ | Large-scale fine-tuning for proprietary datasets and unique capabilities |
Note: A lower number of examples can work for small tweaks, but more extensive modifications require significantly larger datasets.
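To see where an existing training file falls on this scale, a quick line count is enough; the file name below is the same example path used earlier.
# Each line of a JSONL training file is one example.
with open("training_data.jsonl", "r", encoding="utf-8") as f:
    num_examples = sum(1 for line in f if line.strip())
print(f"{num_examples} training examples")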
Comparing Fine-Tuning Across Models
Fine-tuning is not exclusive to OpenAI. Here's how other models compare:
| Model | Fine-Tuning Support | Major Differences |
| --- | --- | --- |
| OpenAI GPT-4o | ✅ | API-driven, structured fine-tuning |
| Gemini 2.0 | ✅ | Focused on multimodal capabilities |
| Claude 3.5 | ✅ | Emphasizes safety & alignment tuning |
Summary
Fine-tuning offers a powerful way to enhance LLM responses, improving accuracy, efficiency, and contextual relevance. In many cases it can significantly reduce per-query costs in the long run (by up to 50%, and even more when combined with distillation).
However, it comes with trade-offs: added complexity, reduced flexibility, potentially significant training costs, and even a risk of degraded quality in cases such as overfitting.
We hope this article provided some insight into what to look for when choosing between fine-tuning and its alternatives. Approaches such as few-shot prompting, prompt engineering, or retrieval-augmented generation (RAG) can deliver similar benefits with lower overhead, and the right choice depends on factors such as dataset size, adaptability needs, and response latency. With tools from OpenAI, LangChain, and others, managing fine-tuning data is easier than ever, making fine-tuning a viable approach when precision and efficiency are required.