How to Use LM Studio for Local LLMs: Complete Guide
By Braincuber Team
Published on April 17, 2026
LM Studio is a cross-platform application that lets you download and run large language models locally on your machine so that your data never leaks to external servers. In this complete tutorial, you will learn how to install LM Studio, download models, chat with documents using built-in RAG, and set up a local API server.
What You'll Learn:
- Install and set up LM Studio on your computer
- Download and run LLMs locally with privacy control
- Chat with documents using built-in RAG support
- Configure model parameters for optimal performance
- Set up a local API server for application integration
What is LM Studio?
LM Studio is a cross-platform application that lets you download and run large language models locally on your machine. When everything runs on your computer, your prompts and data stay within your environment, giving you more control and better privacy.
It comes with a built-in model browser where you can search, browse, and download models directly from Hugging Face, including variants of DeepSeek, Llama, Gemma, Phi, and Mistral. No extra setup is required.
LM Studio is also great for beginners who are not comfortable working with command-line prompts. It gives you a user-friendly interface where you can select a model, adjust configuration, and start chatting right away.
You can also upload local files and chat with them. LM Studio can attach .docx, .pdf, and .txt files to chat sessions. If a document fits in context, it is added in full, and if it is very long, LM Studio uses retrieval-augmented generation (RAG) to pull relevant information from those files.
LM Studio vs Ollama
| Feature | LM Studio | Ollama |
|---|---|---|
| Interface | GUI-first, user-friendly | CLI-first, terminal-based |
| Built-in RAG | Yes, no extra setup | Requires external tools |
| MCP support | Built-in | Limited / not native |
| Model downloads | Hugging Face in-app | Using ollama pull commands |
| Ease of setup | Very beginner-friendly | Slight learning curve |
System Requirements and Model Choice
Before downloading models in LM Studio, understand what your system can handle. The model you choose directly depends on your available RAM, and picking the wrong one can slow things down or make the app unusable.
RAM Requirements by Model Size
| RAM | What You Can Run |
|---|---|
| 8GB | Small models (1B-4B parameters) |
| 16GB | Mid-sized models (7B-9B parameters) |
| 32GB+ | Larger models (13B and above) |
GPU is optional but makes a noticeable difference. If you have one, model responses become faster and smoother. NVIDIA GPUs with CUDA support work best, Apple Silicon uses Metal effectively, and AMD has partial support.
Recommended Models by Hardware
| RAM/VRAM | Recommended Models |
|---|---|
| 8GB | Qwen 2.5 3B/4B, Phi-3 Mini (3.8B), Gemma 2 2B |
| 16GB | Llama 3 8B, Gemma 2 9B, Mistral 7B, Qwen 2.5 7B |
| 24GB | Llama 3.1 8B (higher quality), Mixtral 8x7B (quantized), Qwen 2.5 14B |
| 32GB+ | Llama 3.1 70B (heavily quantized), Qwen 2.5 32B, Mixtral variants |
Understanding Quantization
You will notice different versions of the same model with labels like Q4_K_M or Q8_0. These refer to quantization levels, which indicate how many bits are used per weight, and therefore how heavily the model is compressed.
- Q4 (4-bit, more aggressive quantization): Reduces memory usage and runs faster, but you lose some quality
- Q8 (8-bit, lighter quantization): Better output quality, but requires more RAM and runs slower
If you are unsure, Q4 or Q5 is usually a safe place to start, especially on a 16GB setup.
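You can sanity-check whether a model will fit before downloading it. As a rough rule of thumb, the memory footprint is the parameter count times bits per weight divided by 8, plus some overhead for the KV cache and runtime buffers. A quick back-of-envelope sketch (the 20% overhead factor is an approximation, not an exact LM Studio figure):

```python
def estimate_model_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough memory footprint: params * (bits / 8) bytes, plus ~20% overhead
    for the KV cache and runtime buffers (approximation)."""
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param * overhead

# A 7B model at Q4 (~4 bits/weight) vs Q8 (~8 bits/weight)
print(f"7B @ Q4: ~{estimate_model_gb(7, 4):.1f} GB")
print(f"7B @ Q8: ~{estimate_model_gb(7, 8):.1f} GB")
```

This lines up with the tables above: a 7B model at Q4 needs roughly 4-5 GB, which is why it sits comfortably on a 16GB machine, while the Q8 version roughly doubles that.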
GGUF Format
Most models in LM Studio are available in GGUF format, a binary format (the successor to GGML) designed to store and run LLMs efficiently on consumer-grade hardware. A GGUF file packages the quantized weights, with high-precision values mapped to lower-bit integers, together with the tokenizer and model metadata in a single optimized file.
Installing LM Studio
To get started with LM Studio, head over to the official website and download the app. The website automatically detects your operating system and offers the relevant version for Windows, Mac, or Linux.
You might be asked to allow permissions depending on your system settings. On Mac, it becomes available as an application right after opening the downloaded installer.
Downloading Your First Model
When you open LM Studio for the first time, you land on a clean interface with the model browser. You can search for models, explore options, and begin downloading immediately.
Browsing the Discover Tab
Open LM Studio
Open LM Studio and click the search icon from the left sidebar to access the model browser.
Search for Models
Search for specific models, filter by size, and explore different options. Each model comes with a model card showing size, capabilities, and recommended use cases.
Download a Model
Download a model like Qwen 2.5 7B (Q4_K_M) for a 16GB system. It strikes a good balance between performance and quality.
Chatting with a Local LLM
Loading a Model and Configuring Parameters
Navigate to My Models
Open LM Studio and go to the My Models section from the left menu bar.
Load the Model
Click the Settings icon on the model and click Load Model.
Configure Parameters
Go to the Inference tab to access controls for context length, temperature, and system prompt.
Key Configuration Parameters
Context Length
Controls how much information the model can remember during a conversation. Higher values let you work with longer inputs but use more memory.
Temperature
Controls how creative or predictable the model is. Lower values make responses more deterministic, higher values make them more varied.
System Prompt
Sets the behavior of the model. Define how the assistant should respond, including tone, style, and role.
Max Tokens
Limits the maximum length of the model's response. Set this based on how long you expect answers to be.
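These same settings map onto fields in an OpenAI-style chat request, which becomes useful later when you call LM Studio's local API server. A sketch of how a request body might carry them (the field names follow the OpenAI chat format; the model name and helper function are placeholders, not part of LM Studio itself; context length is set when loading the model, not per request):

```python
def build_chat_request(user_prompt: str,
                       system_prompt: str = "You are a concise assistant.",
                       temperature: float = 0.7,
                       max_tokens: int = 256) -> dict:
    """Assemble an OpenAI-style chat completion body.
    The system prompt sets behavior; temperature and max_tokens
    mirror the sliders in LM Studio's Inference tab."""
    return {
        "model": "local-model",      # placeholder; use your loaded model's id
        "temperature": temperature,  # lower = more deterministic
        "max_tokens": max_tokens,    # caps response length
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

body = build_chat_request("Summarize what RAG is in one sentence.")
```

Tuning the sliders in the GUI and setting these fields in an API call are two views of the same knobs.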
Chatting with Your Documents (RAG)
One of the most useful features in LM Studio is its built-in RAG support. You can directly upload documents to the chat and start asking questions.
Setting Up Document Q&A
Open a Chat Session
Open a chat session with your loaded model.
Upload Documents
Click the + icon for attaching files. Upload documents like PDFs or text files directly into the chat.
Start Querying
Once the file is added, LM Studio prepares it automatically. Ask questions and the model retrieves relevant sections from your documents.
How RAG Works Under the Hood
Under the hood, LM Studio:
- Splits the document into smaller chunks
- Converts chunks into embeddings (numerical representations of text)
- Retrieves the most relevant chunks when you ask a question
- Passes retrieved chunks to the model along with your query
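The steps above can be sketched in miniature. In this toy version a bag-of-words similarity score stands in for real embeddings, which LM Studio computes with a dedicated embedding model; the chunk sizes and scoring here are illustrative only:

```python
import math
import re
from collections import Counter

def chunk(text: str, size: int = 40) -> list[str]:
    """Step 1: split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Step 2: toy 'embedding' as a bag-of-words count vector.
    A real pipeline uses a neural embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 3: rank chunks by similarity to the query, keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

doc = ("LM Studio runs models locally. Quantization reduces memory use. "
       "RAG retrieves relevant chunks.")
question = "How does quantization affect memory?"
top = retrieve(question, chunk(doc, size=6))

# Step 4: retrieved chunks are prepended to the query for the model
prompt = "Context:\n" + "\n".join(top) + "\n\nQuestion: " + question
```

The key idea is that only the retrieved chunks reach the model, which is how a document far larger than the context window can still be queried.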
RAG Limitations
The model depends on its context window, so very large documents may not be fully considered at once. Retrieval quality depends on how well the document is chunked. Some answers might miss details if relevant sections are not retrieved correctly.
Running LM Studio as a Local API Server
One of the more powerful features is running LM Studio as a local API server. This lets you use your local LLM inside scripts, apps, or other tools.
Starting the Server
Enable Developer Mode
Click the Settings icon in the bottom-left corner, go to the Developer section, and turn on the Developer Mode toggle.
Access Developer Menu
Go back to the chat interface and click the Developer icon from the left menu bar.
Start the Server
Turn on the toggle next to Status to start the server. Copy the server address once it is running.
Testing with curl
Test the server with a simple curl request:
```
curl http://127.0.0.1:1234/v1/models
```
If everything is set up correctly, you will see a JSON response listing the available model.
Connecting from Python
Once your local server is running, you can treat it like any other API. The only difference is you are calling your own machine instead of OpenAI servers.
```python
from openai import OpenAI

# Point the client at the local LM Studio server instead of OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "user", "content": "Explain how local LLMs work"}
    ],
)

print(response.choices[0].message.content)
```
Understanding the Code
base_url
Tells the code to use your local LM Studio server instead of OpenAI servers.
api_key
Can be any value. LM Studio does not enforce API key authentication.
model
Refers to the model you loaded in LM Studio.
messages
Your prompt structured as a message with role and content.
When you run this code, your request goes to localhost:1234, the model processes it, and you get a response back. This works because LM Studio follows the OpenAI API format.
Frequently Asked Questions
Is LM Studio free?
Yes, LM Studio is free to download and use. There are no subscription fees, usage limits, or paywalled features for the core app.
Is LM Studio completely offline?
Yes, once you download a model, everything runs locally. You only need an internet connection to download models from Hugging Face.
Can I use LM Studio in my own applications?
Yes, LM Studio can run as a local API server that follows the OpenAI API format. You can connect it to scripts, apps, or other tools running on your machine.
Can LM Studio handle images?
Yes, but only with vision-enabled models. If you load a multimodal model like LLaVA or Qwen-VL, you can upload images and ask questions about them.
How much RAM do I need to run LLMs locally?
It depends on the model size. 8GB can handle smaller models, 16GB is sufficient for many use cases, but 32GB or more is required for larger models.